diff --git a/html_output/channel/boston-meetup/index.html b/html_output/channel/boston-meetup/index.html deleted file mode 100644 index 28d3a36..0000000 --- a/html_output/channel/boston-meetup/index.html +++ /dev/null @@ -1,459 +0,0 @@ - - - - - - Slack Export - #boston-meetup - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-04-28 12:49:23
-
-

@Michael Robinson has joined the channel

- - - -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-04-28 12:49:45
-
-

@Sheeri Cabral (Collibra) has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-04-28 12:49:45
-
-

@Harel Shein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-04-28 12:51:20
-
-

Please join the meetup group: https://www.meetup.com/boston-data-lineage-meetup-group/

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Viraj Parekh - (vmpvmp94@gmail.com) -
-
2023-04-28 12:51:49
-
-

@Viraj Parekh has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Eric Veleker - (eric@atlan.com) -
-
2023-05-02 15:16:45
-
-

@Eric Veleker has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-05-09 11:07:39
-
-

I’m courting 2 orgs that would give us free space in Boston 😄

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-09 12:01:54
-
-

Sweet! Thank you. Please let me know if/how I can help

- - - -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-05-24 11:04:42
-
-

I’m having no luck getting a venue, should we try CIC Boston?

- -

[11:04 AM] It’s a short walk from South Station (train terminal) and close to the airport too

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yuanli Wang - (yuanliw@bu.edu) -
-
2023-05-25 20:33:32
-
-

@Yuanli Wang has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nam Nguyen - (nam@astrafy.io) -
-
2023-07-14 05:37:43
-
-

@Nam Nguyen has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/dagster-integration/index.html b/html_output/channel/dagster-integration/index.html deleted file mode 100644 index 7aa61e4..0000000 --- a/html_output/channel/dagster-integration/index.html +++ /dev/null @@ -1,2285 +0,0 @@ - - - - - - Slack Export - #dagster-integration - - - - - -
- - - -
- - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-01-21 21:26:13
-
-

@Dalin Kim has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kevin Mellott - (kevin.r.mellott@gmail.com) -
-
2022-01-21 21:28:55
-
-

@Kevin Mellott has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nafisah Islam - (nafisahislam@northwesternmutual.com) -
-
2022-01-21 21:28:56
-
-

@Nafisah Islam has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Antonio Moctezuma - (antoniomoctezuma@northwesternmutual.com) -
-
2022-01-21 21:28:56
-
-

@Antonio Moctezuma has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Joshua Wankowski - (joshuawankowski@northwesternmutual.com) -
-
2022-01-21 21:28:56
-
-

@Joshua Wankowski has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-01-21 21:28:56
-
-

@Maciej Obuchowski has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-01-21 21:28:56
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-01-21 22:17:56
-
-

Hello, my team would like to contribute to the OpenLineage-Dagster integration work and wanted to start a public channel for general discussion on this topic.

- -

Issue #489 is currently open for review and includes the proposal for the integration. As we proceed with the initial implementation, we’d appreciate feedback from the community to make sure the approach is reasonable and OpenLineage events are captured accurately.

- -

Looking forward to more discussions. Thanks!

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
- 👏 Eric Veleker -
- -
-
-
-
- - - - - -
-
- - - - -
- -
firas - (firas.omrane.contact@gmail.com) -
-
2022-01-22 17:08:23
-
-

@firas has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Laurent Paris - (laurent@datakin.com) -
-
2022-01-24 11:49:36
-
-

@Laurent Paris has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-01-24 20:40:19
-
-

FYI, I reached out to the Dagster community and they replied to @Dalin Kim’s ticket: https://github.com/OpenLineage/OpenLineage/issues/489#issuecomment-1020718071

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-01-24 20:40:41
-
-

Thank you for getting this going @Dalin Kim!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-02-04 15:53:24
-
-

@Michael Robinson has joined the channel

- - - -
- 👋 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-02-04 15:59:17
-
-

Let me intro @Michael Robinson who among other things is looking for topics to speak about in the OpenLineage monthly meeting

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-02-04 16:08:51
-
-

Thanks, @Julien Le Dem. If anyone is interested in speaking at the next OL TSC meeting about their work on the Dagster integration, please reply here or message me. The integration is on the agenda for the upcoming meeting on 2/9 at 9 am PT: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting. @Dalin Kim

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-04 17:18:32
-
-

*Thread Reply:* Hi Michael. While I’m currently going through the internal review process before creating a PR, I can do a quick demo of the current OpenLineage sensor approach and get some initial feedback, if that is okay.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-02-09 13:04:24
-
-

*Thread Reply:* thanks for the demo!

- - - -
- 👍 Dalin Kim -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-10 22:37:06
-
-

*Thread Reply:* Thanks for the opportunity!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-15 13:53:17
-
-

Hello, I created a pull request for the OpenLineage sensor work here. This initial work handles the basic lifecycles of Dagster jobs & ops (pipelines & steps), and more discussion will be needed to define how we can handle datasets. Hopefully, this PR is acceptable as an initial groundwork, and all feedback is appreciated. Thanks!

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-15 15:39:14
-
-

*Thread Reply:* Thanks for the PR! I'll take a look tomorrow.

- - - -
- 👍 Dalin Kim -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-16 09:37:27
-
-

*Thread Reply:* @Dalin Kim looks great! I approved it. One thing to do is to rebase on main and force-with-lease push: it appears that there are some conflicts.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-16 11:59:49
-
-

*Thread Reply:* @Maciej Obuchowski Thank you for the review. Just had a small conflict in CHANGELOG, which has been resolved. For the integration test, do you suggest a similar approach to the Airflow integration, using Flask?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-16 12:03:08
-
-

*Thread Reply:* Also, I reached out to Sandy from Dagster for a review to make sure this is good from Dagster side of things.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-16 12:09:12
-
-

*Thread Reply:* > For the integration test, do you suggest a similar approach to the Airflow integration, using Flask? -Yes, I think we can reuse that part. The most important thing is that we have real Dagster running "real" workloads.

- - - -
- 👍 Dalin Kim -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-17 00:45:37
-
-

*Thread Reply:* Documentation has been updated based on feedback from Yuhan from Dagster team, and I believe everything is set from my end.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-17 05:25:16
-
-

*Thread Reply:* Great! Let's just get rid of this linting error and I'll merge it then.

- -

https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/2204/workflows/cdb412e6-b41a-4fab-bc8e-d8bee71d051d/jobs/18963

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-17 11:27:41
-
-

*Thread Reply:* Thank you. I overlooked this when updating docstring. All should be good now.

- -

One final question - should we make the dagster unit test job “required” in the ci and how can that be configured?

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-17 11:54:08
-
-

*Thread Reply:* @Willy Lulciuc I think you configured it, am I right?

- -

@Dalin Kim one more rebase, please 🙏 -I've turned auto-merge on, but unfortunately I can't rebase your branch on your fork.

- - - -
- 👍 Dalin Kim -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-17 11:58:34
-
-

*Thread Reply:* Rebased and pushed

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-17 12:23:44
-
-

*Thread Reply:* @Dalin Kim merged!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-17 12:24:28
-
-

*Thread Reply:* @Maciej Obuchowski Awesome! Thank you so much for all your help!

- - - -
- 🙌 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-15 14:01:43
-
-

One small question on ci - airflow integration test checks are stuck in the “expected” state. Is this expected or is there something I missed in the ci update?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-15 14:47:49
-
-

*Thread Reply:* Sorry - we're not running integration tests for Airflow on forks due to security reasons. If you're not touching any Airflow files, then it should not affect you at all 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-15 14:48:23
-
-

*Thread Reply:* In other words, it's just as expected.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-02-15 15:09:42
-
-

*Thread Reply:* Thank you for the clarification

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-02-17 11:54:18
-
-

@Willy Lulciuc has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nicola Monger - (nicola.monger@moonpig.com) -
-
2022-02-17 16:52:44
-
-

@Nicola Monger has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-18 13:03:29
-
-

@John Thomas has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dominique Tipton - (dominiquetipton@northwesternmutual.com) -
-
2022-03-01 17:21:49
-
-

@Dominique Tipton has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dominique Tipton - (dominiquetipton@northwesternmutual.com) -
-
2022-03-01 17:36:02
-
-

Hi all 👋

- -

I have opened up an issue/proposal on getting datasets incorporated with the dagster integration. I would love to get some feedback and conversations going with the community on the proposed approach. Thanks!

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Dalin Kim, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-03-04 06:59:34
-
-

FYI @Dalin Kim @Dominique Tipton Dagster 0.14.3 broke something, and the unit tests started to fail. I've pinned the version to 0.14.2 for now, but can you take a look?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-03-04 09:21:58
-
-

*Thread Reply:* Thanks for letting us know. We’ll take a look and follow up.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-03-04 12:32:26
-
-

*Thread Reply:* Just as an update on findings, it appears that this MR introduced a breaking change for the test helper function that creates a test EventLogRecord.

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-03-04 06:59:46
- -
-
-
- - - - - -
-
- - - - -
- -
Dalin Kim - (dalinkim@northwesternmutual.com) -
-
2022-03-04 14:38:59
-
-

Here is PR to fix the failing tests with latest Dagster version.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-03-04 15:35:01
-
-

Thanks! Merged.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ofek Braunstein - (ofekbraunshtein@gmail.com) -
-
2022-03-06 12:35:36
-
-

@Ofek Braunstein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David ROBERT - (david.robert.ext@louisvuitton.com) -
-
2022-03-18 11:19:18
-
-

@David ROBERT has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-03-21 11:11:09
- -
-
-
- - - - - -
-
- - - - -
- -
Dominique Tipton - (dominiquetipton@northwesternmutual.com) -
-
2022-03-21 12:56:33
-
-

*Thread Reply:* Thanks for the heads up. We’ll look into it and follow up

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dominique Tipton - (dominiquetipton@northwesternmutual.com) -
-
2022-03-22 16:02:01
-
-

*Thread Reply:* Here is the PR to fix the error with the latest Dagster version

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-03-23 08:20:41
-
-

*Thread Reply:* Thanks! Merged.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dominique Tipton - (dominiquetipton@northwesternmutual.com) -
-
2022-03-23 09:49:53
-
-

*Thread Reply:* Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
marc_pan - (pxy0592@gmail.com) -
-
2022-03-28 23:03:45
-
-

@marc_pan has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nico Ritschel - (nico@antmoney.com) -
-
2022-03-30 20:23:13
-
-

@Nico Ritschel has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Orbit - -
-
2022-03-31 13:21:47
-
-

@Orbit has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nico Ritschel - (nico@antmoney.com) -
-
2022-03-31 14:18:06
-
-

I love the pattern of parsing logs in this integration, so much more flexible compared to the Airflow integration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-03-31 15:59:33
-
-

*Thread Reply:* That's definitely the advantage of the log-parsing method. The Airflow integration, especially the most recent version for Airflow 2.3+, has the advantage of being more robust when it comes to delivering lineage in real-time

- - - -
- 🙌 Nico Ritschel -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Nico Ritschel - (nico@antmoney.com) -
-
2022-03-31 19:18:29
-
-

*Thread Reply:* Thanks for the heads up on this integration!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nico Ritschel - (nico@antmoney.com) -
-
2022-03-31 19:20:10
-
-

*Thread Reply:* I suspect future integrations will move towards this pattern as well? Sorry, off-topic for this channel, but this was the first place I've seen this metadata collection method in this project.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nico Ritschel - (nico@antmoney.com) -
-
2022-03-31 19:22:14
-
-

*Thread Reply:* I would be curious to explore similar external-executor integrations for Airflow, say for Papermill or Kubernetes (via the corresponding operators). I suppose one would need to pass job metadata through to the respective platform where the logs are actually collected.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sudhir Rao - (sudhir@zemosolabs.com) -
-
2022-04-01 13:25:58
-
-

@Sudhir Rao has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/data-council-meetup/index.html b/html_output/channel/data-council-meetup/index.html deleted file mode 100644 index bf5cbd0..0000000 --- a/html_output/channel/data-council-meetup/index.html +++ /dev/null @@ -1,423 +0,0 @@ - - - - - - Slack Export - #data-council-meetup - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-14 15:42:50
-
-

@Michael Robinson has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-03-14 15:43:31
-
-

@Ross Turk has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2023-03-14 15:43:31
-
-

@John Thomas has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-03-14 15:43:32
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-03-14 15:44:36
-
-

👋

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-03-14 17:44:04
-
-

I will be arriving Monday evening and leaving Friday morning.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john.thomas@astronomer.io) -
-
2023-03-20 14:37:23
-
-

@John Thomas has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-03-23 12:16:54
-
-

I’m arriving Wednesday afternoon and leaving Thursday afternoon.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dev Jadhav - (dev.jadhav@loxsolution.com) -
-
2023-04-07 08:32:12
-
-

@Dev Jadhav has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nam Nguyen - (nam@astrafy.io) -
-
2023-07-14 05:37:46
-
-

@Nam Nguyen has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/dev-discuss/index.html b/html_output/channel/dev-discuss/index.html deleted file mode 100644 index aee329f..0000000 --- a/html_output/channel/dev-discuss/index.html +++ /dev/null @@ -1,2454 +0,0 @@ - - - - - - Slack Export - #dev-discuss - - - - - -
- - - -
- - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-11-14 12:13:06
-
-

@Harel Shein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-14 12:13:10
-
-

@Maciej Obuchowski has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-14 12:13:46
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-14 12:13:46
-
-

@Paweł Leszczyński has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-14 12:13:46
-
-

@Jakub Dardziński has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-14 12:13:46
-
-

@Michael Robinson has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-11-14 12:13:46
-
-

@Willy Lulciuc has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Peter Hicks - (peter.hicks@astronomer.io) -
-
2023-11-14 12:13:46
-
-

@Peter Hicks has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-14 12:13:57
-
-

👋

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@rossturk.com) -
-
2023-11-14 12:14:02
-
-

@Ross Turk has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-14 12:16:19
-
-

👋

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-14 12:18:42
-
-

👋

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-11-14 12:18:53
-
-

👋

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@rossturk.com) -
-
2023-11-14 12:29:47
-
-

🌊

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-14 13:53:08
-
-

👋

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-14 18:30:48
- -
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-15 04:35:37
-
-

*Thread Reply:* hey look, more fun -https://github.com/OpenLineage/OpenLineage/pull/2263

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-15 05:03:58
-
-

*Thread Reply:* nice to have fun with you Jakub

- - - -
- 🙂 Jakub Dardziński, Harel Shein, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 05:42:34
-
-

*Thread Reply:* Can't wait to see it on the 1st January.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-11-15 06:56:03
-
-

*Thread Reply:* Ain’t no party like a dev ex improvement party

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-15 11:45:53
-
-

*Thread Reply:* Gentoo installation party is in similar category of fun

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-11-15 03:32:27
-
-

@Paweł Leszczyński approved PR #2661 with minor comments. I think the enum defined in the db layer is the one comment we’ll need to address before merging; otherwise, solid work dude 👌

-
- - - - - - - - - - - - - - - - -
- - - -
- 🙌 Paweł Leszczyński, Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 03:34:42
-
-

_Minor_: We can consider defining a _run_state column and eventually dropping the event_type. That is, we can consider columns prefixed with _ to be "remappings" of OL properties to Marquez. -> I didn't get this one. Is it for now, or some future plans?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-11-15 03:36:02
-
-

*Thread Reply:* future 😉

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 03:36:10
-
-

*Thread Reply:* ok

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 03:36:23
-
-

*Thread Reply:* I will then replace enum with string

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-11-15 03:36:10
-
-

also, what about this PR? https://github.com/MarquezProject/marquez/pull/2654

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 03:36:33
-
-

*Thread Reply:* this is the next to go

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 03:36:38
-
-

*Thread Reply:* and i consider it ready

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 03:37:31
-
-

*Thread Reply:* Then we have a draft one with streaming support https://github.com/MarquezProject/marquez/pull/2682/files -> which has an integration test of lineage endpoint working for streaming jobs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-15 03:38:32
-
-

*Thread Reply:* I still need to work on #2682 but you can review #2654. once you get some sleep, of course 😉

- - - -
- ❤️ Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-15 11:44:44
-
-

Got the doc + poc for hook-level coverage: https://docs.google.com/document/d/1q0shiUxopASO8glgMqjDn89xigJnGrQuBMbcRdolUdk/edit?usp=sharing

- - - -
- 👀 Jakub Dardziński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-15 12:24:27
-
-

*Thread Reply:* did you check if LineageCollector is instantiated once per process?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-15 12:26:37
-
-

*Thread Reply:* Using it only via get_hook_lineage_collector

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-15 12:17:31
-
-

is it time to support hudi?

- -
- - - - - - - - - -
- - -
- 😂 Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-15 14:57:10
-
-

Anyone have thoughts about how to address the question about “pain points” here? https://openlineage.slack.com/archives/C01CK9T7HKR/p1700064564825909. (Listing pros is easy — it’s the cons we don’t have boilerplate for)

-
- - -
- (linked message from Naresh reddy: https://openlineage.slack.com/team/U066HKFCHUG)
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-15 14:58:08
-
-

*Thread Reply:* Maybe something like “OL has many desirable integrations, including a best-in-class Spark integration, but it’s like any other open standard in that it requires contributions in order to approach total coverage. Thankfully, we have many active contributors, and integrations are being added or improved upon all the time.”

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-15 16:04:51
-
-

*Thread Reply:* Maybe rephrase pain points to "something we're not actively focusing on"

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-15 14:59:19
-
-

Apparently an admin can view a Slack archive at any time at this URL: https://openlineage.slack.com/services/export. Only public channels are available, though.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-15 16:53:09
-
-

*Thread Reply:* you are now admin

- - - -
- 👍 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-11-15 17:32:26
-
-

have we discussed adding column level lineage support to Airflow? https://marquezproject.slack.com/archives/C01E8MQGJP7/p1700087438599279?thread_ts=1700084629.245949&cid=C01E8MQGJP7

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-15 17:33:19
-
-

*Thread Reply:* we have it in SQL operators

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-11-15 17:34:25
-
-

*Thread Reply:* OOh any docs / code? or if you’d like to respond in the MQZ slack 🙏

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-15 17:35:19
-
-

*Thread Reply:* I’ll reply there

- - - -
- ❤️ Willy Lulciuc, Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-15 17:50:23
-
-

Any opinions about a free task management alternative to the free version of Notion (10-person limit)? Looking at Trello for keeping track of talks.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-11-15 19:32:17
-
-

*Thread Reply:* What about GitHub projects?

- - - -
- 👍 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-16 09:27:46
-
-

*Thread Reply:* Projects is the way to go, thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-16 10:23:34
-
-

*Thread Reply:* Set up a Projects board. New projects are private by default. We could make it public. The one thing that’s missing that we could use is a built-in date field for alerting about upcoming deadlines…

- - - -
- 🙌 Harel Shein, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-16 09:31:24
-
-

worlds are colliding: 6point6 has been acquired by Accenture

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-16 09:31:59
-
-

*Thread Reply:* https://newsroom.accenture.com/news/2023/accenture-to-expand-government-transformation-capabilities-in-the-uk-with-acquisition-of-6point6

-
-
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-16 10:03:27
-
-

*Thread Reply:* We should sell OL to governments

- - - -
- 🙃 Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-11-16 10:20:36
-
-

*Thread Reply:* we may have to rebrand to ClosedLineage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-16 10:23:37
-
-

*Thread Reply:* not in this way; just emit any event second time to secret NSA endpoint

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-16 11:13:17
-
-

*Thread Reply:* we would need to improve our stock photo game

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-16 12:17:22
-
-

CFP for Berlin Buzzwords went up: https://2024.berlinbuzzwords.de/call-for-papers/ -Still over 3 months to submit 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-16 12:42:56
-
-

*Thread Reply:* thanks, updated the talks board

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-11-16 12:43:10
- -
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-16 15:19:53
-
-

*Thread Reply:* I'm in, will think what to talk about and appreciate any advice 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-17 13:42:19
-
-

just searching for OpenLineage in the Datahub code base. They have an “interesting” approach? https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd[…]odules/airflow-plugin/src/datahubairflowplugin/_extractors.py

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-17 13:47:21
-
-

*Thread Reply:* It looks like the datahub airflow plugin uses OL, but turns it off: -https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd67f020/docs/lineage/airflow.md -disable_openlineage_plugin (true): "Disable the OpenLineage plugin to avoid duplicative processing." -They reuse the extractors but then "patch" the behavior.

-
- - - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-17 13:48:52
-
-

*Thread Reply:* Of course this approach will need changing again with AF 2.7

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-17 13:49:02
-
-

*Thread Reply:* It’s their choice 🤷

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-17 13:51:23
-
-

*Thread Reply:* It looks like we can possibly learn from their approach in SQL parsing: https://datahubproject.io/docs/lineage/airflow/#automatic-lineage-extraction

-
-
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-11-17 16:42:51
-
-

*Thread Reply:* what's that approach? I only know they have been claiming the best SQL parsing capabilities

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-17 20:54:48
-
-

*Thread Reply:* I haven’t looked into the details, but I’m assuming it is in this repo. (My comment is entirely based on the claim here.)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-20 02:58:07
-
-

*Thread Reply:* <https://www.acryldata.io/blog/extracting-column-level-lineage-from-sql> -> The interesting difference is that in order to find table schemas, they use their data catalog to evaluate column-level lineage instead of doing this on the client side.

- -

My understanding, by example, is: if you do -create table x as select * from y -you need to resolve * to know the column-level lineage. Our approach is to do that on the client side, probably with an extra call to the database. Their approach is to do it based on the data catalog information.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-11-17 20:56:54
-
-

I’m off on vacation. See you in a week

- - - -
- ❤️ Jakub Dardziński, Maciej Obuchowski, Paweł Leszczyński, Harel Shein, Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-21 05:23:31
-
-

Maybe move today's meeting earlier, since no one from west coast is joining? @Harel Shein

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-11-21 09:27:22
-
-

*Thread Reply:* Ah! That would have been a good idea, but I can’t :(

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-11-21 09:27:44
-
-

*Thread Reply:* Do you prefer an earlier meeting tomorrow?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-21 09:28:54
-
-

*Thread Reply:* maybe let's keep today's meeting then

- - - -
- 👍 Harel Shein -
- -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/general/index.html b/html_output/channel/general/index.html deleted file mode 100644 index 3323de9..0000000 --- a/html_output/channel/general/index.html +++ /dev/null @@ -1,151812 +0,0 @@ - - - - - - Slack Export - #general - - - - - -
- - - -
- - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-20 21:01:02
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars.th.lan@gmail.com) -
-
2020-10-21 08:23:39
-
-

@Mars Lan has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-21 11:39:13
-
-

@Wes McKinney has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-21 12:46:39
-
-

@Ryan Blue has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Drew Banin - (drew@fishtownanalytics.com) -
-
2020-10-21 12:53:42
-
-

@Drew Banin has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2020-10-21 13:29:49
-
-

@Willy Lulciuc has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lewis Hemens - (lewis@dataform.co) -
-
2020-10-21 13:52:50
-
-

@Lewis Hemens has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-21 14:15:41
-
-

This is the official start of the OpenLineage initiative. Thank you all for joining. First item is to provide feedback on the doc: https://docs.google.com/document/d/1qL_mkd9lFfe_FMoLTyPIn80-fpvZUAdEIfrabn8bfLE/edit

- - - -
- 🎉 Willy Lulciuc, Abe Gong -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-10-21 23:22:03
-
-

@Abe Gong has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Shirshanka Das - (sdas@linkedin.com) -
-
2020-10-22 13:50:35
-
-

@Shirshanka Das has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
deleted_profile - (fengtao04@gmail.com) -
-
2020-10-23 15:03:44
-
-

@deleted_profile has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Chris White - (chris@prefect.io) -
-
2020-10-23 19:30:36
-
-

@Chris White has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-24 19:29:04
-
-

Thanks all for joining. In addition to the google doc, I have opened a pull request with an initial openapi spec: https://github.com/OpenLineage/OpenLineage/pull/1 -The goal is to specify the initial model (just plain lineage) that will be extended with various facets. -It does not intend to restrict to HTTP. Those same PUT calls without output can be translated to any async protocol

-
-
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-24 19:31:09
- -
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-25 12:13:26
-
-

Am I the only weirdo that would prefer a Google Group mailing list to Slack for communicating?

- - - -
- 👍 Ryan Blue -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-25 17:22:09
-
-

*Thread Reply:* slack is the new email?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-25 17:40:19
-
-

*Thread Reply:* :(

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-27 12:27:04
-
-

*Thread Reply:* I'd prefer a google group as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-27 12:27:25
-
-

*Thread Reply:* I think that is better for keeping people engaged, since it isn't just a ton of history to go through

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-27 12:27:38
-
-

*Thread Reply:* And I think it is also better for having thoughtful design discussions

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-29 15:40:14
-
-

*Thread Reply:* I’m happy to create a google group if that would help.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-29 15:45:23
-
-

*Thread Reply:* Here it is: https://groups.google.com/g/openlineage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-29 15:46:34
-
-

*Thread Reply:* Slack is more of a way to nudge discussions along, we can use github issues or the mailing list to discuss specific points

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-11-03 17:34:53
-
-

*Thread Reply:* @Ryan Blue and @Wes McKinney any recommendations on automating sending github issues update to that list?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-11-03 17:35:34
-
-

*Thread Reply:* I don't really know how to do that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ravi Suhag - (suhag.ravi@gmail.com) -
-
2021-04-02 07:18:25
-
-

*Thread Reply:* @Julien Le Dem How about using GitHub Discussions? They are specifically meant to solve this problem. The feature is still in beta, but it can be enabled from the repository settings. One positive side I see is that it will be really easy to follow, and there will be one separate place to go to look for the discussions and ideas being discussed.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-04-02 19:51:55
-
-

*Thread Reply:* I just enabled it: https://github.com/OpenLineage/OpenLineage/discussions

- - - -
- 🙌 Ravi Suhag -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-25 12:14:06
-
-

Or GitHub Issues

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-25 17:21:44
-
-

*Thread Reply:* the plan is to use github issues for discussions on the spec. This is to supplement

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Laurent Paris - (laurent@datakin.com) -
-
2020-10-26 19:28:17
-
-

@Laurent Paris has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Josh Benamram - (josh@databand.ai) -
-
2020-10-27 21:17:30
-
-

@Josh Benamram has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Victor Shafran - (victor.shafran@databand.ai) -
-
2020-10-28 04:07:27
-
-

@Victor Shafran has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Victor Shafran - (victor.shafran@databand.ai) -
-
2020-10-28 04:09:00
-
-

👋 Hi everyone!

- - - -
- 👋 Willy Lulciuc, Abe Gong, Drew Banin, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Zhamak Dehghani - (zdehghan@thoughtworks.com) -
-
2020-10-29 17:59:31
-
-

@Zhamak Dehghani has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-11-02 18:30:51
-
-

I’ve opened a github issue to propose OpenAPI as the way to define the lineage metadata: https://github.com/OpenLineage/OpenLineage/issues/2 -I have also started a thread on the OpenLineage group: https://groups.google.com/g/openlineage/c/2i7ogPl1IP4 -Discussion should happen there: ^

-
-
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Evgeny Shulman - (evgeny.shulman@databand.ai) -
-
2020-11-04 10:56:00
-
-

@Evgeny Shulman has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-11-05 20:51:22
-
-

FYI I have updated the PR with a simple generator: https://github.com/OpenLineage/OpenLineage/pull/1

-
- - -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Henneberger - (danny@datakin.com) -
-
2020-11-11 15:05:46
-
-

@Daniel Henneberger has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-08 17:27:57
-
-

Please send me your github ids if you wish to be added to the github repo

- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Fabrice Etanchaud - (fabrice.etanchaud@netc.fr) -
-
2020-12-10 02:10:35
-
-

@Fabrice Etanchaud has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-10 17:04:29
-
-

As mentioned on the mailing list, the initial spec is ready for a final review. Thanks to all who gave feedback so far.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-10 17:04:39
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/1

-
- - -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-10 17:04:51
-
-

The next step will be to define individual facets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-13 00:28:11
-
-

I have opened a PR to update the ReadMe: https://openlineage.slack.com/archives/C01EB6DCLHX/p1607835827000100

-
- - -
- - - - - - - - - - - - - - -
- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2020-12-14 17:55:46
-
-

*Thread Reply:* Looks great!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maxime Beauchemin - (max@preset.io) -
-
2020-12-13 17:45:49
-
-

👋

- - - -
- 👋 Shirshanka Das, Julien Le Dem, Willy Lulciuc, Arthur Wiedmer, Mario Measic -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-14 20:19:57
-
-

I’m planning to merge https://github.com/OpenLineage/OpenLineage/pull/1 soon. That will be the base that we can iterate on and will enable starting the discussion on individual facets

-
- - -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-16 21:40:52
-
-

Thank you all for the feedback. I have made an update to the initial spec adressing the final comments

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-16 21:41:16
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/1

-
- - -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-19 11:21:27
-
-

The contributing guide is available here: https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md -Here is an example proposal for adding a new facet: https://github.com/OpenLineage/OpenLineage/issues/9

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Josh Benamram, Victor Shafran -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-19 18:27:36
-
-

Welcome to the newly joined members 🙂 👋

- - - -
- 👋 Chris Lambert, Ananth Packkildurai, Arthur Wiedmer, Abe Gong, ale, James Le, Ha Pham, David Krevitt, Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ash Berlin-Taylor - (ash@apache.org) -
-
2020-12-21 05:23:21
-
-

Hello! Airflow PMC member here. Super interested in this effort

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-21 12:15:42
-
-

*Thread Reply:* Welcome!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ash Berlin-Taylor - (ash@apache.org) -
-
2020-12-21 05:25:07
-
-

I'm joining this slack now, but I'm basically done for the year, so will investigate proposals etc next year

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Zachary Friedman - (zafriedman@gmail.com) -
-
2020-12-21 10:02:37
-
-

Hey all 👋 Super curious what people's thoughts are on the best way for data quality tools i.e. Great Expectations to integrate with OpenLineage. Probably a Dataset level facet of some sort (from the 25 minutes of deep spec knowledge I have 😆), but curious if that's something being worked on? @Abe Gong

- - - -
- 👋 Abe Gong, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:30:51
-
-

*Thread Reply:* Yes, that’s about right.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:31:45
-
-

*Thread Reply:* There’s some subtlety here.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:32:02
-
-

*Thread Reply:* The initial OpenLineage spec is pretty explicit about linking metadata primarily to execution of specific tasks, which is appropriate for ValidationResults in Great Expectations

- - - -
- ✅ Zachary Friedman -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:32:57
-
-

*Thread Reply:* There isn’t as strong a concept of persistent data objects (e.g. a specific table, or batches of data from a specific table)

- - - -
- ✅ Zachary Friedman -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:33:20
-
-

*Thread Reply:* (In the GE ecosystem, we call these DataAssets and Batches)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:33:56
-
-

*Thread Reply:* This is also an important conceptual unit, since it’s the level of analysis where Expectations and data docs would typically attach.

- - - -
- ✅ Zachary Friedman -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:34:47
-
-

*Thread Reply:* @James Campbell and I have had some productive conversations with @Julien Le Dem and others about this topic

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-21 12:20:53
-
-

*Thread Reply:* Yep! The next step will be to open a few GitHub issues with proposals to add to or amend the spec. We would probably start with a descriptive Dataset facet for a dataset profile (or dataset update profile). There are other aspects to clarify as well, as @Abe Gong is explaining above.

- - - -
- ✅ James Campbell -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Zachary Friedman - (zafriedman@gmail.com) -
-
2020-12-21 10:08:24
-
-

Also interesting to see where this would hook into Dagster, because one of the many great features of Dagster IMO is that it lets you do stuff like this (albeit without a formal spec). An OpenLineageMaterialization could be interesting

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-21 12:23:41
-
-

*Thread Reply:* Totally! We had a quick discussion with Dagster. Looking forward to proposals along those lines.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harikiran Nayak - (hari@streamsets.com) -
-
2020-12-21 14:35:11
-
-

Congrats @Julien Le Dem @Willy Lulciuc and team on launching OpenLineage!

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2020-12-21 14:48:11
-
-

*Thread Reply:* Thanks, @Harikiran Nayak! It’s amazing to see such interest in the community on defining a standard for lineage metadata collection.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harikiran Nayak - (hari@streamsets.com) -
-
2020-12-21 15:03:29
-
-

*Thread Reply:* Yep! It's a validation that the problem is real!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kriti - (kathuriakritihp@gmail.com) -
-
2020-12-22 02:05:45
-
-

Hey folks! -Worked on a variety of lineage problems across domains. Super excited about this initiative!

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-22 13:23:43
-
-

*Thread Reply:* Welcome!

- - - -
- 👋 Kriti -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-30 22:30:23
-
-

*Thread Reply:* What are you current use cases for lineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-22 19:54:33
-
-

(for review) Proposal issue template: https://github.com/OpenLineage/OpenLineage/pull/11

-
-
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-22 19:55:16
-
-

for people interested, <#C01EB6DCLHX|github-notifications> has the github integration that will notify of new PRs …

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Martin Charrel - (martin.charrel@datadoghq.com) -
-
2020-12-29 09:39:46
-
-

👋 Hello! I'm currently working on lineage systems @ Datadog. Super excited to learn more about this effort

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-30 22:28:54
-
-

*Thread Reply:* Welcome!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-30 22:29:43
-
-

*Thread Reply:* Would you mind sharing your main use cases for collecting lineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marko Jamedzija - (marko@popcore.com) -
-
2021-01-03 05:54:34
-
-

Hi! I’ve also been working on a similar topic for some time. Really looking forward to having these ideas standardized 🙂

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Alexander Gilfillan - (agilfillan@dealerinspire.com) -
-
2021-01-05 11:29:31
-
-

I would be interested to see how to extend this to dashboards/visualizations, if that still falls within the scope of this project.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-05 12:55:01
-
-

*Thread Reply:* Definitely, each dashboard should become a node in the lineage graph. That way you can understand all the dependencies of a given dashboard. SOme example of interesting metadata around this: is the dashboard updated in a timely fashion (data freshness); is the data correct (data quality)? Observing changes upstream of the dashboard will provide insights to what’s hapening when freshness or quality suffer

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Alexander Gilfillan - (agilfillan@dealerinspire.com) -
-
2021-01-05 13:20:41
-
-

*Thread Reply:* 100%. On a granular scale, the difference between a visualization and a dashboard can be interesting: one visualization can be connected to multiple dashboards. But of course this depends on the BI tool; Redash would be an example in this case.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-05 15:15:23
-
-

*Thread Reply:* We would need to decide how to model those things. Possibly as a Job type for dashboard and visualization.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Alexander Gilfillan - (agilfillan@dealerinspire.com) -
-
2021-01-06 18:20:06
-
-

*Thread Reply:* It could be. It's interesting: in Redash, for example, you create custom queries that run at certain intervals to produce the data you need to visualize, which is pretty much equivalent to a job. But you then build certain visualizations off of that "job", and then dashboards off of visualizations. So you could model it as a job, or it could make sense to model it more like a dataset.

- -

That's the hard part of this: how do you model a visualization/dashboard across all the possible ways they can be created, since it differs depending on how the tool you use abstracts away creating a visualization.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Reid - (reid.david.jason@gmail.com) -
-
2021-01-05 17:06:02
-
-

👋 Hi everyone!

- - - -
- 🙌 Willy Lulciuc, Arthur Wiedmer -
- -
- 👋 Abe Gong -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Reid - (reid.david.jason@gmail.com) -
-
2021-01-05 17:10:22
-
-

*Thread Reply:* Part of my role at Netflix is to oversee our data lineage story so very interested in this effort and hope to be able to participate in its success

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-05 18:12:48
-
-

*Thread Reply:* Hi Jason and welcome

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-05 18:15:12
-
-

A reference implementation of the OpenLineage initial spec is in progress in Marquez: https://github.com/MarquezProject/marquez/pull/880

-
- - -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-07 12:46:19
-
-

*Thread Reply:* The OpenLineage reference implementation in Marquez will be presented this morning Thursday (01/07) at 10AM PST, at the Marquez Community meeting.

- -

When: Thursday, January 7th at 10AM PST -Wherehttps://us02web.zoom.us/j/89344845719?pwd=Y09RZkxMZHc2U3pOTGZ6SnVMUUVoQT09

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-07 12:46:36
- -
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-07 12:46:44
-
-

*Thread Reply:* that’s in 15 min

- - - -
-
-
-
- - - - - - - - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-12 17:10:23
-
-

*Thread Reply:* And it’s merged!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-12 17:10:53
-
-

*Thread Reply:* Marquez now has a reference implementation of the initial OpenLineage spec

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jon Loyens - (jon@data.world) -
-
2021-01-06 17:43:02
-
-

👋 Hi everyone! I'm one of the co-founder at data.world and looking forward to hanging out here

- - - -
- 👋 Julien Le Dem, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Elena Goydina - (egoydina@provectus.com) -
-
2021-01-11 11:39:20
-
-

👋 Hi everyone! I was looking for the roadmap and don't see any. Does it exist?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-01-13 19:06:34
-
-

*Thread Reply:* There’s no explicit roadmap so far. With the initial spec defined and the reference implementation in place, the next steps are to define more facets (for example, data shape, dataset size, etc.), provide clients to facilitate integrations (Java, Python, …), and implement more integrations (Spark is in the works). Members of the community are welcome to drive their own initiatives around the core spec. One of the design goals of the facets is to enable numerous and independent parallel efforts

Julien Le Dem (julien@apache.org)
2021-01-13 19:06:48
*Thread Reply:* Is there something you are interested about in particular?

Julien Le Dem (julien@apache.org)
2021-01-13 19:09:42
I have opened a proposal to move the spec to JSONSchema; this will make it more focused and decouple it from HTTP: https://github.com/OpenLineage/OpenLineage/issues/15
[GitHub issue preview: julienledem (https://github.com/julienledem), assignee: julienledem]
Reactions: 👍 Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-01-19 12:26:39
Here is a PR with the corresponding change: https://github.com/OpenLineage/OpenLineage/pull/17
[GitHub PR preview: julienledem (https://github.com/julienledem)]

Xinbin Huang (bin.huangxb@gmail.com)
2021-02-01 17:07:50
Really excited to see this project! I am curious what's the current state and the roadmap of it?

Julien Le Dem (julien@apache.org)
2021-02-01 17:55:59
*Thread Reply:* You can find the initial spec here: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md
The process to contribute to the model is described here: https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md
In particular, now we'd want to contribute more facets and integrations.
Marquez has a reference implementation: https://github.com/MarquezProject/marquez/pull/880
On the roadmap:
• define more facets: data profile, etc.
• more integrations
• java/python client
You can see current discussions here: https://github.com/OpenLineage/OpenLineage/issues
Reactions: ✅ Xinbin Huang

Julien Le Dem (julien@apache.org)
2021-02-01 17:56:43
For people curious about following github activity you can subscribe to: <#C01EB6DCLHX|github-notifications>

Julien Le Dem (julien@apache.org)
2021-02-01 17:57:05
*Thread Reply:* It is not on general, as it can be a bit noisy

Zachary Friedman (zafriedman@gmail.com)
2021-02-09 13:50:17
Random-ish question: why are producer and schemaURL nested under the nominalTime facet in the spec for postRunStateUpdate? It seems like the producer of the metadata isn't related to the time of the lineage event?

Julien Le Dem (julien@apache.org)
2021-02-09 20:02:48
*Thread Reply:* Hi @Zachary Friedman! I replied below. https://openlineage.slack.com/archives/C01CK9T7HKR/p1612918909009900

Julien Le Dem (julien@apache.org)
2021-02-09 20:01:49
producer and schemaURL are defined in the BaseFacet type, and therefore all facets (including nominalTime) have them.
• The producer is an identifier for the code that produced the metadata. The idea is that different facets in the same event can be produced by different libraries. For example, in a Spark integration, Iceberg could emit its own facet in addition to other facets. The producer identifies what produced what.
• The _schemaURL is the identifier of the version of the schema for a given facet. Similarly, an event could contain a mixture of core facets from the spec as well as custom facets. This makes explicit what the definition for this facet is.
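[Editor's note] As a rough illustration (the runId, producer URL, and schemaURL values below are made up, not taken from the spec), every facet carries these two base fields alongside its own payload:

{
  "run": {
    "runId": "c3f6cd1a-5f9a-4be1-9c9d-2f47a7b6a001",
    "facets": {
      "nominalTime": {
        "_producer": "https://github.com/my-org/my-scheduler-integration",
        "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/NominalTimeRunFacet",
        "nominalStartTime": "2021-02-09T08:00:00Z"
      }
    }
  }
}

This is how a consumer can tell, per facet, which library wrote it and which schema version to validate it against.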
Reactions: 👍 Zachary Friedman

Julien Le Dem (julien@apache.org)
2021-02-09 21:27:05
As discussed previously, I have separated a Json Schema spec for the OpenLineage events from the OpenAPI spec defining a HTTP endpoint: https://github.com/OpenLineage/OpenLineage/pull/17
[GitHub PR preview: julienledem (https://github.com/julienledem), reviewers: @wslulciuc, @henneberger]

Julien Le Dem (julien@apache.org)
2021-02-09 21:27:26
*Thread Reply:* Feel free to comment, this is ready to merge

Willy Lulciuc (willy@datakin.com)
2021-02-11 20:12:18
*Thread Reply:* Thanks, Julien. The new spec format looks great 👍

Julien Le Dem (julien@apache.org)
2021-02-09 21:34:31
And the corresponding code generator to start the java (and other languages) client: https://github.com/OpenLineage/OpenLineage/pull/18
[GitHub PR preview: julienledem (https://github.com/julienledem), reviewer: @wslulciuc]
Reactions: 👍 Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-02-11 22:25:24
Those are merged: we now have a JSON Schema, an OpenAPI spec that extends it, and a generated Java model.
Reactions: 🎉 Willy Lulciuc; 🙌 Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-02-17 19:39:55
Following up on a previous discussion:
This proposal and the accompanying PR add the notion of InputFacets and OutputFacets: https://github.com/OpenLineage/OpenLineage/issues/20
In summary, we are collecting metadata about jobs and datasets.
At the Job level, when it's fairly static metadata (not changing every run, like the current code version of the job) it goes in a JobFacet. When it is dynamic and changes every run (like the schedule time of the run), it goes in a RunFacet.
This proposal is adding the same notion at the Dataset level: when it is static and doesn't change every run (like the dataset schema) it goes in a Dataset facet. When it is dynamic and changes every run (like the input time interval of the dataset being read, or the statistics of the dataset being written) it goes in an inputFacet or an outputFacet.
This enables Job and Dataset versioning logic, to keep track of what changes in the definition of something vs. runtime changes.
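[Editor's note] A rough sketch of what this could look like on a single run, with facet names and values invented for illustration (they are not taken from the proposal itself):

{
  "inputs": [{
    "namespace": "my-warehouse",
    "name": "public.transactions",
    "inputFacets": {
      "timeInterval": { "start": "2021-02-18T00:00:00Z", "end": "2021-02-19T00:00:00Z" }
    }
  }],
  "outputs": [{
    "namespace": "my-warehouse",
    "name": "public.daily_totals",
    "outputFacets": {
      "outputStatistics": { "rowCount": 500, "size": 204800 }
    }
  }]
}

The schema of public.daily_totals would stay on the dataset as a Dataset facet, while the per-run row count and size live on the run as an outputFacet.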
[GitHub issue preview: julienledem (https://github.com/julienledem), label: proposal, 1 comment]
Reactions: 👍 Kevin Mellott, Petr Šimeček

Julien Le Dem (julien@apache.org)
2021-02-19 14:27:23
*Thread Reply:* @Kevin Mellott and @Petr Šimeček Thanks for the confirmation on this slack message. To make your comment visible to the wider community, please chime in on the github issue as well: https://github.com/OpenLineage/OpenLineage/issues/20
Thank you.
[GitHub issue preview: julienledem (https://github.com/julienledem), label: proposal, 1 comment]

Julien Le Dem (julien@apache.org)
2021-02-19 14:27:46
*Thread Reply:* The PR is out for this: https://github.com/OpenLineage/OpenLineage/pull/23
[GitHub PR preview: julienledem (https://github.com/julienledem), reviewers: @jcampbell, @abegong, @henneberger]

Weixi Li (ashlee.happy@gmail.com)
2021-02-19 04:14:59
Hi, I am really interested in this project and Marquez, but I am a bit unclear about the differences and relationship between the two projects. To my understanding, OpenLineage provides an API specification for tools running jobs (e.g. Spark, Airflow) to send out an event to update the run state of the job; then, for example, Marquez can be the destination for those events and show the data lineage from those run state updates. When you say there is a reference implementation of the OpenLineage spec in Marquez, do you mean there is a /lineage endpoint implemented in the Marquez API (https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/api/OpenLineageResource.java)? Then my question is: what is the next step after Marquez has this API? How does Marquez use that endpoint to integrate with Airflow, for example? I did not find the usage of that endpoint in the Marquez project. The library marquez-airflow, which integrates Airflow with Marquez, seems to only use the other Marquez APIs to build the data lineage. Or did I misunderstand something? Thank you very much!

Weixi Li (ashlee.happy@gmail.com)
2021-02-19 05:03:21
*Thread Reply:* Okay, I found the spark integration in Marquez calls the /lineage endpoint. But I am still curious about the future plan to integrate with other tools, like airflow?

Julien Le Dem (julien@apache.org)
2021-02-19 12:41:23
*Thread Reply:* Just restating some of my answers from the Marquez slack for the benefit of folks here.

• OpenLineage defines the schema to collect metadata
• Marquez has a /lineage endpoint implementing the OpenLineage spec to receive this metadata, implemented by the OpenLineageResource you pointed out
• In the future other projects will also have OpenLineage endpoints to receive this metadata
• The Marquez Spark integration produces OpenLineage events: https://github.com/MarquezProject/marquez/tree/main/integrations/spark
• The Marquez Airflow integration still uses the original Marquez API but will be migrated to OpenLineage.
• All new integrations will use OpenLineage metadata
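[Editor's note] Concretely (payload values invented for illustration), an integration POSTs an event like this to the /lineage endpoint mentioned above - exposed by Marquez as /api/v1/lineage - and Marquez maps it onto its own model:

{
  "eventType": "START",
  "eventTime": "2021-02-19T12:00:00Z",
  "run": { "runId": "d46e465b-d358-4d32-83d4-df660ff614dd" },
  "job": { "namespace": "my-namespace", "name": "my-job" },
  "inputs": [{ "namespace": "my-namespace", "name": "my_input_table" }],
  "outputs": [{ "namespace": "my-namespace", "name": "my_output_table" }],
  "producer": "https://github.com/my-org/my-integration"
}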

Weixi Li (ashlee.happy@gmail.com)
2021-02-22 03:55:18
*Thread Reply:* thank you! very clear answer🙂

Ernie Ostic (ernie.ostic@getmanta.com)
2021-03-02 13:49:04
Hi Everyone. Just got started with the Marquez REST API and a little bit into the OpenLineage aspects. Very easy to use. Great work on the curl examples for getting started. I'm working with Postman and am happy to share a collection I have once I finish testing. A question about tags --- are there plans for a "post new tag" call in the API? ...or maybe I missed it. Thx. --ernie

Julien Le Dem (julien@apache.org)
2021-03-02 17:51:29
*Thread Reply:* I forgot to reply in thread 🙂 https://openlineage.slack.com/archives/C01CK9T7HKR/p1614725462008300

Julien Le Dem (julien@apache.org)
2021-03-02 17:51:02
OpenLineage doesn’t have a Tag facet yet (but tags are defined in the Marquez api). Feel free to open a proposal on the github repo. https://github.com/OpenLineage/OpenLineage/issues/new/choose

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-03-16 11:21:37
Hey everyone. What's the story for stream processing (like Flink jobs) in OpenLineage? It does not fit cleanly with the runEvent model, which requires issuing 1 START event and 1 of [COMPLETE, ABORT, FAIL] event per run, as unbounded stream jobs usually do not complete.

I'd imagine a few "workarounds" that work for some cases - for example, imagine a job calculating hourly aggregations of transactions and dumping them into parquet files for further analysis. The job could issue an OTHER event type adding an additional output dataset every hour. Another option would be to create a new "run" every hour, just indicating the added data.
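[Editor's note] A sketch of the first workaround, with job and dataset names invented for illustration: the long-lived run keeps its runId, and each hour an OTHER event appends that hour's partition as an output:

{
  "eventType": "OTHER",
  "eventTime": "2021-03-16T11:00:00Z",
  "run": { "runId": "f1464b2b-052c-4e71-b1f6-bdbd6b7d07f1" },
  "job": { "namespace": "streaming", "name": "hourly_transaction_aggregates" },
  "outputs": [{ "namespace": "s3://transactions-bucket", "name": "aggregates/2021-03-16-10" }],
  "producer": "https://github.com/my-org/flink-lineage"
}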

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-16 15:07:04
*Thread Reply:* Ha, I signed up just to ask this precise question!
Reactions: 😀 Maciej Obuchowski

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-16 15:07:44
*Thread Reply:* I’m still looking into the spec myself. Are we required to have 1 or more runs per Job? Or can a Job exist without a run event?

Ravi Suhag (suhag.ravi@gmail.com)
2021-04-02 07:24:39
*Thread Reply:* A run event can be emitted when the job starts, and it can stay in the RUNNING state unless something happens to the job. Additionally, you could send an event periodically with state RUNNING to inform the system that the job is healthy.

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-16 15:09:31
Similar to @Maciej Obuchowski question about Flink / Streaming jobs - what about Streaming sources (eg: a Kafka topic)? It does fit into the dataset model, more or less. But, has anyone used this yet for a set of streaming sources? Particularly with schema changes over time?

Julien Le Dem (julien@apache.org)
2021-03-16 18:30:46
Hi @Maciej Obuchowski and @Adam Bellemare, streaming jobs are meant to be covered by the spec but I agree there are a few details to iron out.

Julien Le Dem (julien@apache.org)
2021-03-16 18:31:55
In particular, streaming jobs still have runs. Even if they run continuously, they do not run forever: you want to track that a job was started at a point in time with a given version of the code, then stopped and started again after being upgraded, for example.
Reactions: 👍 Maciej Obuchowski

Julien Le Dem (julien@apache.org)
2021-03-16 18:32:23
I agree with @Maciej Obuchowski that we would also send OTHER events to keep track of progress.

Julien Le Dem (julien@apache.org)
2021-03-16 18:32:46
For example one could track checkpointing this way.

Julien Le Dem (julien@apache.org)
2021-03-16 18:35:35
For a Kafka topic you could have streaming dataset specific facets or even Kafka specific facets (ex: list of offsets we stopped reading at, schema id, etc )

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-03-17 10:05:53
*Thread Reply:* That's a good idea.

Now I'm wondering - let's say we want to track which offset a checkpoint ended processing at. That would mean we want to expose the checkpoint id, time, and offset. I suppose we don't want to overwrite previous checkpoint info, so we want to have some collection of data in this facet.

Something like appendable facets would be nice: just add new checkpoint info to the collection, instead of having to push all the checkpoint infos every time we want to add a new data point.
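[Editor's note] For example, a hypothetical facet like this (no such facet exists in the spec; the shape is purely illustrative) would have to be re-sent in full on every event today, whereas an appendable facet could just add the newest entry:

{
  "inputFacets": {
    "kafkaCheckpoints": {
      "checkpoints": [
        { "checkpointId": 41, "time": "2021-03-17T09:00:00Z", "offsets": { "0": 10430, "1": 9985 } },
        { "checkpointId": 42, "time": "2021-03-17T10:00:00Z", "offsets": { "0": 21000, "1": 20870 } }
      ]
    }
  }
}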

Julien Le Dem (julien@apache.org)
2021-03-16 18:45:23
Let me know if you have more thoughts

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-17 09:18:49
*Thread Reply:* Thanks Julien! I will try to wrap my head around some use-cases and see how it maps to the current spec. From there, I can see if I can figure out any proposals

Julien Le Dem (julien@apache.org)
2021-03-17 13:43:29
*Thread Reply:* You can use the proposal issue template to propose a new facet for example: https://github.com/OpenLineage/OpenLineage/issues/new/choose

Carlos Zubieta (carlos.zubieta@wizeline.com)
2021-03-16 18:49:00
Hi everyone, I just heard about OpenLineage and would like to learn more about it. The talks in the repo explain the purpose and general ideas nicely, but I have a couple of questions. Are there any working implementations that produce/consume the spec? Also, are there any discussions/guides on standard information, naming conventions, etc. for the facets?

Julien Le Dem (julien@apache.org)
2021-03-16 20:05:06
Hi @Carlos Zubieta here are some pointers ^

Julien Le Dem (julien@apache.org)
2021-03-16 20:06:51
Marquez has a reference implementation of an OpenLineage endpoint. The Spark integration emits OpenLineage events.

Carlos Zubieta (carlos.zubieta@wizeline.com)
2021-03-16 20:56:37
Thank you @Julien Le Dem!!! Will take a close look

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-17 15:41:50
Q related to People/Teams/Stakeholders/Owners with regards to Jobs and Datasets (didn't find anything in search):
Let's say I have a dataset, and there are a number of other downstream jobs that ingest from it. In the case that the dataset is mutated in some way (or deleted, archived, etc.), how would I go about notifying the stakeholders of that set about the changes?

Just to be clear, I'm not concerned about the mechanics of doing this, just that there is someone that needs to be notified, who has self-registered on this set. Similarly, I want to manage the datasets I am concerned about, where I can grab a list of all the datasets I tagged myself on.

This seems to suggest that we could do with additional entities outside of Dataset, Run, Job. However, at the same time, I can see how this can lead to an explosion of other entities. Any thoughts on this particular domain? I think I could achieve something similar with aspects, but this would require that I update the aspect on each entity if I want to wholesale update the user contact, say their email address.

Has anyone else run into something like this? Have you any advice? Or is this something that may be upcoming in the spec?

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-17 16:42:24
*Thread Reply:* One thing we were considering is just adding these in as Facets (Tags as per Marquez), and then plugging into some external people-management system. However, I think the question can be generalized to: should there be some sort of generic entity that can enable relationships between itself and Datasets, Jobs, and Runs as part of an integration element?

Julien Le Dem (julien@apache.org)
2021-03-18 16:03:55
*Thread Reply:* That's a great topic of discussion. I would definitely use the OpenLineage facets to capture what you describe as an aspect above. The current Marquez model has a simple notion of ownership at the namespace level, but this needs to be extended to enable the use cases you are describing (owning a dataset or a job). Right now the owner is just a generic identifier as a string (a user id or a group id, for example). Once things are tagged (in some way), you can use the lineage API to find all the downstream or upstream jobs and datasets. In OpenLineage I would start by being able to capture the owner identifier in a facet, with contact info optional if it's available at runtime. It will have the advantage of keeping track of how that changed over time. This definitely deserves its own discussion.

Julien Le Dem (julien@apache.org)
2021-03-18 17:52:13
*Thread Reply:* And also to make sure I understand your use case, you want to be able to notify the consumers of a dataset that it is being discontinued/replaced/… ? What else are you thinking about?

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-22 09:15:19
*Thread Reply:* Let me pull in my colleagues

Adam Bellemare (adam.bellemare@shopify.com)
2021-03-22 09:15:24
*Thread Reply:* Standby

Olessia D'Souza (olessia.dsouza@shopify.com)
2021-03-22 10:59:57
*Thread Reply:* 👋 Hi Julien. I’m Olessia, I’m working on the metadata collection implementation with Adam. Some thought on this:

Olessia D'Souza (olessia.dsouza@shopify.com)
2021-03-22 11:00:45
*Thread Reply:* To start off, we're thinking that there often isn't a single owner, but rather a set of Stakeholders that evolve over time. So we'd like to be able to attach multiple entries, possibly of different types, to a Dataset. We're also thinking that a dataset should have at least one owner. So a few things I'd like to confirm/discuss options:

  • If I were to stay true to the spec as it's defined atm, I wouldn't be able to add a required facet. True/false?
  • According to the readme, "...emitting a new facet with the same name for the same entity replaces the previous facet instance for that entity entirely". If we were to store multiple stakeholders, we'd have a field "stakeholders" and its value would be a list? This would make queries involving stakeholders not very straightforward. If the facet is overwritten every time, how do I a) add individuals to the list, b) track changes to the list over time? Let me know what I'm missing, because based on what you said above, tracking facet changes over time is possible.
  • Run events are issued by a scheduler. Why should it be in the domain of the scheduler to know the entire list of Stakeholders?
  • I noticed that Marquez has separate endpoints to capture information about Datasets, and some additional information beyond what's described in the spec is required. In this context, we could add a required Stakeholder facet on a Dataset, and potentially even additional endpoints to add and remove Stakeholders. Is that a valid way to go about this, in your opinion?

Curious to hear your thoughts on all of this!

Julien Le Dem (julien@apache.org)
2021-03-24 17:06:50
*Thread Reply:* > To start off, we're thinking that there often isn't a single owner, but rather a set of Stakeholders that evolve over time. So we'd like to be able to attach multiple entries, possibly of different types, to a Dataset. We're also thinking that a dataset should have at least one owner. So a few things I'd like to confirm/discuss options:
>
> If I were to stay true to the spec as it's defined atm I wouldn't be able to add a required facet. True/false?
Correct. The spec defines what facets look like (and how you can make your own custom facets), but it does not make statements about whether facets are required. However, you can have your own validation and make certain things required if you wish on the client side.

> According to the readme, "...emitting a new facet with the same name for the same entity replaces the previous facet instance for that entity entirely". If we were to store multiple stakeholders, we'd have a field "stakeholders" and its value would be a list?
Yes, I would indeed consider such a facet on the dataset with the stakeholders.

> This would make queries involving stakeholders not very straightforward. If the facet is overwritten every time, how do I
> a) add individuals to the list
You would provide the new list of stakeholders. OpenLineage standardizes lineage collection and defines a format for expressing metadata. Marquez will keep track of how metadata has evolved over time.

> b) track changes to the list over time. Let me know what I'm missing, because based on what you said above tracking facet changes over time is possible.
Each event is an observation at a point in time. In a sense they are each immutable. There's a "current" version but also all the previous ones stored in Marquez. Marquez stores each version of a dataset it received through OpenLineage and exposes an API to see how that evolved over time.

> Run events are issued by a scheduler. Why should it be in the domain of the scheduler to know the entire list of Stakeholders?
The scheduler emits the information that it knows about. For example: "I started this job and it's reading from this dataset and is writing to this other dataset." It may or may not be in the domain of the scheduler to know the list of stakeholders. If not, then you could emit different types of events to add a stakeholder facet to a dataset. We may want to refine the spec for that. Actually, I would be curious to hear what you think should be the source of truth for stakeholders. It is not the intent to force everything to come from the scheduler.
  • example 1: stakeholders are people on call for the job; they are defined as part of the job, and that also enables alerting
  • example 2: stakeholders are consumers of the jobs; they may be defined somewhere else

> I noticed that Marquez has separate endpoints to capture information about Datasets, and some additional information beyond what's described in the spec is required. In this context, we could add a required Stakeholder facet on a Dataset, and potentially even additional endpoints to add and remove Stakeholders. Is that a valid way to go about this, in your opinion?

Julien Le Dem (julien@apache.org)
2021-03-24 17:06:50
*Thread Reply:* Marquez existed before OpenLineage. In particular, the /run endpoint to create and update runs will be deprecated as the OpenLineage /lineage endpoint replaces it. At the moment we are mapping OpenLineage metadata to Marquez. Soon Marquez will have all the facets exposed in the Marquez API. (See: https://github.com/MarquezProject/marquez/pull/894/files)
We could make Marquez configurable or pluggable for validation purposes. There is already a notion of LineageListener, for example.
Although Marquez collects the metadata, I feel like this validation would be better done upstream or with some other mechanism. The question is: when do you create a dataset vs. when do you become a stakeholder? What are the various stakeholders, and what is the responsibility of the minimum one stakeholder? I would probably make it required, when deploying the job, that the stakeholder is defined. This would apply to the output dataset and would be collected in Marquez.

In general, you are very welcome to make suggestions on additional endpoints for Marquez, and I'm happy to discuss this further as those ideas are progressing.

> Curious to hear your thoughts on all of this!
Thanks for taking the time!

Julien Le Dem (julien@apache.org)
2021-05-24 16:27:03
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1621887895004200

Julien Le Dem (julien@apache.org)
2021-03-24 18:58:00
Thanks for the Python client submission @Maciej Obuchowski
https://github.com/OpenLineage/OpenLineage/pull/34
[GitHub PR preview: mobuchowski (https://github.com/mobuchowski)]
Reactions: 🙌 Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-03-24 18:59:50
I also have added a spec to define a standard naming policy. Please review: https://github.com/OpenLineage/OpenLineage/pull/31/files

Julien Le Dem (julien@apache.org)
2021-03-31 23:45:35
We now have a python client! Thanks @Maciej Obuchowski
Reactions: 👍 Maciej Obuchowski, Kevin Mellott, Ravi Suhag, Ross Turk, Willy Lulciuc, Mirko Raca

Zachary Friedman (zafriedman@gmail.com)
2021-04-02 19:37:36
Question, what do you folks see as the canonical mechanism for receiving OpenLineage events? Do you see an agent like statsd? Or do you see this as purely an API spec that services could implement? Do you see producers of lineage data writing code to send formatted OpenLineage payloads to arbitrary servers that implement receipt of these events? Curious what the long-term vision is here related to how an ecosystem of producers and consumers of payloads would interact?

Julien Le Dem (julien@apache.org)
2021-04-02 19:54:52
*Thread Reply:* Marquez is the reference implementation for receiving events and tracking changes. But the definition of the API lets others receive them (and also enables using OpenLineage events to sync between systems).

Julien Le Dem (julien@apache.org)
2021-04-02 19:55:32
*Thread Reply:* In particular, Egeria is involved in enabling receiving and emitting openlineage

Zachary Friedman (zafriedman@gmail.com)
2021-04-03 18:03:01
*Thread Reply:* Thanks @Julien Le Dem. So to get specific, if dbt were to emit OpenLineage events, how would this work? Would dbt Cloud hypothetically allow users to configure an endpoint to send OpenLineage events to, similar in UI implementation to configuring a Stripe webhook perhaps? And then whatever server the user would input here would point to somewhere that implements receipt of OpenLineage payloads? This is all a very hypothetical example, but trying to ground it in something I have a solid mental model for.

Michael Collado (collado.mike@gmail.com)
2021-04-05 17:51:57
*Thread Reply:* hypothetically speaking, that all sounds right. so a user, who, e.g., has a dbt pipeline and an AWS glue pipeline could configure both of those projects to point to the same open lineage service and get their entire lineage graph even if the two pipelines aren't connected.

Willy Lulciuc (willy@datakin.com)
2021-04-06 20:33:51
*Thread Reply:* Yeah, OpenLineage events need to be published to a backend (can be Kafka, can be a graphDB, etc). Your Stripe webhook analogy is aligned with how events can be received. For example, in Marquez, we expose a /lineage endpoint that consumes OpenLineage events. We then map an OpenLineage event to the Marquez model (sources, datasets, jobs, runs) that’s persisted in postgres.

Zachary Friedman (zafriedman@gmail.com)
2021-04-07 10:47:06
*Thread Reply:* Thanks both!

Julien Le Dem (julien@apache.org)
2021-04-13 20:52:53
*Thread Reply:* sorry, I was away last week. Yes that sounds right.

Jakub Moravec (jkb.moravec@gmail.com)
2021-04-07 09:41:09
Hi everyone, I just started discovering OpenLineage and Marquez; it looks great, and the quick-start tutorial is very helpful! One question though: I pushed some metadata to Marquez using the Lineage POST endpoint, and when I try to confirm that everything was created using the Marquez REST API, everything is there... but I don't see these new objects in the Marquez UI. What is the best way to investigate where the issue is?

Willy Lulciuc (willy@datakin.com)
2021-04-14 13:12:31
*Thread Reply:* Welcome, @Jakub Moravec 👋. Given that you're able to retrieve metadata using the Marquez API, you should also be able to view dataset and job metadata in the UI. Mind using the search bar in the top right-hand corner of the UI to see if your metadata is searchable? The UI only renders jobs and datasets that are connected in the lineage graph. We're working towards a more general metadata exploration experience, but currently the lineage graph is the main experience.

Jakob Külzer (jakob.kulzer@shopify.com)
2021-04-08 11:23:18
Hi friends, we're exploring OpenLineage and while building out integration for existing systems we realized there is no obvious way for an input to specify what "version" of that dataset is being consumed. For example, we have a job that rolls up a variable number of what OpenLineage calls dataset versions. By specifying only that dataset, we can't represent the specific instances of it that are actually rolled up. We think that would be a very important part of the lineage graph.

Are there any thoughts on how to address specific dataset versions? Is this where custom input facets would come into play?

Furthermore, based on the spec, it appears that events can provide dataset facets for both inputs and outputs, and this seems to open the door to race conditions in which two runs concurrently create dataset versions of a dataset. Is this where the eventTime field is supposed to be used?

Julien Le Dem (julien@apache.org)
2021-04-13 20:56:42
*Thread Reply:* Your intuition is right here. I think we should define an input facet that specifies which dataset version is being read. Similarly you would have an output facet that specifies what version is being produced. This would apply to storage layers like Deltalake and Iceberg as well.
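[Editor's note] Such an input facet could look something like this (a hypothetical shape, just to illustrate the idea; no such facet existed at the time):

{
  "inputs": [{
    "namespace": "s3://warehouse-bucket",
    "name": "events",
    "inputFacets": {
      "version": { "datasetVersion": "snapshot-2021-04-08" }
    }
  }]
}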

Julien Le Dem (julien@apache.org)
2021-04-13 20:57:58
*Thread Reply:* Regarding the race condition, input and output facets are attached to the run. The version of the dataset that was read is an attribute of a run and should not modify the dataset itself.

Julien Le Dem (julien@apache.org)
2021-04-13 21:01:34
*Thread Reply:* See the Dataset description here: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#core-lineage-model

Stephen Pimentel (stephenpiment@gmail.com)
2021-04-14 18:20:42
Hi everyone! I'm exploring what existing, open-source integrations are available, specifically for Spark, Airflow, and Trino (PrestoSQL). My team is looking both to use and contribute to these integrations. I'm aware of the integrations in the Marquez repo:
• Spark: https://github.com/MarquezProject/marquez/tree/main/integrations/spark
• Airflow: https://github.com/MarquezProject/marquez/tree/main/integrations/airflow
Are there other efforts I should be aware of, whether for these two or for Trino? Thanks for any information!
Reactions: 👋 Arthur Wiedmer, Maciej Obuchowski, Peter Hicks

Zachary Friedman (zafriedman@gmail.com)
2021-04-19 16:17:06
*Thread Reply:* I think for Trino integration you'd be looking at writing a Trino extractor if I'm not mistaken, yes?

Zachary Friedman (zafriedman@gmail.com)
2021-04-19 16:17:23
*Thread Reply:* But extractor would obviously be at the Marquez layer not OpenLineage

Zachary Friedman (zafriedman@gmail.com)
2021-04-19 16:19:00
*Thread Reply:* And hopefully the metadata you'd be looking to extract from Trino wouldn't have any connector-specific syntax restrictions.

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com)
2021-04-16 15:37:24
Hey all! Right now I am working on getting OpenLineage integrated with some microservices here at Northwestern Mutual and was looking for some advice. The current service I am trying to integrate it with moves files from one AWS S3 bucket to another, so I was hoping to track that movement with OpenLineage. However, by my understanding, the inputs passed along in a runEvent are meant to be datasets that have schema and other properties, but I wanted to have that input represent the file being moved. Is this a proper usage of OpenLineage? Or is this a use case that is still being developed? Any and all help is appreciated!

Julien Le Dem (julien@apache.org)
2021-04-19 21:42:14
*Thread Reply:* This is a proper usage. The schema is optional if it's not available.

Julien Le Dem (julien@apache.org)
2021-04-19 21:43:27
*Thread Reply:* You would model it as a job reading from a folder (the input dataset) in the input bucket and writing to a folder (the output dataset) in the output bucket
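[Editor's note] Assuming made-up bucket and folder names, the COMPLETE event for such a file move could look roughly like:

{
  "eventType": "COMPLETE",
  "eventTime": "2021-04-16T15:00:00Z",
  "run": { "runId": "9c20b9d2-69a1-4a26-a402-86631215c524" },
  "job": { "namespace": "file-mover", "name": "copy_landing_to_processed" },
  "inputs": [{ "namespace": "s3://landing-bucket", "name": "incoming/2021-04-16" }],
  "outputs": [{ "namespace": "s3://processed-bucket", "name": "incoming/2021-04-16" }],
  "producer": "https://github.com/my-org/s3-file-mover"
}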

Julien Le Dem (julien@apache.org)
2021-04-19 21:43:58
*Thread Reply:* This is similar to how this is modeled in the spark integration (spark job reading and writing to s3 buckets)

Julien Le Dem (julien@apache.org)
2021-04-19 21:47:06
*Thread Reply:* for reference: getting the urls for the inputs: https://github.com/MarquezProject/marquez/blob/c5e5d7b8345e347164aa5aa173e8cf35062[…]marquez/spark/agent/lifecycle/plan/HadoopFsRelationVisitor.java

Julien Le Dem (julien@apache.org)
2021-04-19 21:47:54
[no message text]

Julien Le Dem (julien@apache.org)
2021-04-19 21:48:48
*Thread Reply:* See the spec (comments welcome) for the naming of S3 datasets: https://github.com/OpenLineage/OpenLineage/pull/31/files#diff-e3a8184544e9bc70d8a12e76b58b109051c182a914f0b28529680e6ced0e2a1cR87

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com)
2021-04-20 11:11:38
*Thread Reply:* Hey Julien, thank you so much for getting back to me. I'll take a look at the documentation/implementations you've sent me and will reach out if I have any more questions. Thanks again!

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com)
2021-04-20 17:39:24
*Thread Reply:* @Julien Le Dem I left a quick comment on that spec PR you mentioned. Just wanted to let you know.

Julien Le Dem (julien@apache.org)
2021-04-20 17:49:15
*Thread Reply:* thanks

Josh Quintus (josh.quintus@gmail.com)
2021-04-28 09:41:45
Hello all. I was reading through the OpenLineage documentation on GitHub and noticed a very minor typo (an instance where and should have been an). I was just about to create a PR for it but wanted to check with someone to see if that would be something that the team is interested in.

Thanks for the tool, I'm looking forward to learning more about it.
Reactions: 👍 Maciej Obuchowski

Julien Le Dem (julien@apache.org)
2021-04-28 20:56:53
*Thread Reply:* Thank you! Please do fix typos, I’ll approve your PR.

Josh Quintus (josh.quintus@gmail.com)
2021-04-28 23:21:44
*Thread Reply:* No problem. Here's the PR. https://github.com/OpenLineage/OpenLineage/pull/47

Josh Quintus (josh.quintus@gmail.com)
2021-04-28 23:22:41
*Thread Reply:* Once I fixed the ones I saw I figured "Why not just run it through a spell checker just in case... " and found a few additional ones.

Ross Turk (ross@datakin.com)
2021-05-20 16:30:05
For your enjoyment, @Julien Le Dem was on the Data Engineering Podcast talking about OpenLineage!

https://www.dataengineeringpodcast.com/openlineage-data-lineage-specification-episode-187/
Reactions: 🙌 Willy Lulciuc, Maciej Obuchowski, Peter Hicks, Mario Measic; ❤️ Willy Lulciuc, Maciej Obuchowski, Peter Hicks, Rogier Werschkull, A Pospiech, Kedar Rajwade, James Le

Ross Turk (ross@datakin.com)
2021-05-20 16:30:09
share and enjoy 🙂

Julien Le Dem (julien@apache.org)
2021-05-21 18:21:23
Also happened yesterday: OpenLineage was accepted into LF AI & Data.
Reactions: 🎉 Abe Gong, Willy Lulciuc, Peter Hicks, Maciej Obuchowski, Daniel Henneberger, Harel Shein, Antonio Moctezuma, Josh Quintus, Mariusz Górski, James Le; 👏 Matt Turck

Willy Lulciuc (willy@datakin.com)
2021-05-21 19:20:55
*Thread Reply:* Huge milestone! 🙌💯🎊

Julien Le Dem (julien@apache.org)
2021-05-24 16:24:55
I have created a channel to discuss <#C022MMLU31B|user-generated-metadata> since this came up in a few discussions.
Reactions: 🙌 Willy Lulciuc

Jonathon Mitchal (bigmit83@gmail.com)
2021-05-31 01:28:35
hey guys, does anyone have any sample openlineage schemas for S3 please? potentially including facets for attributes in a parquet file? that would help heaps, thanks. I am trying to slowly bring in a common metadata interface and this will help shape some of the conversations 🙂 with a move to marquez/datahub et al over time
Reactions: 🙌 Willy Lulciuc

Willy Lulciuc (willy@datakin.com)
2021-06-01 17:56:16
*Thread Reply:* We don't have S3 (or distributed-filesystem-specific) facets at the moment, but such support would be a great addition! @Julien Le Dem would be best to answer if any work has been done in this area 🙂

Willy Lulciuc (willy@datakin.com)
2021-06-01 17:57:19
*Thread Reply:* Also, happy to answer any Marquez specific questions, @Jonathon Mitchal when you’re thinking of making the move. Marquez supports OpenLineage out of the box 🙌

Julien Le Dem (julien@apache.org)
2021-06-01 19:58:21
*Thread Reply:* @Jonathon Mitchal You can follow the naming strategy here for referring to a S3 dataset: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md#s3

Julien Le Dem (julien@apache.org)
2021-06-01 19:59:30
*Thread Reply:* There is no facet yet for the attributes of a Parquet file. I can give you feedback if you want to start defining one. https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md#proposing-changes

Julien Le Dem (julien@apache.org)
2021-06-01 20:00:50
*Thread Reply:* Adding Parquet metadata as a facet would make a lot of sense. It is mainly a matter of specifying what the json would look like
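[Editor's note] A first draft might look something like this (a hypothetical facet; the fields are cherry-picked from the Parquet footer metadata, and the names are illustrative):

{
  "facets": {
    "parquetMetadata": {
      "createdBy": "parquet-mr version 1.12.0",
      "numRows": 125000,
      "numRowGroups": 4,
      "compression": "SNAPPY"
    }
  }
}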

Julien Le Dem (julien@apache.org)
2021-06-01 20:01:54
*Thread Reply:* for reference the parquet metadata is defined here: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift

Jonathon Mitchal (bigmit83@gmail.com)
2021-06-01 23:20:50
*Thread Reply:* That's awesome, thanks for the guidance Willy and Julien ... will report back on how we get on
Reactions: 🙏 Willy Lulciuc

Pedram (pedram@hightouch.io)
2021-06-01 17:52:08
hi all! just wanted to introduce myself, I'm the Head of Data at Hightouch.io, we build reverse etl pipelines from the warehouse into various destinations. I've been following OpenLineage for a while now and thought it would be nice to build and expose our runs via the standard and potentially save that back to the warehouse for analysis/alerting. Really interesting concept, looking forward to playing around with it
Reactions: 👋 Willy Lulciuc, Ross Turk

Julien Le Dem (julien@apache.org)
2021-06-01 20:02:34
*Thread Reply:* Welcome! Let us know if you have any questions

Leo (leorobinovitch@gmail.com)
2021-06-03 19:22:10
Hi all! I have a noob question. As I understand it, one of the main purposes of OpenLineage is to avoid runaway proliferation of bespoke connectors from each data lineage/cataloging/provenance tool to each data source/job scheduler/query engine etc., as illustrated in the problem diagram from the main repo.

My understanding is that instead, things push to OpenLineage, which provides pollable endpoints for metadata tools.

I'm looking at Amundsen, and it seems to have bespoke connectors, but these are pull-based: I don't need to instrument my data resources to push to Amundsen, I just need to configure Amundsen to poll my data resources (e.g. the Postgres metadata extractor here).

Can OpenLineage do something similar, where I can just point it at something to extract metadata from it, rather than instrumenting that thing to push metadata to OpenLineage? If not, I'm wondering why?

Is it the case that OpenLineage defines the general framework but doesn't actually enforce push or pull-based implementations, and it just so happens that the reference implementation (Marquez) uses push?
[image attachment]

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-06-04 04:45:15
*Thread Reply:* > Is it the case that OpenLineage defines the general framework but doesn't actually enforce push or pull-based implementations, and it just so happens that the reference implementation (Marquez) uses push?
Yes, at its core OpenLineage just enforces the format of the event. We also aim to provide clients - REST, later Kafka, etc. - and some reference implementations, which are now in the Marquez repo. https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/doc/Scope.png

There are several differences between push and poll models. The most important one is that with the push model, latency between your job and emitting OpenLineage events is very low. With some systems, an internal, push-based model gives you more runtime metadata than observing from outside. Another is that a naive poll implementation would need to "rebuild the world" on each change. There are also disadvantages, such as that it's usually easier to write a plugin that extracts data from outside the system than to hook into its internals.

Integration with Amundsen specifically is planned. Although, right now it seems to me that the way to do it is to bypass the databuilder framework and push directly to the underlying database, such as Neo4j, or make Marquez the backend for the Metadata Service: https://raw.githubusercontent.com/amundsen-io/amundsen/master/docs/img/Amundsen_Architecture.png
Reactions: ❤️ Julien Le Dem

Leo (leorobinovitch@gmail.com)
2021-06-04 10:39:51
*Thread Reply:* This is really helpful, thank you @Maciej Obuchowski!

Leo (leorobinovitch@gmail.com)
2021-06-04 10:40:59
*Thread Reply:* Similar to what you say about push vs pull, I found DataHub's comment to be interesting yesterday:
> Push is better than pull: While pulling metadata directly from the source seems like the most straightforward way to gather metadata, developing and maintaining a centralized fleet of domain-specific crawlers quickly becomes a nightmare. It is more scalable to have individual metadata providers push the information to the central repository via APIs or messages. This push-based approach also ensures a more timely reflection of new and updated metadata.

Julien Le Dem (julien@apache.org)
2021-06-04 21:59:59
*Thread Reply:* yes. You can also “pull-to-push” for things that don’t push.

Mariusz Górski (gorskimariusz13@gmail.com)
2021-06-17 10:01:37
*Thread Reply:* @Maciej Obuchowski any particular reason for bypassing databuilder and going directly to neo4j? By design databuilder is supposed to be very abstract, so any kind of backend can be used with Amundsen. Currently there are at least 4, and neo4j is just one of them.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-06-17 10:28:52
*Thread Reply:* Databuilder's pull model is very different than OpenLineage's push model, where the events are generated while the dataset itself is generated.

So, how would you see using it? Just to proxy the events to a concrete search and metadata backend?

I'm definitely not an Amundsen expert, so feel free to correct me if I'm getting it wrong.

Julien Le Dem (julien@apache.org)
2021-07-07 19:59:28
*Thread Reply:* @Mariusz Górski my slide that Maciej is referring to might be a bit misleading. The Amundsen integration does not exist yet. Please add your input in the ticket: https://github.com/OpenLineage/OpenLineage/issues/86

Mariusz Górski (gorskimariusz13@gmail.com)
2021-07-09 02:22:06
*Thread Reply:* thanks Julien! will take a look

Kedar Rajwade (kedar@cloudzealous.com)
2021-06-08 10:00:47
@here Hello, my name is Kedar Rajwade. I happened to come across the OpenLineage project and it looks quite interesting. Is there some kind of getting started guide that I can follow? Also, are there any weekly/bi-weekly calls that I can attend to learn about the current/future plans?

Julien Le Dem (julien@apache.org)
2021-06-08 14:16:42
*Thread Reply:* Welcome! You can look here: https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md

Julien Le Dem (julien@apache.org)
2021-06-08 14:17:19
*Thread Reply:* We’re starting a monthly call, I will publish more details here

Julien Le Dem (julien@apache.org)
2021-06-08 14:17:48
*Thread Reply:* Do you have a specific use case in mind?

Kedar Rajwade (kedar@cloudzealous.com)
2021-06-08 21:32:02
*Thread Reply:* Nothing specific yet

Julien Le Dem (julien@apache.org)
2021-06-09 00:49:09
The first instance of the OpenLineage Monthly meeting is tomorrow, June 9 at 9am PT: https://calendar.google.com/event?action=TEMPLATE&tmeid=MDRubzk0cXAwZzA4bXRmY24yZjBkdTZzbDNfMjAyMTA2MDlUMTYwMDAwWiBqdWxpZW5AZGF0YWtpbi5jb20&tmsrc=julien%40datakin.com&scp=ALL
Reactions: 🎉 Willy Lulciuc, Maciej Obuchowski

Victor Shafran (victor.shafran@databand.ai)
2021-06-09 08:33:45
*Thread Reply:* Hey @Julien Le Dem, I can’t add a link to my calendar… Can you send an invite?

Leo (leorobinovitch@gmail.com)
2021-06-09 11:00:05
*Thread Reply:* Same!

Julien Le Dem (julien@apache.org)
2021-06-09 11:01:45
*Thread Reply:* Will do. Also if you send your email in dm you can get added to the invite

Julien Le Dem (julien@apache.org)
2021-06-09 11:59:22
[no message text]

Kedar Rajwade (kedar@cloudzealous.com)
2021-06-09 12:00:30
*Thread Reply:* @Julien Le Dem Can't access the calendar.

Kedar Rajwade (kedar@cloudzealous.com)
2021-06-09 12:00:43
*Thread Reply:* Can you please share the meeting details

Julien Le Dem (julien@apache.org)
2021-06-09 12:01:12
*Thread Reply:* [attachment]

Julien Le Dem (julien@apache.org)
2021-06-09 12:01:24
*Thread Reply:* [attachment]

Michael Collado (collado.mike@gmail.com)
2021-06-09 12:01:55
*Thread Reply:* The calendar invite says 9am PDT, not 10am. Which is right?

Kedar Rajwade (kedar@cloudzealous.com)
2021-06-09 12:01:58
*Thread Reply:* Thanks

Julien Le Dem (julien@apache.org)
2021-06-09 13:25:13
*Thread Reply:* it is 9am, thanks

Julien Le Dem (julien@apache.org)
2021-06-09 18:37:02
*Thread Reply:* I have posted the notes on the wiki (includes link to recording) https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+meeting+archive
Reactions: 🙌 Willy Lulciuc, Victor Shafran

Pedram (pedram@hightouch.io)
2021-06-10 13:53:18
Hi! Are there some 'close-to-real' sample events available to build off and compare to? I'd like to make sure what I'm outputting makes sense but it's hard when only comparing to very synthetic data.
Reactions: 👋 Willy Lulciuc

Willy Lulciuc (willy@datakin.com)
2021-06-10 13:55:51
*Thread Reply:* We’ve recently worked on a getting started guide for OpenLineage that we’d like to publish on the OpenLineage website. That should help with making things a bit more clear on usage. @Ross Turk / @Julien Le Dem might know of when that might become available. Otherwise, happy to answer any immediate questions you might have about posting/collecting OpenLineage events

Pedram (pedram@hightouch.io)
2021-06-10 13:58:58
*Thread Reply:* Here's a sample of what I'm producing, would appreciate any feedback if it's on the right track. One of our challenges is that 'dataset' is a little loosely defined for us as outputs, since we take data from a warehouse/database and output to things like Salesforce, Airtable, Hubspot and even Slack.

{
  eventType: 'START',
  eventTime: '2021-06-09T08:45:00.395+00:00',
  run: { runId: '2821819' },
  job: {
    namespace: 'hightouch://my-workspace',
    name: 'hightouch://my-workspace/sync/123'
  },
  inputs: [
    {
      namespace: 'snowflake://abc1234',
      name: 'snowflake://abc1234/my_source_table'
    }
  ],
  outputs: [
    {
      namespace: 'salesforce://mysf_instance.salesforce.com',
      name: 'accounts'
    }
  ],
  producer: 'hightouch-event-producer-v.0.0.1'
}
{
  eventType: 'COMPLETE',
  eventTime: '2021-06-09T08:45:30.519+00:00',
  run: { runId: '2821819' },
  job: {
    namespace: 'hightouch://my-workspace',
    name: 'hightouch://my-workspace/sync/123'
  },
  inputs: [
    {
      namespace: 'snowflake://abc1234',
      name: 'snowflake://abc1234/my_source_table'
    }
  ],
  outputs: [
    {
      namespace: 'salesforce://mysf_instance.salesforce.com',
      name: 'accounts'
    }
  ],
  producer: 'hightouch-event-producer-v.0.0.1'
}

Pedram (pedram@hightouch.io)
2021-06-10 14:02:59
*Thread Reply:* One other question I have is really around how customers might take the metadata we emit at Hightouch and integrate that with OpenLineage metadata emitted from other tools like dbt, Airflow, and other integrations to create a true lineage of their data.

For example, if the data goes from S3 -> Snowflake via Airflow and then from Snowflake -> Salesforce via Hightouch, this would mean both Airflow/Hightouch would need to define the Snowflake dataset in exactly the same way to get the benefits of lineage?
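[Editor's note] Concretely (names invented for illustration, following the OL naming conventions for Snowflake), the Airflow run's output and the Hightouch run's input would both need to carry the identical identifier for the graph to join up:

{ "namespace": "snowflake://abc1234", "name": "mydb.myschema.my_source_table" }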

Willy Lulciuc (willy@datakin.com)
2021-06-17 19:13:14
*Thread Reply:* Hey, @Dejan Peretin! Sorry for the late reply here! Your OL events look solid; I only have a few suggestions:

  1. I would use a valid UUID for the run ID, as the spec will standardize on that type, see https://github.com/OpenLineage/OpenLineage/pull/65
  2. You don't need to provide the input dataset again on the COMPLETE event, as the input datasets have already been associated with the run ID.
  3. For the producer, I'd recommend using a link to the producer source code version, to tie the producer version to the OL event that was emitted.

Willy Lulciuc (willy@datakin.com)
2021-06-17 19:13:59
*Thread Reply:* You can now reference our OL getting started guide for a close-to-real example 🙂, see http://openlineage.io/getting-started

Willy Lulciuc (willy@datakin.com)
2021-06-17 19:18:19

*Thread Reply:* > … this would mean both Airflow/Hightouch would need to define the Snowflake dataset in exactly the same way to get the benefits of lineage?
Yes, the dataset and the namespace that it was registered under would have to be the same to properly build the lineage graph. We’re working on defining unique dataset names and have made some good progress in this area. I’d suggest reviewing the OL naming conventions if you haven’t already: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
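To make that concrete, a hedged sketch (the account identifier and table name are placeholders): the producer writing the table and the producer reading it must reference the dataset byte-for-byte identically, or the graph won't join.

```
// Airflow-side event: the Snowflake table as an OUTPUT
"outputs": [
  { "namespace": "snowflake://abc1234", "name": "analytics.public.my_source_table" }
]

// Hightouch-side event: the same table as an INPUT - identical namespace and name
"inputs": [
  { "namespace": "snowflake://abc1234", "name": "analytics.public.my_source_table" }
]
```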

🙌 Pedram

Pedram (pedram@hightouch.io)
2021-06-19 01:09:27

*Thread Reply:* Thanks! I'm really excited to see what the future holds, I think there are so many great possibilities here. Will be keeping a watchful eye. 🙂

Willy Lulciuc (willy@datakin.com)
2021-06-22 15:14:39

*Thread Reply:* 🙂

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com)
2021-06-11 09:53:39

Hey everyone! I've been running into a minor OpenLineage issue and I was curious if anyone had any advice. According to the OpenLineage spec, it's suggested that for a dataset coming from S3, its namespace be in the form s3://<bucket>. We have implemented our code to do so, and RunEvents are published without issue, but when trying to retrieve the information of this RunEvent (like the job) I am unable to retrieve it based on namespace from both /api/v1/namespaces/s3%3A%2F%2F<bucket name> (encoding since : and / are special characters in URLs) and the beta endpoint /api/v1-beta/lineage?nodeId=<dataset>:<namespace>:<name>, and instead get a 400 error with an "Ambiguous Segment in URI" message.

Any and all advice would be super helpful! Thank you so much!
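For reference, a small Python sketch of the client-side encoding being described (the bucket name and host are placeholders; the 400 itself turned out to be a server-side routing issue, as discussed below):

```python
from urllib.parse import quote

namespace = "s3://my-bucket"

# Percent-encode ':' and '/' so the namespace fits into a single URL path
# segment, producing 's3%3A%2F%2Fmy-bucket'
encoded = quote(namespace, safe="")

url = f"http://localhost:5000/api/v1/namespaces/{encoded}"
print(url)  # http://localhost:5000/api/v1/namespaces/s3%3A%2F%2Fmy-bucket
```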

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-06-11 10:16:41

*Thread Reply:* Sounds like the problem is with Marquez - it might be worth opening an issue here: https://github.com/MarquezProject/marquez/issues

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com)
2021-06-11 10:25:58

*Thread Reply:* Thank you! Will do.

Julien Le Dem (julien@apache.org)
2021-06-11 15:31:41

*Thread Reply:* Thanks for reporting, Antonio

Julien Le Dem (julien@apache.org)
2021-06-16 19:01:52

I have opened a proposal for versioning and publishing the spec: https://github.com/OpenLineage/OpenLineage/issues/63

Julien Le Dem (julien@apache.org)
2021-06-18 15:00:20

We have a nice OpenLineage website now: https://openlineage.io/
Thank you to contributors: @Ross Turk @Willy Lulciuc @Michael Collado!

❤️ Ross Turk, Kevin Mellott, Leo, Peter Hicks, Willy Lulciuc, Edgar Ramírez Mondragón, Maciej Obuchowski, Supratim Mukherjee
👍 Kedar Rajwade, Mukund

Leo (leorobinovitch@gmail.com)
2021-06-18 15:09:18

*Thread Reply:* Very nice!

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:08:43

Hi everyone! I'm trying to run a Spark job with OpenLineage and Marquez... but I'm getting some errors

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:09:28

*Thread Reply:* Here is the error...

21/06/20 11:02:56 WARN ArgumentParser: missing jobs in [, api, v1, namespaces, spark_integration] at 5
21/06/20 11:02:56 WARN ArgumentParser: missing runs in [, api, v1, namespaces, spark_integration] at 7
21/06/20 11:03:01 ERROR AsyncEventQueue: Listener SparkListener threw an exception
java.lang.NullPointerException
	at marquez.spark.agent.SparkListener.onJobEnd(SparkListener.java:165)
	at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:39)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:10:41

*Thread Reply:* Here is my code ...

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder \
    .master('local[1]') \
    .config('spark.jars.packages', 'io.github.marquezproject:marquez_spark:0.15.2') \
    .config('spark.extraListeners', 'marquez.spark.agent.SparkListener') \
    .config('openlineage.url', 'http://localhost:5000/api/v1/namespaces/spark_integration/') \
    .config('openlineage.namespace', 'spark_integration') \
    .getOrCreate()

# Suppress _SUCCESS files
spark.sparkContext._jsc.hadoopConfiguration().set('mapreduce.fileoutputcommitter.marksuccessfuljobs', 'false')
spark.sparkContext._jsc.hadoopConfiguration().set('parquet.summary.metadata.level', 'NONE')

df_source_trip = spark.read \
    .option('inferSchema', True) \
    .option('header', True) \
    .option('delimiter', '|') \
    .csv('/Users/bcanal/Workspace/poc-marquez/poc_spark/resources/data/source/trip.csv') \
    .createOrReplaceTempView('source_trip')

df_drivers = spark.table('source_trip') \
    .select('driver') \
    .distinct() \
    .withColumn('driver_name', lit('Bruno')) \
    .withColumnRenamed('driver', 'driver_id') \
    .createOrReplaceTempView('source_driver')

df = spark.sql(
    """
    SELECT d.*, t.*
    FROM source_trip t, source_driver d
    WHERE t.driver = d.driver_id
    """
)

df.coalesce(1) \
    .drop('driver_id') \
    .write.mode('overwrite') \
    .option('path', '/Users/bcanal/Workspace/poc-marquez/poc_spark/resources/data/target') \
    .saveAsTable('trip')
```

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:12:27

*Thread Reply:* After this execution, I can see just the source from the first dataframe, df_source_trip...

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:13:04
*Thread Reply:* [attached screenshot]

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:13:45

*Thread Reply:* I was expecting to see all source dataframes, target dataframes and the job

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:14:35

*Thread Reply:* I'm running Spark locally on my laptop, and I followed the Marquez getting started guide to bring it up

Bruno Canal (bcanal@gmail.com)
2021-06-20 10:14:44

*Thread Reply:* Can anyone help me?

Michael Collado (collado.mike@gmail.com)
2021-06-22 14:42:03

*Thread Reply:* I think there's a race condition that causes the context to be missing when the job finishes too quickly. If I just add
spark.sparkContext.setLogLevel('info')
to the setup code, everything works reliably. Also works if you remove the master('local[1]') - at least when running in a notebook

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 13:48:34

@here Hi everyone,

👋 Willy Lulciuc

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 13:49:10

I need to implement export functionality for my data lineage project.

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 13:50:26

As part of this, I need to convert the information fetched from the graph db (neo4j) to CSV format and send it in the response.

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 13:51:21

Can someone please direct me to the CSV format of OpenLineage data?

Willy Lulciuc (willy@datakin.com)
2021-06-22 15:26:55

*Thread Reply:* Hey, @anup agrawal. This is a great question! The OpenLineage spec is defined using the JSON Schema format, and it's mainly for the transport layer of OL events. In terms of how OL events are eventually stored, that's determined by the backend consumer of the events. For example, Marquez stores the raw event in a lineage_events table, but that's mainly for convenience and replayability of events. As for importing/exporting OL events from storage, as long as you can translate the CSV to an OL event, then HTTP backends like Marquez that support OL can consume them

Willy Lulciuc (willy@datakin.com)
2021-06-22 15:27:29

*Thread Reply:* > as part of this i need to convert the information fetched from graph db (neo4j) to CSV format and send in response.
Depending on the exported CSV, I would translate the CSV to an OL event, see https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json
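A rough Python sketch of that translation, assuming a hypothetical CSV layout (the column names, namespace, and producer URL are made up for illustration; the POST target is Marquez's OL ingestion endpoint):

```python
import csv
import json
import uuid
import requests  # assumes the requests package is installed

MARQUEZ_URL = "http://localhost:5000/api/v1/lineage"

with open("lineage_export.csv") as f:
    # hypothetical columns: event_time, job_name, input_name, output_name
    for row in csv.DictReader(f):
        event = {
            "eventType": "COMPLETE",
            "eventTime": row["event_time"],
            "run": {"runId": str(uuid.uuid4())},
            "job": {"namespace": "my-namespace", "name": row["job_name"]},
            "inputs": [{"namespace": "my-namespace", "name": row["input_name"]}],
            "outputs": [{"namespace": "my-namespace", "name": row["output_name"]}],
            "producer": "https://example.com/csv-to-openlineage",  # placeholder
        }
        requests.post(MARQUEZ_URL, json=event)
```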

Willy Lulciuc (willy@datakin.com)
2021-06-22 15:29:58

*Thread Reply:* When you say “send in response”, who would be the consumer of the lineage metadata exported for the graph db?

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 23:33:05

*Thread Reply:* So far, what I understood about my requirement is: 1. my service will receive OL events

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 23:33:24

*Thread Reply:* 2. store it in graph db (neo4j)

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 23:38:28

*Thread Reply:* 3. this lineage information will be displayed on the UI, based on the request.
4. now my part in that is to implement an export functionality, so that someone can download it from the UI (there will be an option to download the report).
5. so I need to fetch data from storage and convert it into CSV format, and send it to the UI
6. they can download the report from the UI.

So my question here is that I have never seen what that CSV report should look like - how do I achieve that? When I asked my team what the CSV should look like, they directed me to your website.

👍 Willy Lulciuc

Willy Lulciuc (willy@datakin.com)
2021-07-01 19:18:35

*Thread Reply:* I see. @Julien Le Dem might have some thoughts on how an OL event would be represented in different formats like CSV (but, of course, there’s also avro, parquet, etc). The Json Schema is the recommended format for importing / exporting lineage metadata. And, for a file, each line would be an OL event. But, given that CSV is a requirement, I’m not sure how that would be structured. Or at least, it’s something we haven’t previously discussed

anup agrawal (anup.agrawal500@gmail.com)
2021-06-22 13:51:51

I am very new to this... sorry for any silly questions

Willy Lulciuc (willy@datakin.com)
2021-06-22 20:29:22

*Thread Reply:* There are no silly questions! 😉

Abdulmalik AN (lord.of.d1@gmail.com)
2021-06-29 11:46:33

Hello, I have read every topic and listened to 4 talks and the podcast episode about OpenLineage and Marquez. Due to my basic understanding of the data engineering field, I have a couple of questions:
1. What are events and facets, and what is their purpose?
2. Can I implement the OpenLineage API in any software, or does the software need to be integrated with the OpenLineage API?
3. Can I say that OpenLineage is about observability and Marquez is about collecting and storing the metadata?
Thank you all for being cooperative.

👍 Stephen Pimentel, Kedar Rajwade

Willy Lulciuc (willy@datakin.com)
2021-07-01 19:07:27

*Thread Reply:* Welcome, @Abdulmalik AN 👋 Hopefully the talks / podcasts have been informative! And, sure, happy to clarify a few things:

> What are events and facets and what is their purpose?
An OpenLineage event is used to capture the lineage metadata at a point in time for a given run in execution. That is, the run's state transition, the inputs and outputs consumed/produced, and the job associated with the run are part of the event. The metadata defined in the event can then be consumed by an HTTP backend (as well as other transport layers). Marquez is an HTTP backend implementation that consumes OL events via a REST API call. The OL core model only defines the metadata that should be captured in the context of a run, while the processing of the event is up to the backend implementation consuming the event (think consumer/producer model here). For Marquez, the end-to-end lineage metadata is stored for pipelines (composed of multiple jobs) with built-in metadata versioning support. Now, for the second part of your question: the OL core model is highly extensible via facets. A facet is user-defined metadata and enables entity enrichment. I'd recommend checking out the getting started guide for OL 🙂

> Can I implement the OpenLineage API in any software? or does the software need to be integrated with the OpenLineage API?
Do you mean HTTP vs other protocols? Currently, OL defines an API spec for HTTP backends, which Marquez has adopted to ingest OL events. But there are also plans to support Kafka and many others.

> Can I say that OpenLineage is about observability and Marquez is about collecting and storing the metadata?
Yep! OL defines the metadata to collect for running jobs/pipelines that can later be used for root cause analysis / troubleshooting failing jobs, while Marquez is a metadata service that implements the OL standard to both consume and store lineage metadata while also exposing a REST API to query dataset, job, and run metadata.
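To illustrate the facet mechanism mentioned above, here is a hedged sketch of a run carrying one standard facet (nominalTime) and one user-defined facet; the producer and schema URLs and the custom facet's fields are illustrative placeholders:

```
{
  "eventType": "START",
  "eventTime": "2021-07-01T00:00:00Z",
  "run": {
    "runId": "9c20e7fb-6ba8-4c0c-ab2e-6b2d1b0ba8be",
    "facets": {
      "nominalTime": {
        "_producer": "https://example.com/my-producer",
        "_schemaURL": "https://example.com/schemas/NominalTimeRunFacet",
        "nominalStartTime": "2021-07-01T00:00:00Z"
      },
      "myTeamContext": {
        "_producer": "https://example.com/my-producer",
        "_schemaURL": "https://example.com/schemas/MyTeamContextFacet",
        "costCenter": "data-platform"
      }
    }
  },
  "job": { "namespace": "my-namespace", "name": "my-job" },
  "inputs": [],
  "outputs": [],
  "producer": "https://example.com/my-producer"
}
```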

👍 Kedar Rajwade

Nic Colley (nic.colley@alation.com)
2021-06-30 17:46:52

Hi OpenLineage team! Has anyone got this working on Databricks yet? I've been working on this for a few days and can't get it to register lineage. I've attached my notebook in this thread.

Silly question - does the jar file need to be on the cluster?
Which versions of Spark does OpenLineage support?

Nic Colley (nic.colley@alation.com)
2021-06-30 18:16:58

*Thread Reply:* I based my code on this previous post https://openlineage.slack.com/archives/C01CK9T7HKR/p1624198123045800

Nic Colley (nic.colley@alation.com)
2021-06-30 18:36:59
*Thread Reply:* [attached notebook]

Michael Collado (collado.mike@gmail.com)
2021-07-01 13:45:42

*Thread Reply:* In your first cell, you have

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
spark.sparkContext.setLogLevel('info')

Unfortunately, the reference to sparkContext in the third line forces the initialization of the SparkContext, so that in the next cell your new configuration is ignored. In pyspark, you must initialize your SparkSession before any references to the SparkContext. It works if you remove the setLogLevel call from the first cell and make your 2nd cell

spark = SparkSession.builder \
    .config('spark.jars.packages', 'io.github.marquezproject:marquez_spark:0.15.2') \
    .config('spark.extraListeners', 'marquez.spark.agent.SparkListener') \
    .config('openlineage.url', 'https://domain.com') \
    .config('openlineage.namespace', 'my-namespace') \
    .getOrCreate()
spark.sparkContext.setLogLevel('info')

Samia Rahman (srahman@thoughtworks.com)
2021-06-30 19:26:42

How would one capture lineage for job that's processing streaming data? Is that in scope for OpenLineage?

➕ Josh Quintus, Maciej Obuchowski

Willy Lulciuc (willy@datakin.com)
2021-07-01 16:32:18

*Thread Reply:* It’s absolutely in scope! We’ve primarily focused on the batch use case (ETL jobs, etc), but the OpenLineage standard supports both batch and streaming jobs. You can check out our roadmap here, where you’ll find Flink and Beam on our list of future integrations.

Willy Lulciuc (willy@datakin.com)
2021-07-01 16:32:57

*Thread Reply:* Is there a streaming framework you’d like to see added to our roadmap?

mohamed chorfa (chorfa672@gmail.com)
2021-06-30 20:33:25

👋 Hello everyone!

Willy Lulciuc (willy@datakin.com)
2021-07-01 16:24:16

*Thread Reply:* Welcome, @mohamed chorfa 👋. Let us know if you have any questions!

👍 mohamed chorfa

mohamed chorfa (chorfa672@gmail.com)
2021-07-03 19:37:58

*Thread Reply:* Really looking forward to following the evolution of the specification from raw data to the ML model

❤️ Julien Le Dem, Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-07-02 16:53:01

Hello OpenLineage community,
We have been working on fleshing out the OpenLineage roadmap.
See on GitHub the currently prioritized effort: https://github.com/OpenLineage/OpenLineage/projects
Please add your feedback to the roadmap by either commenting on the GitHub issues or opening new issues.

Julien Le Dem (julien@apache.org)
2021-07-02 17:04:13

In particular, I have opened an issue to finalize our mission statement: https://github.com/OpenLineage/OpenLineage/issues/84

❤️ Ross Turk, Maciej Obuchowski, Peter Hicks

Julien Le Dem (julien@apache.org)
2021-07-07 19:53:17

*Thread Reply:* Based on community feedback, the new proposed mission statement is: “to enable the industry at-large to collect real-time lineage metadata consistently across complex ecosystems, creating a deeper understanding of how data is produced and used”

Julien Le Dem (julien@apache.org)
2021-07-07 20:23:24

I have updated the proposal for the spec versioning: https://github.com/OpenLineage/OpenLineage/issues/63

🙌 Willy Lulciuc

Jorik (jorik.blaas-sigmond@nn.nl)
2021-07-08 07:06:53

Hi all. I'm trying to get my bearings on OpenLineage. Love the concept. In our data transformation pipelines, output datasets are explicitly versioned (we have an incrementing snapshot id). Our storage layer (deltalake) allows us to also ingest 'older' versions of the same dataset, etc. If I understand it correctly, I would have to add some inputFacets and outputFacets to the run to store the actual version being referenced. Is that something that is currently available, or on the roadmap, or is it something I could extend myself?

Julien Le Dem (julien@apache.org)
2021-07-08 18:57:44

*Thread Reply:* It is on the roadmap and there’s a ticket open but nobody is working on it at the moment. You are very welcome to contribute a spec and implementation

Julien Le Dem (julien@apache.org)
2021-07-08 18:59:00

*Thread Reply:* Please comment here and feel free to make a proposal: https://github.com/OpenLineage/OpenLineage/issues/35

Jorik (jorik.blaas-sigmond@nn.nl)
2021-07-08 07:07:29

TL;DR: our database supports time-travel, and runs can be set up to use a specific point-in-time of an input. How do we make sure to keep that information within OpenLineage?

Mariusz Górski (gorskimariusz13@gmail.com)
2021-07-09 02:23:29

Hi, on the subject of Spark integrations - I know that there is spark-marquez, but I was curious whether you also considered https://github.com/AbsaOSS/spline-spark-agent? It seems like this and spark-marquez are doing a similar thing, and maybe it would make sense to add OpenLineage support to the Spline Spark agent?

Mariusz Górski (gorskimariusz13@gmail.com)
2021-07-09 02:23:42

*Thread Reply:* cc @Julien Le Dem @Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-07-09 04:28:38

*Thread Reply:* @Michael Collado

👀 Michael Collado

Julien Le Dem (julien@apache.org)
2021-07-12 21:17:12

The OpenLineage Technical Steering Committee meetings are monthly on the second Wednesday, 9:00am to 10:00am US Pacific, and the link to join the meeting is https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
The next meeting is this Wednesday.
All are welcome.
• Agenda:
  ◦ Finalize the OpenLineage Mission Statement
  ◦ Review OpenLineage 0.1 scope
  ◦ Roadmap
  ◦ Open discussion
  ◦ Slides: https://docs.google.com/presentation/d/1fD_TBUykuAbOqm51Idn7GeGqDnuhSd7f/edit#slide=id.ge4b57c6942_0_46
Notes are posted here: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

🙌 Willy Lulciuc, Maciej Obuchowski

Julien Le Dem (julien@apache.org)
2021-07-12 21:18:04

*Thread Reply:* Feel free to share your email with me if you want to be added to the gcal invite

Julien Le Dem (julien@apache.org)
2021-07-14 12:03:31

*Thread Reply:* It is starting now

Jiří Sedláček (yirie.sedlahczech@gmail.com)
2021-07-13 08:22:40

Hello, is it possible to track lineage on column level? For example, for SQL like this:
CREATE TABLE T2 AS SELECT c1,c2 FROM T1;
I would like to record this lineage:
T1.C1 -- job1 --> T2.C1
T1.C2 -- job1 --> T2.C2
Would that be possible to record in OL format?

Jiří Sedláček (yirie.sedlahczech@gmail.com)
2021-07-13 08:29:52

(the important thing for me is to be able to tell that T1.C1 has no effect on T2.C2)

Julien Le Dem (julien@apache.org)
2021-07-14 17:00:12

I have updated the notes and added the link to the recording of the meeting this morning: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

Julien Le Dem (julien@apache.org)
2021-07-14 17:04:18

*Thread Reply:* In particular, please review the versioning proposal: https://github.com/OpenLineage/OpenLineage/issues/63

Julien Le Dem (julien@apache.org)
2021-07-14 17:04:33

*Thread Reply:* and the mission statement: https://github.com/OpenLineage/OpenLineage/issues/84

Julien Le Dem (julien@apache.org)
2021-07-14 17:05:02

*Thread Reply:* for this one, please give explicit approval in the ticket

👍 Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-07-14 21:10:42

*Thread Reply:* @Zhamak Dehghani @Daniel Henneberger @Drew Banin @James Campbell @Ryan Blue @Maciej Obuchowski @Willy Lulciuc ^

Julien Le Dem (julien@apache.org)
2021-07-27 18:58:35

*Thread Reply:* Per the votes in the github ticket, I have finalized the charter here: https://docs.google.com/document/d/11xo2cPtuYHmqRLnR-vt9ln4GToe0y60H/edit

🙌 Willy Lulciuc

Jarek Potiuk (jarek@potiuk.com)
2021-07-16 01:25:56

Hi Everyone. I am a PMC member and committer of Apache Airflow. I watched the talk at the summit https://airflowsummit.org/sessions/2021/data-lineage-with-apache-airflow-using-openlineage/ and thought I might help (after the Summit is over 🙂) with making OpenLineage/Marquez more seamlessly integrated in Airflow

❤️ Abe Gong, WingCode, Maciej Obuchowski, Ross Turk, Julien Le Dem, Michael Collado, Samia Rahman, mohamed chorfa
🙌 Maciej Obuchowski
👍 Jorik

Samia Rahman (srahman@thoughtworks.com)
2021-07-20 16:38:38

*Thread Reply:* The demo in this does not really use the OpenLineage spec, does it?

Did I miss something - the API that was shown for lineage was that of Marquez; how does Marquez use the OpenLineage spec?

Samia Rahman (srahman@thoughtworks.com)
2021-07-20 18:09:01

*Thread Reply:* I have a question about the SQLJobFacet in the job schema - isn't it better to call it the TransformationJobFacet or the ProcessJobFacet, such that any logic in the appropriate language can be described? Am I misinterpreting the intention - is SQLJobFacet meant to capture the logic that runs for a job?

Willy Lulciuc (willy@datakin.com)
2021-07-26 19:06:43

*Thread Reply:* > The demo in this does not really use the openlineage spec does it?
@Samia Rahman In our Airflow talk, the demo used the marquez-airflow lib that sends OpenLineage events to Marquez. You can check out how Airflow works with OpenLineage + Marquez here: https://openlineage.io/integration/apache-airflow/

Willy Lulciuc (willy@datakin.com)
2021-07-26 19:07:51

*Thread Reply:* > Did I miss something - the API that was shown for lineage was that of Marquez, how does Marquez use the open lineage spec?
Yes, Marquez ingests OpenLineage events that conform to the spec. Hope this helps!

Kenton (swiple.io) (kknoxparton@gmail.com)
2021-07-21 07:52:32

Hi all, does OpenLineage intend on creating lineage off of query logs?


From what I have read, there are a number of supported integrations but none that cater to regular SQL based ETL. Is this on the OpenLineage roadmap?

Willy Lulciuc (willy@datakin.com)
2021-07-26 18:54:46

*Thread Reply:* I would say this is more of an ingestion pattern than something the OpenLineage spec would support directly. Though I completely agree, query logs are a great source of lineage metadata with minimal effort. On our roadmap, we have Kafka as a supported backend, which would enable streaming lineage metadata from query logs into a topic. That said, Confluent has some great blog posts on Change Data Capture:
• https://www.confluent.io/blog/no-more-silos-how-to-integrate-your-databases-with-apache-kafka-and-cdc/
• https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/

Willy Lulciuc (willy@datakin.com)
2021-07-26 18:57:59

*Thread Reply:* Q: @Kenton (swiple.io) Are you planning on using Kafka connect? If so, I see 2 reasonable options:

  1. Stream query logs to a topic using the JDBC source connector, then have a consumer read the query logs off the topic, parse the logs, then stream the result of the query parsing to another topic as an OpenLineage event
  2. Add direct support for OpenLineage to the JDBC connector or any other application you planned to use to read the query logs.
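A loose Python sketch of option 1's consumer side, using the kafka-python package; the topic names, the shape of a query-log record, and the parse step are all placeholders under assumption, not an existing implementation:

```python
import json
import uuid
from kafka import KafkaConsumer, KafkaProducer  # assumes kafka-python is installed

consumer = KafkaConsumer("query-logs", bootstrap_servers="localhost:9092")
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda e: json.dumps(e).encode("utf-8"),
)

def parse_query_log(raw: bytes) -> dict:
    """Placeholder: extract job name, inputs, and outputs from one log record."""
    log = json.loads(raw)
    return {"job": log["query_id"], "inputs": log["sources"], "outputs": log["targets"]}

for message in consumer:
    parsed = parse_query_log(message.value)
    event = {
        "eventType": "COMPLETE",
        "eventTime": "2021-07-26T00:00:00Z",  # in practice, taken from the log entry
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "warehouse", "name": parsed["job"]},
        "inputs": [{"namespace": "warehouse", "name": n} for n in parsed["inputs"]],
        "outputs": [{"namespace": "warehouse", "name": n} for n in parsed["outputs"]],
        "producer": "https://example.com/query-log-parser",  # placeholder
    }
    producer.send("openlineage-events", event)
```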
Willy Lulciuc (willy@datakin.com)
2021-07-26 19:01:31

*Thread Reply:* Either way, I think this is a great question and a common ingestion pattern we should document or have best practices for. Also, more details on how you plan to ingest the query logs would help drive the discussion.

Kenton (swiple.io) (kknoxparton@gmail.com)
2021-08-05 12:01:55

*Thread Reply:* Using something like sqlflow could be a good starting point? Demo https://sqlflow.gudusoft.com/?utm_source=gspsite&utm_medium=blog&utm_campaign=support_article#/

Willy Lulciuc (willy@datakin.com)
2021-09-21 20:22:26

*Thread Reply:* @Kenton (swiple.io) I haven't heard of sqlflow, but it does look promising. It's not on our current roadmap, but I think there is a need to have support for parsing query logs as OpenLineage events. Do you mind opening an issue and outlining your thoughts? It'd be great to start the discussion if you'd like to drive this feature and help prioritize this 💯

Samia Rahman (srahman@thoughtworks.com)
2021-07-21 08:49:23

The OpenLineage implementations for the Airflow and Spark integrations currently live in the Marquez repo. My understanding from the OpenLineage scope is that the integration implementations are in the scope of OpenLineage - are the Spark integrations going to be moved to OpenLineage?

Ross Turk (ross@datakin.com)
2021-07-21 11:35:12

@Samia Rahman Yes, that is the plan. For details you can see https://github.com/OpenLineage/OpenLineage/issues/73

🙌 Samia Rahman, Willy Lulciuc

Samia Rahman (srahman@thoughtworks.com)
2021-07-21 18:13:11

I have a question about the SQLJobFacet in the job schema - isn't it better to call it the TransformationJobFacet or the ProcessJobFacet, such that any logic in the appropriate language can be described? It can be Scala or Python code that runs in the job and processes streaming or batch data. Am I misinterpreting the intention - is SQLJobFacet meant to capture the logic that runs for a job?

Willy Lulciuc (willy@datakin.com)
2021-07-21 18:22:01

*Thread Reply:* Hey, @Samia Rahman 👋. Yeah, great question! The SQLJobFacet is used only for SQL-based jobs. That is, it's not intended to capture the code being executed, but rather just the SQL if it's present. The SQL facet can be used later for display purposes. For example, in Marquez, we use the SQLJobFacet to display the SQL executed by a given job to the user via the UI.

Willy Lulciuc (willy@datakin.com)
2021-07-21 18:23:03

*Thread Reply:* To capture the logic of the job (meaning, the code being executed), the OpenLineage spec defines the SourceCodeLocationJobFacet that builds the link to source in version control
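Putting the two ideas side by side, a hedged sketch of a job carrying both facets - the query, job name, and repository URL are placeholders, and the exact facet field names should be checked against the spec:

```
"job": {
  "namespace": "my-namespace",
  "name": "daily_revenue",
  "facets": {
    "sql": {
      "query": "SELECT order_date, SUM(amount) FROM orders GROUP BY order_date"
    },
    "sourceCodeLocation": {
      "type": "git",
      "url": "https://github.com/example/pipelines/blob/a1b2c3d/jobs/daily_revenue.sql"
    }
  }
}
```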

Julien Le Dem (julien@apache.org)
2021-07-22 17:56:41

The process started a few months back when the LF AI & Data foundation voted to accept OpenLineage as part of the foundation. It is now official: OpenLineage joined the LF AI & Data Foundation.
https://lfaidata.foundation/blog/2021/07/22/openlineage-joins-lf-ai-data-as-new-sandbox-project/

🙌 Ross Turk, Luke Smith, Maciej Obuchowski, Gyan Kapur, Dr Daniel Smith, Jarek Potiuk, Peter Hicks, Kedar Rajwade, Abe Gong, Damian Warszawski, Willy Lulciuc
❤️ Ross Turk, Jarek Potiuk, Peter Hicks, Abe Gong, Willy Lulciuc
🎉 Laurent Paris, Rifa Achrinza, Minkyu Park, Peter Hicks, mohamed chorfa, Jarek Potiuk, Abe Gong, Damian Warszawski, Willy Lulciuc, James Le
👏 Matt Turck

Namron (ian.norman@avanade.com)
2021-07-29 11:20:17

Hi, I am trying to create lineage between two datasets. Following the spec, I can see the syntax for declaring the input and output datasets, and for creating the associated job (which I take to be the process in the middle joining the two datasets together). What I can't see is where in the specification to relate the job to the inputs and outputs. Do you have an example of this?

Michael Collado (collado.mike@gmail.com)
2021-07-30 17:24:44

*Thread Reply:* The run event is always tied to exactly one job. It's up to the backend to store the relationship between the job and its inputs/outputs. E.g., in marquez, this is where we associate the input datasets with the job- https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/db/OpenLineageDao.java#L132-L143

Julien Le Dem (julien@apache.org)
2021-08-03 15:06:58

the OutputStatistics facet PR is updated based on your comments @Michael Collado https://github.com/OpenLineage/OpenLineage/pull/114

🙌 Michael Collado

Michael Collado (collado.mike@gmail.com)
2021-08-03 15:11:56

*Thread Reply:*
    /|~~~
   ///|
  /////|
 ///////|
/////////|
\==========|===/
~~~~~~~~~~~~~~~~~~~~~

Julien Le Dem (julien@apache.org)
2021-08-03 19:59:03

*Thread Reply:* [attached image]

Julien Le Dem (julien@apache.org)
2021-08-03 19:59:38

I have updated the DataQuality metrics proposal and the corresponding PR:
https://github.com/OpenLineage/OpenLineage/issues/101
https://github.com/OpenLineage/OpenLineage/pull/115

🙌 Willy Lulciuc, Bruno González
💯 Willy Lulciuc, Dominique Tipton

Oleksandr Dvornik (oleksandr.dvornik@getindata.com)
2021-08-04 10:42:48

Guys, I've merged the CircleCI publish snapshot PR.

Snapshots can be found below:
https://datakin.jfrog.io/artifactory/maven-public-libs-snapshot-local/io/openlineage/openlineage-java/0.0.1-SNAPSHOT/openlineage-java-0.0.1-20210804.142910-6.jar
https://datakin.jfrog.io/artifactory/maven-public-libs-snapshot-local/io/openlineage/openlineage-spark/0.1.0-SNAPSHOT/openlineage-spark-0.1.0-20210804.143452-5.jar

Build on main passed (edited)

🎉 Julien Le Dem

Julien Le Dem (julien@apache.org)
2021-08-04 23:08:08

I added a mechanism to enforce spec versioning per https://github.com/OpenLineage/OpenLineage/issues/63:
https://github.com/OpenLineage/OpenLineage/pull/140

Ben Teeuwen-Schuiringa (ben.teeuwen@booking.com)
2021-08-05 10:02:49

Hi all, at Booking.com we’re using Spline to extract granular lineage information from spark jobs to be able to trace lineage on column-level and the operations in between. We wrote a custom python parser to create graph-like structure that is sent into arangodb. But tbh, the process is far from stable and is not able to quickly answer questions like ‘which root input columns are used to construct column x’.


My impression with openlineage thus far is it’s focusing on less granular, table input-output information. Is anyone here trying to accomplish something similar on a column-level?

Luke Smith (luke.smith@kinandcarta.com)
2021-08-05 12:56:48

*Thread Reply:* Also interested in use case / implementation differences between Spline and OL. Watching this thread.

Julien Le Dem (julien@apache.org)
2021-08-05 14:46:44

*Thread Reply:* It would be great to have the option to produce the Spline lineage info as OpenLineage.
To capture the column-level lineage, you would want to add a ColumnLineage facet to the output dataset facets, which is something that is needed in the spec.
Here is a proposal, please chime in: https://github.com/OpenLineage/OpenLineage/issues/148
Is this something you would be interested to do?
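Purely as an illustration of the proposal's direction (this shape was under discussion in issue #148 at the time, not a finalized spec), a column-level facet on an output dataset might map each output column to the input fields it derives from - here using the earlier T1/T2 example with placeholder namespaces:

```
"outputs": [
  {
    "namespace": "snowflake://abc1234",
    "name": "analytics.public.t2",
    "facets": {
      "columnLineage": {
        "fields": {
          "c1": { "inputFields": [ { "namespace": "snowflake://abc1234", "name": "analytics.public.t1", "field": "c1" } ] },
          "c2": { "inputFields": [ { "namespace": "snowflake://abc1234", "name": "analytics.public.t1", "field": "c2" } ] }
        }
      }
    }
  }
]
```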

Julien Le Dem (julien@apache.org)
2021-08-09 19:49:51

*Thread Reply:* regarding the difference of implementation, the OpenLineage spark integration focuses on extracting metadata and exposing it as a standard representation. (The OpenLineage LineageEvents described in the JSON-Schema spec). The goal is really to have a common language to express lineage and related metadata across everything. We’d be happy if Spline can produce or consume OpenLineage as well and be part of that ecosystem.

Ben Teeuwen-Schuiringa (ben.teeuwen@booking.com)
2021-08-18 08:09:38

*Thread Reply:* Does anyone know if the Spline developers are in this slack group?

Ben Teeuwen-Schuiringa (ben.teeuwen@booking.com)
2022-08-03 03:07:56

*Thread Reply:* @Luke Smith how have things progressed on your side the past year?

Julien Le Dem (julien@apache.org)
2021-08-09 19:39:28

I have opened an issue to track the facet versioning discussion:
https://github.com/OpenLineage/OpenLineage/issues/153

Julien Le Dem (julien@apache.org)
2021-08-09 20:16:18

I have updated the agenda for the OpenLineage monthly TSC meeting:
https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting
(meeting information below for reference; you can also DM me your email to get added to a Google Calendar invite)

The OpenLineage Technical Steering Committee meetings are monthly on the second Wednesday, 9:00am to 10:00am US Pacific, and the link to join the meeting is https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
All are welcome.

Aug 11th 2021
• Agenda:
  ◦ Coming in OpenLineage 0.1
    ▪ OpenLineage spec versioning
    ▪ Clients
  ◦ Marquez integrations imported in OpenLineage
    ▪ Apache Airflow:
      • BigQuery
      • Postgres
      • Snowflake
      • Redshift
      • Great Expectations
    ▪ Apache Spark
    ▪ dbt
  ◦ OpenLineage 0.2 scope discussion
    ▪ Facet versioning mechanism
    ▪ OpenLineage Proxy Backend
    ▪ Kafka client
  ◦ Roadmap
  ◦ Open discussion
• Slides: https://docs.google.com/presentation/d/1Lxp2NB9xk8sTXOnT0_gTXicKX5FsktWa/edit#slide=id.ge80fbcb367_0_14

🙌 Willy Lulciuc, Maciej Obuchowski, Dr Daniel Smith
💯 Willy Lulciuc, Dr Daniel Smith

Julien Le Dem (julien@apache.org)
2021-08-11 10:05:27

*Thread Reply:* Just a reminder that this is in 2 hours

Julien Le Dem (julien@apache.org)
2021-08-11 18:50:32

*Thread Reply:* I have added the notes to the meeting page: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

Julien Le Dem (julien@apache.org)
2021-08-11 18:51:19

*Thread Reply:* The recording of the meeting is linked there:
https://us02web.zoom.us/rec/share/2k4O-Rjmmd5TYXzT-pEQsbYXt6o4V6SnS6Vi7a27BPve9aoMmjm-bP8UzBBzsFzg.uY1je-PyT4qTgYLZ?startTime=1628697944000
• Passcode: =RBUj01C

Daniel Avancini (dpavancini@gmail.com)
2021-08-11 13:30:52

Hi guys, great discussion today. Something we are particularly interested in is the integration with Airflow 2. I've been searching the Marquez and OpenLineage repos and I couldn't find a clear answer on the status of that. I did some work locally to update the marquez-airflow package, but I would like to know if someone else is working on this - maybe we could give it some help too.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-08-11 13:36:43

*Thread Reply:* @Daniel Avancini I'm working on it. Some changes in Airflow made the current approach unfeasible, so a slight change in the way we capture events is needed. You can take a look at the progress here: https://github.com/OpenLineage/OpenLineage/tree/airflow/2

Daniel Avancini (dpavancini@gmail.com)
2021-08-11 13:48:36

*Thread Reply:* Thank you Maciej. I'll take a look

Julien Le Dem (julien@apache.org)
2021-08-11 20:37:09

I have migrated the Marquez issues related to OpenLineage integrations to the OpenLineage repo

Julien Le Dem (julien@apache.org)
2021-08-13 19:02:54

And OpenLineage 0.1.0 is out! https://github.com/OpenLineage/OpenLineage/releases/tag/0.1.0

🙌 Peter Hicks, Maciej Obuchowski, Willy Lulciuc, Oleksandr Dvornik, Luke Smith, Daniel Avancini, Matt Gee
❤️ Willy Lulciuc, Matt Gee

Oleksandr Dvornik (oleksandr.dvornik@getindata.com)
2021-08-16 11:42:24

PR ready for review

👍 Willy Lulciuc

Luke Smith (luke.smith@kinandcarta.com)
2021-08-20 13:54:08

Anyone have experience parsing spark's logical plan to generate column-level lineage and DAGs with more human readable operations? I assume I could recreate a graph like the one below using the spark.logicalPlan facet. The analysts writing the SQL / spark queries aren't familiar with ShuffledRowRDD , MapPartitionsRDD, etc... It'd be better if I could convert this plan into spark SQL (or capture spark SQL as a facet at runtime).

Michael Collado (collado.mike@gmail.com)
2021-08-26 16:46:53

*Thread Reply:* The logicalPlan facet currently returns the Logical Plan, not the physical plan. This means you end up with expressions like Aggregate and Join rather than WholeStageCodegen and Exchange. I don't know if it's possible to reverse engineer the SQL- it's worth looking into the API and trying to find a way to generate that
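One low-tech way to see the distinction being described - logical-plan operators like Aggregate/Join versus physical-plan operators like WholeStageCodegen/Exchange - is Spark's own explain output. A minimal, self-contained pyspark sketch (the dataframe is made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
agg = df.groupBy("label").count()

# extended=True prints the parsed/analyzed/optimized logical plans (Aggregate, ...)
# followed by the physical plan (WholeStageCodegen, Exchange, ...); the
# logicalPlan facet discussed above corresponds to the logical side.
agg.explain(extended=True)
```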

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 14:26:35

👋 Hi everyone!

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 14:27:00

Nice to e-meet you 🙂
I want to use the OpenLineage integration for Spark in my Azure Databricks clusters, but I am having problems with the configuration of the listener in the cluster. I was wondering if you could help me - if you know any tutorial for the integration of Spark with Azure Databricks, or some more specific guide for this scenario, I would really appreciate it.

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 14:27:33

I added this configuration to my cluster: [attached screenshot]
Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 14:28:37

I receive this error message:

[attached screenshot]

Willy Lulciuc (willy@datakin.com)
2021-08-31 14:30:00

*Thread Reply:* Hey, @Erick Navarro 👋. Are you using the openlineage-spark lib? (Note: the marquez-spark lib has been deprecated)

Luke Smith (luke.smith@kinandcarta.com)
2021-08-31 14:43:20

*Thread Reply:* My team had this issue as well. Our read of the error is that Databricks attempts to register the listener before installing packages defined with either spark.jars or spark.jars.packages. Since the listener lib is not yet installed, the listener cannot be found. To solve the issue, we

  1. copy the OL JAR to a staging directory on DBFS (we use /dbfs/databricks/init/lineage)
  2. using an init script, copy the JAR from the staging directory to the default JAR location for the Databricks driver -- /mnt/driver-daemon/jars
  3. within the same init script, write the spark config parameters to a .conf file in /databricks/driver/conf (we use open-lineage.conf). The .conf file will be read by the driver on initialization. It should follow this format (lineage_host_url should point to your API):

     [driver] {
       "spark.jars" = "/mnt/driver-daemon/jars/openlineage-spark-0.1-SNAPSHOT.jar"
       "spark.extraListeners" = "com.databricks.backend.daemon.driver.DBCEventLoggingListener,openlineage.spark.agent.OpenLineageSparkListener"
       "spark.openlineage.url" = "$lineage_host_url"
     }

     Your cluster must be configured to call the init script (enabling lineage for the entire cluster). OL is not friendly to notebook-level init as far as we can tell.
- -

@Willy Lulciuc -- I have some utils and init script templates that simplify this process. May be worth adding them to the OL repo along with a readme.
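A hedged sketch of what such a template might look like - a one-time Python cell that writes the init script described in the steps above to DBFS via the /dbfs FUSE mount (the lineage URL and paths are placeholders taken from those steps, not a published utility):

```python
# Hypothetical setup cell: generates the cluster init script described above.
lineage_host_url = "https://my-lineage-api.example.com"  # placeholder

init_script = f"""#!/bin/bash
# step 2: copy the staged OL JAR to the default driver JAR location
cp /dbfs/databricks/init/lineage/openlineage-spark-0.1-SNAPSHOT.jar /mnt/driver-daemon/jars/

# step 3: write the driver conf that is read on initialization
cat > /databricks/driver/conf/open-lineage.conf <<EOF
[driver] {{
  "spark.jars" = "/mnt/driver-daemon/jars/openlineage-spark-0.1-SNAPSHOT.jar"
  "spark.extraListeners" = "com.databricks.backend.daemon.driver.DBCEventLoggingListener,openlineage.spark.agent.OpenLineageSparkListener"
  "spark.openlineage.url" = "{lineage_host_url}"
}}
EOF
"""

# /dbfs/... is the DBFS FUSE mount available on Databricks drivers
with open("/dbfs/databricks/init/lineage/open-lineage-init.sh", "w") as f:
    f.write(init_script)
```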

🙏 Erick Navarro
❤️ Erick Navarro

Willy Lulciuc (willy@datakin.com)
2021-08-31 14:51:46

*Thread Reply:* Absolutely, thanks for elaborating on your spark + OL deployment process and I think that’d be great to document. @Michael Collado what are your thoughts?

Michael Collado (collado.mike@gmail.com)
2021-08-31 14:57:02

*Thread Reply:* I haven't tried with Databricks specifically, but there should be no issue registering the OL listener in the Spark config as long as it's done before the Spark session is created- e.g., this example from the README works fine in a vanilla Jupyter notebook- https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#openlineagesparklistener-as-a-plain-spark-listener

Michael Collado (collado.mike@gmail.com)
2021-08-31 15:11:37

*Thread Reply:* Looks like Databricks' notebooks come with a Spark instance pre-configured- configuring lineage within the SparkSession configuration doesn't seem possible- https://docs.databricks.com/notebooks/notebooks-manage.html#attach-a-notebook-to-a-cluster 😞

Michael Collado (collado.mike@gmail.com)
2021-08-31 15:11:53

*Thread Reply:* [attached screenshot]

Luke Smith (luke.smith@kinandcarta.com)
2021-08-31 15:59:38

*Thread Reply:* Right, Databricks provides preconfigured spark context / session objects. With Spline, you can set some cluster level config (e.g. spark.spline.lineageDispatcher.http.producer.url ) and install the library on the cluster, but then enable tracking at a notebook level with:

%scala
import za.co.absa.spline.harvester.SparkLineageInitializer._
sparkSession.enableLineageTracking()

In OL, it would be nice to install and configure OL at a cluster level, but to enable it at a notebook level. This way, users could control whether all notebooks run on a cluster emit lineage or just those with lineage explicitly enabled.

Michael Collado (collado.mike@gmail.com)
2021-08-31 16:01:00

*Thread Reply:* Seems, at the very least, we need to provide a way to specify the job name at the notebook level

👍 Luke Smith

Luke Smith (luke.smith@kinandcarta.com)
2021-08-31 16:03:50

*Thread Reply:* Agreed. I'd like a default that uses the notebook name that can also be overridden in the notebook.

Michael Collado (collado.mike@gmail.com)
2021-08-31 16:10:42

*Thread Reply:* if you have some insight into the available options, it would be great if you can open an issue on the OL project. I'll have to carve out some time to play with a databricks cluster and learn what options we have

👍 Luke Smith

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 18:26:11

*Thread Reply:* Thank you @Luke Smith, the method you recommend works for me - the cluster is running and apparently it fetches the configuration. It was my first progress in over a week of testing OpenLineage in Azure Databricks. Thank you!

Now I have this:

[attached screenshot]

Luke Smith (luke.smith@kinandcarta.com)
2021-08-31 18:52:15

*Thread Reply:* Is this error thrown during init or job execution?

Michael Collado (collado.mike@gmail.com)
2021-08-31 18:55:30

*Thread Reply:* this is likely a race condition- I've seen it happen for jobs that start and complete very quickly- things like defining temp views or similar

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 19:59:15

*Thread Reply:* During the execution of the job, @Luke Smith. Thank you @Michael Collado, that was exactly the scenario - the job that I executed was empty. Now the cluster is running OK, I don't have errors, and I have run some jobs successfully, but I don't see any information in my Datakin explorer.

Willy Lulciuc (willy@datakin.com)
2021-08-31 20:00:46

*Thread Reply:* Awesome! Great to hear you’re up and running. For datakin specific questions, mind if we move the discussion to the datakin user slack channel?

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 20:01:17

*Thread Reply:* Yes Willy, thank you!

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:06:00

*Thread Reply:* Hi @Luke Smith, thank you for your help. Are you familiar with this error in Azure Databricks when you use OL?

[attached screenshot]

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:07:07
*Thread Reply:* [attached screenshot]

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:17:17

*Thread Reply:* I found the solution here:
https://docs.microsoft.com/en-us/answers/questions/170730/handshake-fails-trying-to-connect-from-azure-datab.html

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:17:28

*Thread Reply:* It works now! 😄

👍 Luke Smith, Maciej Obuchowski, Minkyu Park, Willy Lulciuc

Willy Lulciuc (willy@datakin.com)
2021-09-02 16:33:01

*Thread Reply:* @Erick Navarro This might be helpful to add to our OpenLineage Spark docs for others trying out openlineage-spark with Databricks. Let me know if that's something you'd like to contribute 🙂

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 19:59:10

*Thread Reply:* Yes of course @Willy Lulciuc, I will prepare a small tutorial for my colleagues and I will share it with you 🙂

Willy Lulciuc (willy@datakin.com)
2021-09-02 20:44:36

*Thread Reply:* Awesome. Thanks!

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-02 03:47:35

Hello everyone! I am currently evaluating OpenLineage and am finding it very interesting as Prefect is in the list of integrations. However, I am not seeing any documentation or code for this. How far are you from supporting Prefect?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-02 04:57:55

*Thread Reply:* Hey! If you mean this picture, it shows the concept of how OpenLineage works, not the current state of the integrations. We don't have Prefect support yet; however, it's on our roadmap.

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-02 05:22:15

*Thread Reply:* great, thanks 🙂

Julien Le Dem (julien@apache.org)
2021-09-02 11:49:48

*Thread Reply:* @Thomas Fredriksen Feel free to chime in on the GitHub issue Maciej linked if you want.

Luke Smith (luke.smith@kinandcarta.com)
2021-09-02 13:13:05

What's the timeline to support spark 3.0 within OL? One breaking change we've found is within DatasetSourceVisitor.java -- the DataSourceV2 is deprecated in spark 3.0. There may be other issues we haven't found yet. Is there a good feel for the scope of work required to make OL spark 3.0 compatible?

Julien Le Dem (julien@apache.org)
2021-09-02 14:28:11

*Thread Reply:* It is being worked on right now. @Oleksandr Dvornik is adding an integration test in the build so that we run tests for both Spark 2.4 and Spark 3. Please open an issue with the stack trace if you can. From our perspective, it should be mostly compatible with a few exceptions like this one that we'd want to add test cases for.

Julien Le Dem (julien@apache.org)
2021-09-02 14:36:19

*Thread Reply:* The goal is to be able to make a release in the next few weeks. The integration is being used with Spark 3 already.

🙌 Luke Smith

Luke Smith (luke.smith@kinandcarta.com)
2021-09-02 15:50:14

*Thread Reply:* Great, I'll take some time to open an issue for this particular issue and a few others.

Michael Collado (collado.mike@gmail.com)
2021-09-02 17:33:08

*Thread Reply:* are you actually using the DatasetSource interface in any capacity? Or are you just scanning the source code to find incompatibilities?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-09-03 12:36:20
-
-

*Thread Reply:* Turns out this has more to do with a how Databricks handles the delta format. It's related to https://github.com/AbsaOSS/spline-spark-agent/issues/96.

-
- - - - - - - -
-
Labels
- question -
- -
-
Comments
- 5 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-09-03 13:42:43
-
-

*Thread Reply:* I haven't been chasing this issue down on my team -- turns out some things were lost in communication. There are really two problems here:

  1. When attempting to do delta I/O with Spark 3 on Databricks, e.g. insert into . . . values . . ., we get an error related to DataSourceV2: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.source()Lorg/apache/spark/sql/sources/v2/DataSourceV2;
  2. Using Spline, which is Spark 3 compatible, we have issues with the way Databricks handles delta table IO. This is related: https://github.com/AbsaOSS/spline-spark-agent/issues/96

So there are two stacked issues related to Spark 3 on Databricks with delta IO, not just one. Hope this clears things up.

-
- - - - - - - -
-
Labels
- question -
- -
-
Comments
- 5 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-09-03 13:44:54
-
-

*Thread Reply:* So, the first issue is OpenLineage related directly, and the second issue applies to both OpenLineage and Spline?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-09-03 13:45:49
-
-

*Thread Reply:* Yes, that's my read of what I'm getting from others on the team.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-09-03 13:46:56
-
-

*Thread Reply:* For the first issue- can you give some details about the target of the INSERT INTO... ? Is it a data source defined in Databricks? a Hive table? a view on GCS?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-09-03 13:47:40
-
-

*Thread Reply:* oh, it's a Delta table?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-09-03 14:48:15
-
-

*Thread Reply:* Yes, it's created via

- -

CREATE TABLE . . . using DELTA location "/dbfs/mnt/ . . . "

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-02 14:28:53
-
-

I have opened a PR to fix some outdated language in the spec: https://github.com/OpenLineage/OpenLineage/pull/241 Thank you @Mandy Chessell for the feedback

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-02 14:37:27
-
-

The next OpenLineage monthly meeting is next week. Please chime in this thread if you’d like something added to the agenda

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
marko - (marko.kristian.helin@gmail.com) -
-
2021-09-04 12:53:54
-
-

*Thread Reply:* Apache Beam integration? I have a very crude integration at the moment. Maybe it’s better to integrate on the orchestration level (airflow, luigi). Thoughts?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-05 13:06:19
-
-

*Thread Reply:* I think it makes a lot of sense to have a Beam level integration similar to the spark one. Feel free to post a draft PR if you want to share.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-07 21:04:09
-
-

*Thread Reply:* I have added Beam as a topic for the roadmap discussion slide: https://docs.google.com/presentation/d/1fI0u8aE0iX9vG4GGrnQYAEcsJM9z7Rlv/edit#slide=id.ge7d4b64ef4_0_0

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-07 21:03:08
-
-

I have prepared slides for the OpenLineage meeting tomorrow morning: https://docs.google.com/presentation/d/1fI0u8aE0iX9vG4GGrnQYAEcsJM9z7Rlv/edit#slide=id.ge7d4b64ef4_0_0

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-07 21:03:32
-
-

*Thread Reply:* There will be a quick demo of the dbt integration (thanks @Willy Lulciuc!)

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-07 21:05:13
-
-

*Thread Reply:* Information to join and archive of previous meetings: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-08 14:49:52
-
-

*Thread Reply:* The recording and notes are now available: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Venkatesh Tadinada - (venkat@mlacademy.io) -
-
2021-09-08 21:58:09
-
-

*Thread Reply:* Good meeting today. @Julien Le Dem. Thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Shreyas Kaushik - (shreyask@gmail.com) -
-
2021-09-08 04:03:29
-
-

Hello, I was looking to get some lineage out for BQ in my Airflow DAGs and saw that the BQ extractor here - https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/bigquery_extractor.py#L47 - is using an operator that has been deprecated by Airflow - https://github.com/apache/airflow/blob/main/airflow/contrib/operators/bigquery_operator.py#L44 - and most of my DAGs are using the operator BigQueryExecuteQueryOperator mentioned there. I presume that with this, lineage extraction wouldn't work, and some work is needed to support both of these operators with the same (or different) extractor. Is that correct, or am I missing something?

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-08 04:27:04
-
-

*Thread Reply:* We're working on updating our integration to Airflow 2. Some changes in Airflow made the current approach unfeasible, so a slight change in the way we capture events is needed. You can take a look at progress here: https://github.com/OpenLineage/OpenLineage/tree/airflow/2

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Shreyas Kaushik - (shreyask@gmail.com) -
-
2021-09-08 04:27:38
-
-

*Thread Reply:* Thanks, @Maciej Obuchowski. When is this expected to land in a release?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Zagales - (dzagales@gmail.com) -
-
2021-11-11 06:35:24
-
-

*Thread Reply:* hi @Maciej Obuchowski I wanted to follow up on this to understand when the more recent BQ Operators will be supported, specifically BigQueryInsertJobOperator

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-11 22:30:31
-
-

The PR to separate facets in their own file (and allowing versioning them independently) is now available: https://github.com/OpenLineage/OpenLineage/pull/118

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jose Badeau - (jose.badeau@gmail.com) -
-
2021-09-13 03:46:20
-
-

Hi, new to the channel but I think OL is a great initiative. Currently we are focused on beam/spark/delta but are moving to beam/flink/iceberg and I’m happy to help where I can.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-09-13 15:40:01
-
-

*Thread Reply:* Welcome, @Jose Badeau 👋. That’s exciting to hear as we have Beam, Flink and Iceberg on our roadmap! You’re welcome to join the discussion :)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-13 20:56:11
-
-

Per the discussion last week, Ryan updated the metadata that would be available in Iceberg: https://github.com/OpenLineage/OpenLineage/issues/167#issuecomment-917237320

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-13 21:00:54
-
-

I have also created tickets for follow up discussions: (#269 and #270): https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 04:50:22
-
-

Hello. I find OpenLineage an interesting tool; however, can someone help me with the integration?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 04:52:50
-
-

I am trying to capture lineage from Spark 3.1.1, but when executing I constantly get:
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2.writer()Lorg/apache/spark/sql/sources/v2/writer/DataSourceWriter;
  at openlineage.spark.agent.lifecycle.plan.DatasetSourceVisitor.findDatasetSource(DatasetSourceVisitor.java:57)
as if I were using OpenLineage on the wrong Spark version (2.4). I have also tried the Spark jar from the branch feature/itspark3. Is there any branch or release that works, or can be tried, with Spark 3+?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Oleksandr Dvornik - (oleksandr.dvornik@getindata.com) -
-
2021-09-14 05:03:45
-
-

*Thread Reply:* Hello Tomas. We are currently working on support for Spark v3. Can you please raise an issue with the stack trace? That would help us track and solve it. We are currently adding integration tests. The next step will be fixing the changed method signatures for v3 (which is what you're hitting).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 05:12:45
-
-

*Thread Reply:* Hi @Oleksandr Dvornik i raised https://github.com/OpenLineage/OpenLineage/issues/272

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Oleksandr Dvornik -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 08:47:39
-
-

I also tried to downgrade Spark to 2.4.0 and retry with 0.2.2, but I hit an issue there too. My preferred way would be to push for Spark 3.1.1, but that depends a bit on when you plan to release a version supporting it. As a backup plan I would try Spark 2.4.0, but this is blocking me as well: https://github.com/OpenLineage/OpenLineage/issues/274

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-14 08:55:44
-
-

*Thread Reply:* I think this might actually be a Spark issue: https://stackoverflow.com/questions/53787624/spark-throwing-arrayindexoutofboundsexception-when-parallelizing-list/53787847

-
-
Stack Overflow
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-14 08:56:10
-
-

*Thread Reply:* Can you try a newer version in the 2.4.x line, like 2.4.7?

- - - -
- 👀 Tomas Satka -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-14 08:57:30
-
-

*Thread Reply:* This might also be a Spark 2.4 with Scala 2.12 issue - I'd recommend the 2.11 versions.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 09:04:26
-
-

*Thread Reply:* @Maciej Obuchowski with 2.4.7 i get following exc:

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 09:04:27
-
-

*Thread Reply:* 21/09/14 15:03:25 WARN RddExecutionContext: Unable to access job conf from RDD
java.lang.NoSuchFieldException: config$1
  at java.base/java.lang.Class.getDeclaredField(Class.java:2411)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 09:04:48
-
-

*Thread Reply:* i can also try to switch to 2.11 scala

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 09:05:37
-
-

*Thread Reply:* or do you have some recommended setup that works for sure?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-14 09:09:58
-
-

*Thread Reply:* One more check - you're using Java 8 with this, right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-14 09:10:17
-
-

*Thread Reply:* This is what works for me:
% cat tools/spark-2.4/RELEASE
Spark 2.4.8 (git revision 4be4064) built for Hadoop 2.7.3
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Pflume -Psparkr -Pkafka-0-8 -Phadoop-2.7 -Phive -Phive-thriftserver -DzincPort=3036

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-14 09:11:23
-
-

*Thread Reply:* spark-shell:
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 09:12:05
-
-

*Thread Reply:* awesome let me try 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 09:26:00
-
-

*Thread Reply:* data has been sent to marquez. coolio. however i noticed a nullpointer being thrown:
21/09/14 15:23:53 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
  at io.openlineage.spark.agent.OpenLineageSparkListener.onJobEnd(OpenLineageSparkListener.java:164)
  at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:39)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
  at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
  at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
  at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
  at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
  at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 10:59:45
-
-

*Thread Reply:* closed related issue #274

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 11:02:42
-
-

does openlineage capture streaming in spark? as this example is not showing me anything unless i replace readStream() with batch read() and writeStream() with write()
```SparkSession.Builder builder = SparkSession.builder();
SparkSession session = builder
        .appName("quantweave")
        .master("local[*]")
        .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.2.2")
        .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
        .config("spark.openlineage.url", "http://localhost:5000/api/v1/namespaces/spark_integration/")
        .getOrCreate();

Dataset<Row> df = session
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "topic1")
        .option("startingOffsets", "earliest")
        .load();

Dataset<Row> dff = df
        .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as("data");

dff
        .writeStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("topic", "topic2")
        .option("checkpointLocation", "/tmp/checkpoint")
        .start();```
-
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-14 13:38:09
-
-

*Thread Reply:* Not at the moment, but it is in scope. You are welcome to open an issue with your example to track this or even propose an implementation if you have the time.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Oleksandr Dvornik - (oleksandr.dvornik@getindata.com) -
-
2021-09-14 15:12:01
-
-

*Thread Reply:* @Tomas Satka it would be great if you could add a containerized integration test for Kafka with your test case. You can take this as an example here

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 18:02:05
-
-

*Thread Reply:* Hi @Oleksandr Dvornik, I wrote a test for a simple read/write from a Kafka topic using the Kafka testcontainer. However, I discovered a bug. When writing to the Kafka topic I get java.lang.IllegalArgumentException: One of the following options must be specified for Kafka source: subscribe, subscribepattern, assign. See the docs for more details.

- -

• How would you like me to add the test? Fork openlineage and create PR

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 18:02:50
-
-

*Thread Reply:* • Shall I raise a bug for writing to Kafka, which should take only "topic" instead of "subscribe"?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-14 18:03:42
-
-

*Thread Reply:* • Since I don't know the expected payload for the OpenLineage mock server, can somebody help me create it?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Oleksandr Dvornik - (oleksandr.dvornik@getindata.com) -
-
2021-09-14 19:06:41
-
-

*Thread Reply:* Hi @Tomas Satka, yes, you should create a fork and raise a PR from that. For more details, please take a look at. Not sure about Kafka, since we don't have that integration yet. About the expected payload: as a first step, I would suggest leaving that test without an assertion for now. The second step would be investigation (what we can get from that plan node). The third step is implementation and asserting the payload. Basically we parse the Spark optimized plan and get as much information as we can for each specific implementation. You can take a look at a recent PR for HIVE. We visit the root node and the leaves to get output datasets and input datasets accordingly.

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-15 04:37:59
-
-

*Thread Reply:* Hi @Oleksandr Dvornik PR for step one : https://github.com/OpenLineage/OpenLineage/pull/279

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Oleksandr Dvornik -
- -
- 🙌 Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-09-14 15:52:41
-
-

There may not be an answer to these questions yet, but I'm curious about the plan for Tableau lineage.

- -

• How will this integration be packaged and attached to Tableau instances?
  ◦ via Extensions API, REST API?
• What is the architecture?
https://github.com/OpenLineage/OpenLineage/issues/78

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-15 01:58:37
-
-

Hi everyone - following up on my previous post on Prefect. The technical integration does not seem very difficult, but I am wondering about how to structure the lineage logic.
Is it the case that each Prefect task should be mapped to a lineage job? If so, how do we connect the jobs together? Does there have to be a dataset between each job?
I am using OpenLineage with Marquez, by the way.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 09:19:23
-
-

*Thread Reply:* Hey Thomas!

- -

Following what we do with Airflow, yes, I think that each task should be mapped to a job.

- -

You don't need datasets between all tasks. They're necessary only where you consume and produce datasets - and it does not matter where in your job graph you've produced them.

- -

To map tasks together: in Airflow, we use ParentRunFacet, and the same approach could be used here. In Prefect, I think using flow_run_id would work.
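A minimal sketch of that suggestion with the openlineage-python client - mapping one Prefect task run to an OpenLineage job run and tying it to the flow via ParentRunFacet. The helper name, flow name, and namespace are illustrative, not part of any released integration:

```python
# Sketch: map one Prefect task run to an OpenLineage Job + Run, linking it
# to the enclosing flow run via ParentRunFacet (keyed on flow_run_id).
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.facet import ParentRunFacet
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # assumed Marquez endpoint

def emit_task_start(flow_run_id: str, task_run_id: str, task_name: str) -> None:
    # The parent facet points at the flow-level run, so the backend can link jobs.
    parent = ParentRunFacet.create(
        runId=flow_run_id,
        namespace="prefect",   # job namespace chosen by the operator
        name="my_flow",        # hypothetical flow name
    )
    client.emit(RunEvent(
        eventType=RunState.START,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=task_run_id, facets={"parent": parent}),
        job=Job(namespace="prefect", name=task_name),
        producer="https://github.com/OpenLineage/OpenLineage/tree/0.0.0/integration/prefect",
    ))
```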

- - - -
- 👍 Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-15 09:26:21
-
-

*Thread Reply:* this is very helpful, thank you

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-15 09:26:43
-
-

*Thread Reply:* what would be the namespace used in the Job definition of each task?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 09:31:34
-
-

*Thread Reply:* In contrast to dataset namespaces - which we try to standardize - job namespaces should be provided by the user, or the operator of the particular scheduler.

- -

For example, it would be good if it helped you identify Prefect instance where the job was run.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 09:32:23
-
-

*Thread Reply:* If you use the openlineage-python client, you can provide the namespace either in the client constructor, or via the OPENLINEAGE_NAMESPACE env variable.
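For illustration, a short sketch of the env-variable route (whether the constructor accepts a namespace argument may vary by client version, so only the env var is shown; the job name is hypothetical):

```python
# Sketch: read the job namespace from OPENLINEAGE_NAMESPACE, defaulting to
# "default", and use it when building the Job for an event.
import os

from openlineage.client.run import Job

namespace = os.getenv("OPENLINEAGE_NAMESPACE", "default")
job = Job(namespace=namespace, name="my_prefect_task")
```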

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-15 09:32:55
-
-

*Thread Reply:* awesome, thank you 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-15 17:03:07
-
-

*Thread Reply:* Hey @Thomas Fredriksen - just chiming in, I’m also keen for a prefect integration. Let me know if I can help out at all

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-15 17:27:20
-
-

*Thread Reply:* Please chime in on https://github.com/OpenLineage/OpenLineage/issues/81

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-15 18:29:20
-
-

*Thread Reply:* Done!

- - - -
- ❤️ Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 00:06:41
-
-

*Thread Reply:* For now I'm prototyping in a separate repo https://github.com/limx0/caching_flow_runner/tree/open_lineage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-17 01:55:08
-
-

*Thread Reply:* I really like your PR, @Brad. I think that using FlowRunner and TaskRunner may be a more "proper" way of doing this, as opposed to adding a state-handler to each task the way I do it.

- -

How are you dealing with Prefect-library tasks, such as the included BigQuery tasks? Is it necessary to create a DatasetTask for them to show up in the lineage graph?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:04:19
-
-

*Thread Reply:* Hey @Thomas Fredriksen! At the moment I'm not dealing with any task-specific things. The plan (in my head, and after speaking with another prefect user @davzucky) would be that we add a LineageTask subclass where you could define custom facets on a per-task basis

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:05:21
-
-

*Thread Reply:* or some sort of other hook where basically you would define some lineage attribute or put something in the prefect.context that the TaskRunner would find and attach

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:06:23
-
-

*Thread Reply:* Sorry I misread your question - any tasks should be automatically tracked (I believe but have not tested yet!)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-17 02:16:02
-
-

*Thread Reply:* @Brad Could you elaborate a bit on your ideas around adding custom context attributes?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:21:57
-
-

*Thread Reply:* yeah, so basically we just need some hooks that you can easily access from the task decorator, or somewhere else, that we can pass through to the OpenLineage adapter to do things like custom facets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:24:31
-
-

*Thread Reply:* like for your bigquery example - you might want to record some facets like in https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/bigquery.py and we need a way to do that with the Prefect bigquery task

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:28:28
-
-

*Thread Reply:* @davzucky

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-17 02:29:12
-
-

*Thread Reply:* I see. Is this supported by the airflow-integration?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:29:32
-
-

*Thread Reply:* I think so, yes

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:30:51
- -
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 02:31:54
-
-

*Thread Reply:* (I don't actually use airflow or bigquery - but for my own use case I can see wanting to do thing like this)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-17 03:18:27
-
-

*Thread Reply:* Interesting, I like how dynamic this is

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Chris Baynes - (chris@contiamo.com) -
-
2021-09-15 09:09:21
-
-

Hi all, I have a clarification question about dataset namespaces. What's the difference between a dataset namespace (in the input/output) and a dataSource name (in the dataSource facet)?
The dbt integration appears to set those to the same value (e.g. <snowflake://myprofile>), however it seems that Marquez assumes the dataset namespace to be a more generic concept (similar to a nice user-provided name, like the job namespace).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 09:29:25
-
-

*Thread Reply:* Hey.
Generally, the dataSource name should be the namespace of the particular dataset.

- -

In some cases, like Postgres, the dataSource facet is used to additionally provide connection strings, with info like the particular host and port that we're connected to.

- -

In the case of Snowflake - or BigQuery, or S3, or other systems where there is only a "global" instance - the dataSource facet does not carry any other additional information.
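To illustrate the distinction, a hedged sketch with openlineage-python (the host, port, and table names are made up):

```python
# Sketch: a Postgres dataset whose namespace and dataSource facet share the
# same authority; for "global" systems like Snowflake the facet adds nothing.
from openlineage.client.facet import DataSourceDatasetFacet
from openlineage.client.run import Dataset

dataset = Dataset(
    namespace="postgres://db.example.com:5432",  # dataset namespace per Naming.md
    name="public.orders",
    facets={
        "dataSource": DataSourceDatasetFacet(
            name="postgres://db.example.com:5432",
            uri="postgres://db.example.com:5432",
        ),
    },
)
```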

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Chris Baynes - (chris@contiamo.com) -
-
2021-09-15 10:11:19
-
-

*Thread Reply:* Thanks. So then perhaps marquez could differentiate a bit more between job & dataset namespaces. Right now it doesn't quite feel right to have a single global list of namespaces for jobs & datasets, especially as they also have a separate concept of sources (which are not in a namespace).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 10:18:59
-
-

*Thread Reply:* @Willy Lulciuc what do you think?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Chris Baynes - (chris@contiamo.com) -
-
2021-09-15 10:41:20
-
-

*Thread Reply:* As an example, in marquez I have this list of namespaces (from some sample data): dbt-sales, default, <snowflake://my-account1>, <snowflake://my-account2>.
I think the new marquez UI with the nice namespace dropdown and job/dataset search is awesome, and I'd expect to be able to filter by job namespace everywhere, but how about being able to filter datasets by source (which would be populated by the OL dataset namespace) and not persist dataset namespaces in the global namespace table?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-15 18:38:03
-
-

The dbt integration (https://github.com/OpenLineage/OpenLineage/tree/main/integration/dbt) is pretty awesome, but there are still a few improvements we could make.
Here are a few thoughts.
• In dbt-ol, if the configuration is wrong or missing we will fail silently. This one seems like a good first thing to fix, by logging the error to stdout.
• We need to wait until the end to know if it worked at all. It would be nice if we checked the config at the beginning and displayed an error right away. Possibly by adding a parent job/run with a start event at the beginning and an end event at the end when all is done.
• While we are sending events at the end, the console will hang until it’s done. It’s not clear that progress is made. We could have a simple progress bar by printing a dot for every event sent. (ex: sending 10 OpenLineage events: .........)
• We could also write at the beginning that the OL events will be sent at the end, so that the user knows what to expect.
What do you think? (@Maciej Obuchowski in particular, but anyone using dbt in general)
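On the progress-bar idea, a rough sketch of what the dbt-ol emit loop could do - `client` and `events` stand in for the wrapper's internals; this is not the actual dbt-ol code:

```python
# Sketch: print one dot per emitted event so the console visibly advances
# instead of hanging; `client` and `events` are placeholders.
def emit_with_progress(client, events):
    print(f"sending {len(events)} OpenLineage events: ", end="", flush=True)
    for event in events:
        client.emit(event)               # one network roundtrip per event
        print(".", end="", flush=True)   # progress marker
    print()                              # finish the line
```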

- - - -
- 👀 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-15 18:43:18
-
-

*Thread Reply:* Last point is that we should persist the configuration and not just have it in environment variables. What is the best way to do this in dbt?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 18:49:21
-
-

*Thread Reply:* We could have something similar to https://docs.getdbt.com/dbt-cli/configure-your-profile - or even put our config in there

- - - -
- ❤️ Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 18:51:42
-
-

*Thread Reply:* I think we should assume that variables/config should be set and valid - and fail the run if they aren't. After all, if someone didn't need lineage events, they wouldn't use our wrapper.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 18:56:36
-
-

*Thread Reply:* 3rd point would be easy to address if we could send events async/in parallel. But there could be dataset version dependencies, and we don't want to get into needless complexity of recognizing that, building a dag etc.

- -

We could batch events if the network roundtrips are responsible for majority of the slowdown. However, we can't assume any particular environment.

- -

Maybe just notifying about the progress is the best thing we can do right now.

- - - -
- 👀 Mario Measic -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-15 18:58:22
-
-

*Thread Reply:* About the second point: I want to add logic to recognize whether we already have a parent run - for example, if running via Airflow. If not, creating a run for this purpose is a good idea.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-15 21:31:35
-
-

*Thread Reply:* @Maciej Obuchowski can you open github issues to propose those changes?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 09:11:31
-
-

*Thread Reply:* Done

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2021-09-16 12:05:10
-
-

*Thread Reply:* FWIW, I have been putting my config in ~/.openlineage/config so it can be mapped into a container
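A minimal sketch of that pattern - resolving settings from a mounted config file with env-var fallback (the YAML format and PyYAML dependency are assumptions for illustration, not the client's actual config handling):

```python
# Sketch: load OpenLineage settings from ~/.openlineage/config so the file
# can be mapped into a container, falling back to environment variables.
import os
from pathlib import Path

import yaml  # assumes the file is YAML; adjust for the real format

def load_openlineage_config() -> dict:
    path = Path.home() / ".openlineage" / "config"
    if path.exists():
        return yaml.safe_load(path.read_text()) or {}
    return {
        "url": os.getenv("OPENLINEAGE_URL", "http://localhost:5000"),
        "namespace": os.getenv("OPENLINEAGE_NAMESPACE", "default"),
    }
```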

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 17:56:23
-
-

*Thread Reply:* Makes sense, also, all clients could use that config

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mario Measic - (mario.measic.gavran@gmail.com) -
-
2021-10-18 04:47:08
-
-

*Thread Reply:* if dbt could actually stream the events, that would be great.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-18 09:59:12
-
-

*Thread Reply:* Unfortunately, this seems very unlikely for now, due to the fact that we rely on metadata files that dbt only produces at the end of execution.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-15 22:52:09
-
-

The split of facets in their own schemas is ready to be merged: https://github.com/OpenLineage/OpenLineage/pull/118

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 00:12:02
-
-

Hey @Julien Le Dem I'm going to start a thread here for any issues I run into trying to build a prefect integration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 00:16:44
-
-

*Thread Reply:* This might be useful to others https://github.com/OpenLineage/OpenLineage/pull/284

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 00:18:44
-
-

*Thread Reply:* So I'm trying to push a simple event to marquez, but getting the following response:
'{"code":400,"message":"Unable to process JSON"}'
The JSON I'm pushing:

{
  "eventTime": "2021-09-16T04:00:28.343702",
  "eventType": "START",
  "inputs": {},
  "job": {
    "facets": {},
    "name": "prefect.core.parameter.p",
    "namespace": "default"
  },
  "producer": "<https://github.com/OpenLineage/OpenLineage/tree/0.0.0/integration/prefect>",
  "run": {
    "facets": {},
    "runId": "3bce33cb-9495-4c58-b326-6aac71634ace"
  }
}

Does anything look obviously wrong here?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
marko - (marko.kristian.helin@gmail.com) -
-
2021-09-16 02:41:11
-
-

*Thread Reply:* What I did previously when debugging something like this was to remove half of the payload until I found the culprit. Binary search, essentially. I was running Marquez locally, so probably could’ve enabled better logging as well.
Aren’t inputs and facets arrays?

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 03:14:54
-
-

*Thread Reply:* Thanks for the response @marko - this is a greatly reduced payload already (but I'll keep going). Yep they are supposed to be arrays (I've since fixed that)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 03:46:01
-
-

*Thread Reply:* okay it was my timestamp 🥲
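For reference, a small sketch of an eventTime the API accepts - an ISO-8601 timestamp with an explicit timezone offset (assuming the missing offset was the culprit here):

```python
# Sketch: the payload above used "2021-09-16T04:00:28.343702" with no
# timezone; a zone-aware ISO-8601 timestamp avoids that failure mode.
from datetime import datetime, timezone

event_time = datetime.now(timezone.utc).isoformat()
print(event_time)  # e.g. 2021-09-16T04:00:28.343702+00:00
```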

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 19:07:16
-
-

*Thread Reply:* Okay - I've got a simple working example now https://github.com/limx0/caching_flow_runner/blob/open_lineage/caching_flow_runner/task_runner.py

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 19:07:37
-
-

*Thread Reply:* I might move this into a proper PR @Julien Le Dem

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 19:08:12
-
-

*Thread Reply:* Successfully got a basic prefect flow working

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 02:11:53
-
-

A question about DatasetType - is there a representation for a file-like type? For files stored in S3/FTP/NFS etc (assuming a fully resolvable url)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 09:53:24
-
-

*Thread Reply:* I think there was some talk somewhere to actually drop the DatasetType concept; can't find where though.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 10:04:09
-
-

*Thread Reply:* I've taken a look at your repo. Looks great so far!

- -

One thing I've noticed: I don't think you need to use any stuff from Marquez to emit events. Its lineage ingestion API is deprecated - you can just use the openlineage-python client. If there's something you think is missing from it, feel free to write that here or open an issue.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 17:12:31
-
-

*Thread Reply:* And would that be replaced by just some Input/Output notion @Maciej Obuchowski?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 17:13:26
-
-

*Thread Reply:* Oh yeah I got a little confused by the single lineage endpoint - but I’ve realised how it all works now. I’m still using the marquez backend to view things but I’ll use the openlineage-client to talk to it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 17:34:46
-
-

*Thread Reply:* Yes 🙌

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-16 06:04:30
-
-

When trying to fix failing checks, I see integration-test-integration-airflow fail with:
```#!/bin/bash -eo pipefail
if [[ GCLOUD_SERVICE_KEY,GOOGLE_PROJECT_ID == "" ]]; then
  echo "No required environment variables to check; moving on"
else
  IFS="," read -ra PARAMS <<< "GCLOUD_SERVICE_KEY,GOOGLE_PROJECT_ID"

  for i in "${PARAMS[@]}"; do
    if [[ -z "${!i}" ]]; then
      echo "ERROR: Missing environment variable {i}" >&2

      if [[ -n "" ]]; then
        echo "" >&2
      fi

      exit 1
    else
      echo "Yes, ${i} is defined!"
    fi
  done
fi

ERROR: Missing environment variable {i}

Exited with code exit status 1
CircleCI received exit code 1```
However, I haven't touched airflow at all.. can somebody help please?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 06:59:34
-
-

*Thread Reply:* Hey, Airflow integration tests do not pass env variables to PRs from forks due to security reasons - anyone could create a malicious PR and dump the secrets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 07:00:29
-
-

*Thread Reply:* So, they will fail and there's nothing to do from your side 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 07:00:55
-
-

*Thread Reply:* We probably should split those into ones that don't touch external systems, and run those for all PRs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-16 07:08:03
-
-

*Thread Reply:* ah okie, good to know.
And in build-integration-spark: Could not resolve all artifacts. Is that also a known issue? Or something on my side that I could fix?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 07:11:12
-
-

*Thread Reply:* Looks like a Gradle server problem?
> Could not get resource '<https://plugins.gradle.org/m2/com/diffplug/spotless/spotless-lib/2.13.2/spotless-lib-2.13.2.module>'.
> Could not GET '<https://plugins.gradle.org/m2/com/diffplug/spotless/spotless-lib/2.13.2/spotless-lib-2.13.2.module>'. Received status code 500 from server: Internal Server Error

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 07:34:44
-
-

*Thread Reply:* After retry, there's spotless error:

- -

+········.orElse(Collections.emptyList()).stream()

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 07:35:15
-
-

*Thread Reply:* I think this is due to mismatch between behavior of spotless in Java 8 and Java 11+ - which you probably used 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-16 07:40:01
-
-

*Thread Reply:* ah.. i used java11. so shall i rerun something with java8 setup as sdk?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-16 07:44:31
-
-

*Thread Reply:* For spotless, you can just fix this one line 🙂 -Though I don't guarantee that tests that run later will pass, so you might need Java 8 for later testing

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-16 08:04:36
-
-

*Thread Reply:* yup looks better now

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-16 08:04:41
-
-

*Thread Reply:* thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomas Satka - (satka.tomas@gmail.com) -
-
2021-09-16 14:27:02
-
-

*Thread Reply:* Will somebody please review my PR? I already had to adjust it due to updates on the same test class 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 20:36:28
-
-

Hey team - I've opened https://github.com/OpenLineage/OpenLineage/pull/293 for a very WIP prefect integration

- - - -
- 🙌 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-16 20:37:27
-
-

*Thread Reply:* @Thomas Fredriksen would love any feedback

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-09-17 04:21:13
-
-

*Thread Reply:* nicely done! As we discussed in another thread - the way you have implemented lineage using FlowRunner and TaskRunner is likely the best way to do this. Let me know if you need any help, I would love to see this PR get merged!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-17 07:28:33
-
-

*Thread Reply:* Hey @Brad, it looks great!

- -

I've seen you're using task_qualified_name to name datasets, and I don't think it's the right way.
I'd take a look at the naming conventions here: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

- -

Getting that right is key to making sure that lineage is properly tracked between systems - for example, if you use Prefect to schedule dbt runs or pyspark jobs, the unified naming makes sure that all those integrations properly refer to the same dataset.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-17 08:12:50
-
-

*Thread Reply:* Hey @Maciej Obuchowski, thanks for the feedback. Yep, the naming was a bit of a placeholder. Open to any recommendations. I think things like dbt or pyspark are straightforward (we could add special handling for tasks like that), but what about regular transformation-type tasks that run in a scheduler? Do you have any naming preference? Say I just had some pandas transform task in prefect, for example.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-17 08:28:04
-
-

*Thread Reply:* First of all, not all tasks are producing and consuming datasets. For example, I wouldn't expect any of the Github tasks to have any datasets.

- -

Second, in Airflow we have a concept of Extractor where you can write specialized code to expose datasets. For example, for BigQuery we extract datasets from query plan. Now, I'm not sure if this concept would translate well to Prefect - but if yes, then we have some helpers inside openlineage common library that could be reused. Also, this way allows to emit additional facets, some of which are really useful - like query statistics for BigQuery, and data quality tests for dbt.

- -

Third, if we're talking about generalized tasks like FunctionTask or ShellTask, then I think the right way is to expose functionality to user to expose lineage themselves. I'm not sure how exactly that would look in Prefect.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-19 23:03:14
-
-

*Thread Reply:* You've raised some good points @Maciej Obuchowski - I might have been thinking about this integration in slightly the wrong way. I think based on your comments I'll refactor some of the code to hook into the Results object in prefect (The Result object is the way in which data is serialized and persisted).

- -

> Now, I'm not sure if this concept would translate well to Prefect - but if yes, then we have some helpers inside openlineage common library that could be reused -This definitely applies to prefect and the similar tasks exist in prefect and we should definitely leverage the common library in this case.

- -

> Third, if we're talking about generalized tasks like FunctionTask or ShellTask, then I think the right way is to expose functionality to user to expose lineage themselves. I'm not sure how exactly that would look in Prefect. -Yeah I agree with this. I'd like to make it as easy a possible to opt-in, but I think you're right that there needs to be some hooks for user defined lineage. I'll think about this a little more.

- -

> First of all, not all tasks are producing and consuming datasets. For example, I wouldn't expect any of the Github tasks to have any datasets. -My initial thoughts here were that it would still be good to have lineage as these tasks do have side effects, and downstream consumers of the lineage data might want to know about these tasks. However I don't have a good feeling yet how best to do this, so I'm going to park those thoughts for now.

- - - -
- 🙌 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-20 06:30:51
-
-

*Thread Reply:* > Yeah I agree with this. I'd like to make it as easy as possible to opt-in, but I think you're right that there needs to be some hooks for user-defined lineage. I'll think about this a little more.
The first version of an integration doesn't have to be perfect. In particular, not handling this use case would be okay, since it does not lock us into some particular way of doing it later.

> My initial thoughts here were that it would still be good to have lineage as these tasks do have side effects, and downstream consumers of the lineage data might want to know about these tasks. However I don't have a good feeling yet how best to do this, so I'm going to park those thoughts for now.
I'd think of two options first, before modeling it as a dataset:
Won't the existence of an event be enough? After all, we'll still have it despite it not having any input and output datasets.
If not, then wouldn't a custom run or job facet be a better fit?

- - - -
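A hedged sketch of what such a custom run facet could look like with openlineage-python - the attrs-style definition mirrors the built-in facets, but the facet name and fields are invented for illustration:

```python
# Sketch: a custom run facet recording a side effect (e.g. a GitHub call)
# without modeling it as a dataset. Field names are illustrative.
import attr

from openlineage.client.facet import BaseFacet

@attr.s
class SideEffectRunFacet(BaseFacet):
    action: str = attr.ib()   # e.g. "github.create_issue"
    target: str = attr.ib()   # e.g. "OpenLineage/OpenLineage"

# Attached to a run as:
# Run(runId=..., facets={"sideEffect": SideEffectRunFacet(action=..., target=...)})
```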
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-09-23 17:27:49
-
-

*Thread Reply:* > Won't the existence of an event be enough? After all, we'll still have it despite it not having any input and output datasets.
Duh, yep you're right @Maciej Obuchowski, I'm overthinking this. I'm going to clean this up based on your comments.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-10-06 03:39:28
-
-

*Thread Reply:* Hi @Brad. How will this integration work for Prefect flows running in Prefect Cloud or on Prefect Server?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-10-06 03:40:44
-
-

*Thread Reply:* Hi @Thomas Fredriksen - it'll relate to the agent actually - you'll need to pass the flow runner class to the agent when running

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-10-06 03:48:14
-
-

*Thread Reply:* nice!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-10-06 03:48:54
-
-

*Thread Reply:* Unfortunately I've been a little busy the past week, and I will be for the rest of this week

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-10-06 03:49:09
-
-

*Thread Reply:* but I do plan to pick this up next week

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-10-06 03:49:23
-
-

*Thread Reply:* (the additional changes I mention above)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thomas Fredriksen - (thomafred90@gmail.com) -
-
2021-10-06 03:50:08
-
-

*Thread Reply:* looking forward to it 🙂 let me know if you need any help!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-10-06 03:50:34
-
-

*Thread Reply:* yeah when I get this next lot of stuff in - I'd love for people to test it out

- - - -
- 🙌 Thomas Fredriksen, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Adam Pocock - (adam.pocock@oracle.com) -
-
2021-09-20 17:38:51
-
-

Is there a preferred academic citation for OpenLineage? I’m writing a paper on the provenance system in our machine learning library, and I’d like to cite OpenLineage as an example of future work on data lineage to integrate with.

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-20 19:18:53
-
-

*Thread Reply:* I think you can refer to https://openlineage.io/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-09-20 19:31:30
-
-

We’re starting to see the beginning of larger contributions (Spark streaming, prefect, …) and I think we need to define a way to accept those contributions incrementally.
If we take the example of streaming (Spark streaming, Flink or Beam) support (but really this applies in general - sorry to pick on you Tomas, this is great!):
The first Spark streaming PR ( https://github.com/OpenLineage/OpenLineage/pull/279 ) lays the groundwork for testing Spark streaming, but there’s more work to do to have a full feature.
I’m in favor of merging Spark streaming support into main once it’s working end to end (possibly with partial input/output coverage).
So I see 2 options:

  1. Start a branch for Spark streaming support. Have PRs like this one go into it until it’s completed (smaller reviews). Then merge the whole thing as a PR into main when it’s finished.
  2. Keep working on that PR until it’s fully implemented, but it will get big and make reviews difficult.

I have seen model 1) work well. It’s easier to do multiple smaller reviews for larger projects.
-
- - - - - - - -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
- 👍 Ross Turk, Maciej Obuchowski, Faouzi -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Yannick Endrion - (yannick.endrion@gmail.com) -
-
2021-09-24 05:10:04
-
-

Thank you @Ross Turk for this really useful article: https://openlineage.io/blog/dbt-with-marquez/?s=03
Is anyone aware of additional environments being supported by the dbt<->OpenLineage<->Marquez integration? I think only Snowflake and BigQuery are supported now.
I am really interested in SQLServer, or even Dremio (which could be great because it is capable of reading from multiple DBs).

- -

Thank you

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- 🎉 Minkyu Park, Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-24 05:15:31
-
-

*Thread Reply:* It should be really easy to add additional databases. Basically, we'd need to know how to get the namespace for that database: https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/dbt.py#L467

- -

The first step should be to add SQLServer or Dremio to the dataset naming schema here https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
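To make that concrete, a sketch of how a SQL Server branch might slot into the namespace resolution the link above points at - the snowflake/bigquery branches paraphrase the linked code, while the mssql scheme and default port are assumptions pending the Naming.md addition:

```python
# Sketch: a possible sqlserver branch in the dbt processor's namespace
# resolution; the mssql://{host}:{port} form is an assumption.
def extract_namespace(self, profile: dict) -> str:
    if profile["type"] == "snowflake":
        return f"snowflake://{profile['account']}"
    elif profile["type"] == "bigquery":
        return "bigquery"
    elif profile["type"] == "sqlserver":
        # {scheme}://{authority}, following the dataset naming spec
        return f"mssql://{profile['server']}:{profile.get('port', 1433)}"
    raise NotImplementedError(
        f"Only bigquery, snowflake and sqlserver are supported: {profile['type']}"
    )
```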

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yannick Endrion - (yannick.endrion@gmail.com) -
-
2021-10-04 16:22:59
-
-

*Thread Reply:* Thank you @Maciej Obuchowski,
I gave it a try, but without success yet. Not sure where I am supposed to add the sqlserver naming schema...
If you have any documentation that I could read, I would be glad =)
Many thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 15:13:43
-
-

*Thread Reply:* This would be adding a paragraph similar to this one: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md#snowflake

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 15:14:30
-
-

*Thread Reply:* Snowflake
See: Object Identifiers — Snowflake Documentation
Datasource hierarchy:
• account name
Naming hierarchy:
• Database: {database name} => unique across the account
• Schema: {schema name} => unique within the database
• Table: {table name} => unique within the schema
Identifier:
• Namespace: snowflake://{account name}
  ◦ Scheme = snowflake
  ◦ Authority = {account name}
• Name: {database}.{schema}.{table}
  ◦ URI = snowflake://{account name}/{database}.{schema}.{table}

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marty Pitt - (martypitt@vyne.co) -
-
2021-09-24 06:53:05
-
-

Hi all. I'm the Founder / CTO of a data discovery & transformation platform that captures very rich lineage information. We're interested in exposing / making our lineage data consumable via open standards, which is what led me to this project. A couple of questions:

- -

A) Am I right in considering that's the goal of this project?
B) Are you also considering provenance as well as lineage?
C) What's a good starting point to understand the models we should be exposing our data in, to make it consumable?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marty Pitt - (martypitt@vyne.co) -
-
2021-09-24 07:06:20
-
-

*Thread Reply:* For clarity on the provenance vs lineage point (in case I'm using those terms incorrectly...)

- -

Our platform performs automated enrichment and processing of data. In doing so, we often make calls to functions or out to other data services (such as APIs, or SELECTs against databases). We capture the inputs that pass to these, along with the outputs. (And, if the input is derived from other outputs, we capture the full chain, right back to the root).

- -

That's the kinda stuff our customers are really interested in, and we feel like there's value in making is consumable.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-24 08:47:35
-
-

*Thread Reply:* Not sure I understand you right, but are you interested in tracking individual API calls, and for example, values of some parameters passed for one call?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-24 08:51:16
-
-

*Thread Reply:* I guess that's not in OpenLineage's scope, as we're interested more in tracking metadata for whole datasets. But I might be wrong; some other people might chime in.

- -

We could of course model this situation, but that would capture, for example, the schema of those parameters - not their values.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-24 08:52:16
-
-

*Thread Reply:* I think this might be better suited for https://opentelemetry.io/

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marty Pitt - (martypitt@vyne.co) -
-
2021-09-24 10:55:54
-
-

*Thread Reply:* Kinda, but not really. Telemetry data is metadata about the API calls. We have that, but it's not interesting to our customers. It's the metadata about the data that Vyne provides that we want to expose.

- -

Our customers use Vyne to fetch data from lots of different sources. Eg:

- -

> "Whenever a trade is booked, calculate it's compliance against these regulations, to report to the regulators". -or

- -

> "Whenever a customer buys a $thing, capture the transaction data, client data, and account data, and store it in this table." -Providing answers to those questions involves fetching and transforming data, before storing it, or outputting it. We capture all that data, on a per-attribute basis, so we can answer the question "how did we get this value?" That's the lineage information we want to publish.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-09-30 15:10:51
-
-

*Thread Reply:* The core OpenLineage model is documented at https://github.com/OpenLineage/OpenLineage/#core-model . The model is really focused on Jobs and Datasets. Jobs have Runs which have start and end times (typically scheduled start/end times as well) and read from and/or write to the target datasets. If your transformation chain fits within that model, then I think you can definitely record and share the lineage information with your customers. The existing implementations are all focused on batch data access, though streaming should be possible to capture as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Drew Bittenbender - (drew@salt.io) -
-
2021-09-29 11:10:29
-
-

Hello. I am trying the openlineage-airflow integration with Marquez as the backend and have 3 questions.

- -
  1. Does it only work for PostgresOperators?
  2. Which is the recommended integration: marquez-airflow or openlineage-airflow?
  3. How do you enable more detailed logging? I tried OPENLINEAGE_LOG_LEVEL and MARQUEZ_LOG_LEVEL and neither seemed to affect logging. I assume this is logged to the airflow worker.
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Faouzi - (faouzi@dataroots.io) -
-
2021-09-29 13:46:59
-
-

*Thread Reply:* Hello @Drew Bittenbender!

- -

For your first two questions:

- -

• Yes, right now only the PostgresOperator is integrated. I learnt it the hard way ^_^. Spent hours trying with MySQL. There were actually attempts to integrate MySQL. If engineers do not integrate it, I will allocate myself some time to try to implement other airflow db operators.
• Use the openlineage one. It is the recommended approach now.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Drew Bittenbender - (drew@salt.io) -
-
2021-09-29 13:49:41
-
-

*Thread Reply:* Thank you @Faouzi. Is there any documentation/best practices to write your own extractor, or is it "read the code"? We use the Python, Docker and SSH operators a lot. Maybe those don't fit into the lineage paradigm well, but want to give it a shot

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Faouzi - (faouzi@dataroots.io) -
-
2021-09-29 13:52:16
-
-

*Thread Reply:* To the best of my knowledge there is no documentation to guide you through the design of your own extractor. So yes, we need to read the code. Here is a link where you can see how they did it for the postgres extractor and others. https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors

- - - -
- 👍 Drew Bittenbender -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-09-30 05:08:53
-
-

*Thread Reply:* I think in case of "bring your own code" operators like Python or Docker ones, it might be better to use lineage_run_id macro and use openlineage-python library inside, instead of implementing extractor.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-09-30 15:14:47
-
-

*Thread Reply:* I think @Maciej Obuchowski is right here. The airflow integration will create the parent jobs, but to get the dataset input/output links, it's best to do that directly from the python/docker scripts. If you report the parent run id, Marquez will link the jobs together correctly
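A rough sketch of that pattern, assuming the openlineage-airflow lineage_run_id macro (its exact signature has varied between versions) and a Marquez backend on localhost; the job and dataset names are hypothetical:
```
# Hedged sketch: a PythonOperator task that emits its own OpenLineage events,
# linked to the Airflow-created parent run via the lineage_run_id macro.
import uuid
from datetime import datetime, timezone

import requests
from airflow.operators.python import PythonOperator

def emit_lineage(parent_run_id: str):
    event = {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {
            "runId": str(uuid.uuid4()),
            "facets": {
                # links this run to the parent job run created by the Airflow
                # integration; the schema URL here is illustrative
                "parent": {
                    "_producer": "https://example.com/my-producer",
                    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/ParentRunFacet.json",
                    "run": {"runId": parent_run_id},
                    "job": {"namespace": "my-namespace", "name": "my_dag.my_task"},
                }
            },
        },
        "job": {"namespace": "my-namespace", "name": "my_dag.my_task"},
        "inputs": [],
        "outputs": [{"namespace": "my-namespace", "name": "derived_dataset"}],
        "producer": "https://example.com/my-producer",
    }
    requests.post("http://localhost:5000/api/v1/lineage", json=event)

task = PythonOperator(
    task_id="my_task",
    python_callable=emit_lineage,
    # templated: resolved by the openlineage-airflow macro at runtime
    # (the macro signature may differ by version)
    op_kwargs={"parent_run_id": "{{ lineage_run_id(run_id, task) }}"},
)
```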

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 15:09:55
-
-

*Thread Reply:* To clarify on what airflow operators are supported out of the box:
• postgres
• bigquery
• snowflake
• Great Expectations (with extra config)
See: https://github.com/OpenLineage/OpenLineage/blob/3a1ccbd854bbf202bbe6437bf81786cb01[…]ntegration/airflow/openlineage/airflow/extractors/extractors.py
MySQL is not supported at the moment. We should track it as an issue.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yuki Tannai - (tannai-yuki@dmm.com) -
-
2021-09-30 09:21:35
-
-

Hi there!
I’m trying to enhance the lineage functionality of a data infrastructure I’m working on.
All of the tools I found only visualize the relationships between tables before and after the transformation, but the DataHub RFC discusses Field Level Lineage, which I thought was close to the functionality I was looking for.
Does OpenLineage support the same functionality?
https://datahubproject.io/docs/rfc/active/1841-lineage/field_level_lineage/

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 15:03:40
-
-

*Thread Reply:* OpenLineage doesn’t have field level lineage yet. Here is the proposal for adding it: https://github.com/OpenLineage/OpenLineage/issues/148

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
- 👀 Yuki Tannai, Ricardo Gaspar -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 15:04:36
-
-

*Thread Reply:* Those two specs look compatible, so Datahub should be able to consume this lineage metadata in the future

- - - -
- 👍 Yuki Tannai -
- -
-
-
-
- - - - - -
-
- - - - -
- -
павел клопотюк - (klopotuk@gmail.com) -
-
2021-10-04 14:27:24
-
-

Hello, everyone. I'm trying to work with OL and Airflow 2.1.4 and it doesn't work. I found that OL is supported for Airflow 1.10.12++. Does it support Airflow 2.X.Y?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2021-10-04 15:38:47
-
-

*Thread Reply:* Hi! Airflow 2.x support is currently in development - you can follow along with the progress here:
https://github.com/OpenLineage/OpenLineage/issues/205

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
павел клопотюк - (klopotuk@gmail.com) -
-
2021-10-05 03:01:54
-
-

*Thread Reply:* Thank you for your reply!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 15:02:23
-
-

*Thread Reply:* There should be a first version of Airflow 2.X support soon: https://github.com/OpenLineage/OpenLineage/pull/305
We’re labelling it experimental because the config step might change as discussions in the Airflow GitHub evolve. It will track successful jobs in its current state.

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-04 23:14:26
-
-

Hi All, I’m working on the openlineage-dbt integration with Marquez as the backend. I want to integrate OL with dbt Cloud - would you please provide the steps I need to follow?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-05 04:18:42
-
-

*Thread Reply:* Take a look at this: https://docs.getdbt.com/docs/dbt-cloud/dbt-cloud-api/metadata/metadata-overview

- - - -
- ✅ SAM -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 14:58:24
-
-

*Thread Reply:* @SAM Let us know of your progress.

- - - -
- 👍 SAM -
- -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-05 16:23:41
-
-

Hey folks 😊
I’m trying to run dbt-ol with a Redshift target, but I get the following error message:
Traceback (most recent call last):
  File "/usr/local/bin/dbt-ol", line 61, in <module>
    main()
  File "/usr/local/bin/dbt-ol", line 54, in main
    events = processor.parse().events()
  File "/usr/local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 97, in parse
    self.extract_dataset_namespace(profile)
  File "/usr/local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 368, in extract_dataset_namespace
    self.dataset_namespace = self.extract_namespace(profile)
  File "/usr/local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 382, in extract_namespace
    raise NotImplementedError(
NotImplementedError: Only 'snowflake' and 'bigquery' adapters are supported right now. Passed redshift
I know that Redshift is not the best cloud DWH we can use… 😅
But, still… do you have any plan to support it?
Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-05 16:41:30
-
-

*Thread Reply:* Hey, can you create a ticket in the OpenLineage repository? FWIW Redshift is very similar to postgres, so supporting it won't be hard.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-05 16:43:39
-
-

*Thread Reply:* Hey @Maciej Obuchowski 😊
Yep, will do now! Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-05 16:46:26
-
-

*Thread Reply:* Well...will do tomorrow morning 😅

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-06 03:03:16
-
-

*Thread Reply:* Here’s the issue: https://github.com/OpenLineage/OpenLineage/issues/318

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 14:51:08
-
-

*Thread Reply:* Thanks a lot. I pulled it in the current project.

- - - -
- 👍 ale -
- -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 05:48:28
-
-

*Thread Reply:* @Julien Le Dem @Maciej Obuchowski I’m not familiar with dbt-ol codebase, but I’m willing to help on this if you guys can give me a bit of guidance 😅

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 05:53:05
-
-

*Thread Reply:* @ale can you help us define naming schema for redshift, as we have for other databases? https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 05:53:21
-
-

*Thread Reply:* Sure!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 05:54:21
-
-

*Thread Reply:* will work on this today and I’ll try to submit a PR by EOD

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 06:36:12
-
-

*Thread Reply:* There you go https://github.com/OpenLineage/OpenLineage/pull/324

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 06:39:35
-
-

*Thread Reply:* Host would be something like
examplecluster.<XXXXXXXXXXXX>.us-west-2.redshift.amazonaws.com
right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 07:13:51
-
-

*Thread Reply:* Yep, let me update the PR

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 07:27:42
-
-

*Thread Reply:* Done

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 07:31:40
-
-

*Thread Reply:* 🙌

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 07:35:30
-
-

*Thread Reply:* If you want to look at dbt integration itself, there are two things:

- -

We need to determine how Redshift adapter reports metrics https://github.com/OpenLineage/OpenLineage/blob/610a687bf69df2b52ec4ac4da80b4a05580e8d32/integration/common/openlineage/common/provider/dbt.py#L412

- -

And how we can create the namespace and job name based on the job naming schema that you created:
https://github.com/OpenLineage/OpenLineage/blob/610a687bf69df2b52ec4ac4da80b4a05580e8d32/integration/common/openlineage/common/provider/dbt.py#L512

- -

One way to get this info is to run dbt yourself and look at the resulting metadata files - in the target dir of the dbt directory.
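For example, a quick way to inspect what a given adapter writes there (an exploratory sketch; the keys differ by adapter and dbt version):
```
# Dump per-node info from dbt's target/run_results.json to see what a new
# adapter (e.g. redshift) reports. Keys vary by dbt version.
import json

with open("target/run_results.json") as f:
    run_results = json.load(f)

for result in run_results.get("results", []):
    print(result.get("unique_id"), result.get("adapter_response"))
```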

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 08:33:31
-
-

*Thread Reply:* I figured out how to generate the namespace.
But I can’t understand which of the JSON files is inspected for metrics. Is it run_results.json?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 09:48:50
-
-

*Thread Reply:* yes, run_results.json - it's different in bigquery and snowflake, so I presume it's different in redshift too

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 11:02:32
-
-

*Thread Reply:* Ok thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 11:11:57
-
-

*Thread Reply:* Should be stats:rows:value

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 11:19:59
-
-

*Thread Reply:* Regarding namespace: if env_var is used in profiles.yml , how is this handled now?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 11:44:50
-
-

*Thread Reply:* Well, it isn't. This is relevant only if you passed cluster hostname this way, right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-08 11:53:52
-
-

*Thread Reply:* Exactly

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 07:10:38
-
-

*Thread Reply:* If you think it makes sense, I can submit a PR to handle dbt profiles with env_var

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:18:01
-
-

*Thread Reply:* Do you want to run jinja on the dbt profile?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:20:18
-
-

*Thread Reply:* Theoretically, we'd need to run it also on dbt_project.yml , but we only take target path and profile name from it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 07:20:32
-
-

*Thread Reply:* The env_var syntax in the profile is quite simple, I was thinking of extracting the env var name using re and then retrieving the value from os
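A sketch of that regex idea (the thread below ends up preferring jinja rendering instead, since dbt's env_var also supports defaults; the pattern and helper name here are hypothetical):
```
# Hypothetical regex-based substitution of {{ env_var('NAME') }} occurrences
# in a raw profiles.yml string. Does not handle the optional default argument.
import os
import re

ENV_VAR_PATTERN = re.compile(r"\{\{\s*env_var\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")

def render_env_vars(raw_profile: str) -> str:
    return ENV_VAR_PATTERN.sub(lambda m: os.environ[m.group(1)], raw_profile)
```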

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:23:59
-
-

*Thread Reply:* It would work, but we can actually use jinja - if you're using dbt, it's already included.
The method is pretty simple:
```
    @contextmember
    @staticmethod
    def env_var(var: str, default: Optional[str] = None) -> str:
        """The env_var() function. Return the environment variable named 'var'.
        If there is no such environment variable set, return the default.

        If the default is None, raise an exception for an undefined variable.
        """
        if var in os.environ:
            return os.environ[var]
        elif default is not None:
            return default
        else:
            msg = f"Env var required but not provided: '{var}'"
            undefined_error(msg)
```
-
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 07:25:07
-
-

*Thread Reply:* Oh cool!
I will definitely use this one!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:25:09
-
-

*Thread Reply:* We'd be sure that our implementation matches dbt's one, right? Also, you'd support default method for free

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 07:26:34
-
-

*Thread Reply:* So this env_var method is defined in dbt and not in the OpenLineage codebase, right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:27:01
-
-

*Thread Reply:* yes

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:27:14
-
-

*Thread Reply:* dbt is on Apache license 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 07:28:06
-
-

*Thread Reply:* Should we import dbt package and use the method or should we just copy/paste the method inside OpenLineage codebase?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 07:28:28
-
-

*Thread Reply:* I’m asking for guidance here 😊

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:34:44
-
-

*Thread Reply:* I think we should just do basic jinja template rendering in our code like in the quick example: https://realpython.com/primer-on-jinja-templating/#quick-examples

- -

just with the env_var method passed to the render method 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 07:37:05
-
-

*Thread Reply:* basically, here in the code we should read the file, do the jinja render, and load yaml from the string instead of straight from the file
https://github.com/OpenLineage/OpenLineage/blob/610a687bf69df2b52ec4ac4da80b4a05580e8d32/integration/common/openlineage/common/provider/dbt.py#L176
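Put together, the suggestion amounts to something like this sketch (env_var adapted from the dbt method quoted earlier; function names are illustrative):
```
# Sketch: render profiles.yml through jinja (exposing env_var), then
# yaml-load the rendered string instead of loading the file directly.
import os
from typing import Optional

import yaml
from jinja2 import Template

def env_var(var: str, default: Optional[str] = None) -> str:
    if var in os.environ:
        return os.environ[var]
    if default is not None:
        return default
    raise KeyError(f"Env var required but not provided: '{var}'")

def load_profile(path: str) -> dict:
    with open(path) as f:
        rendered = Template(f.read()).render(env_var=env_var)
    return yaml.safe_load(rendered)
```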

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 07:38:53
-
-

*Thread Reply:* ok, got it.
Will try to implement following your suggestions.
Thanks @Maciej Obuchowski 🙌

- - - -
- 🙌 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 08:36:13
-
-

*Thread Reply:* We need to:

- -
  1. load the template profile from the profile.yml
  2. replace any env vars we found
For the first step, we can use jinja2.Template
However, to replace the env vars we find, we have to actually search for those env vars… 🤔
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 08:43:06
-
-

*Thread Reply:* The dbt method implements that:
```
    @contextmember
    @staticmethod
    def env_var(var: str, default: Optional[str] = None) -> str:
        """The env_var() function. Return the environment variable named 'var'.
        If there is no such environment variable set, return the default.

        If the default is None, raise an exception for an undefined variable.
        """
        if var in os.environ:
            return os.environ[var]
        elif default is not None:
            return default
        else:
            msg = f"Env var required but not provided: '{var}'"
            undefined_error(msg)
```
-
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 08:45:54
-
-

*Thread Reply:* Ok, but I need to pass var to the env_var method.
And to pass the var value, I need to look into the loaded Template and search for env var names…

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 08:46:54
-
-

*Thread Reply:* that's what jinja does - you're passing function to jinja render, and it's calling it itself

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 08:47:45
-
-

*Thread Reply:* you can try the quick example from here, but just pass the env_var method (slightly adjusted - as a standalone function and without undefined error) and call it inside the template: https://realpython.com/primer-on-jinja-templating/#quick-examples

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 08:51:19
-
-

*Thread Reply:* Ok, will try

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 09:37:49
-
-

*Thread Reply:* I’m trying to run
pip install -e ".[dev]"
so that I can test my changes, but I get
ERROR: Could not find a version that satisfies the requirement openlineage-integration-common[dbt]==0.2.3 (from openlineage-dbt[dev]) (from versions: 0.0.1rc7, 0.0.1rc8, 0.0.1, 0.1.0rc5, 0.1.0, 0.2.0, 0.2.1, 0.2.2)
ERROR: No matching distribution found for openlineage-integration-common[dbt]==0.2.3
I don’t understand what I’m doing wrong…

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 09:41:47
-
-

*Thread Reply:* can you try installing it manually?

- -

pip install openlineage-integration-common[dbt]==0.2.3

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 09:42:13
-
-

*Thread Reply:* I mean, it exists in pypi: https://pypi.org/project/openlineage-integration-common/#files

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 09:44:57
-
-

*Thread Reply:* Yep, maybe it’s our internal PyPI repo which is not synced.
Installing from the public PyPI resolved the issue.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 12:04:55
-
-

*Thread Reply:* Can't seem to make env_var work as the render method of a Template 😅

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 12:57:07
-
-

*Thread Reply:* try this:

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 12:57:09
-
-

*Thread Reply:* ```import os
from typing import Optional
from jinja2 import Template


def env_var(var: str, default: Optional[str] = None) -> str:
    """The env_var() function. Return the environment variable named 'var'.
    If there is no such environment variable set, return the default.

    If the default is None, raise an exception for an undefined variable.
    """
    if var in os.environ:
        return os.environ[var]
    elif default is not None:
        return default
    else:
        msg = f"Env var required but not provided: '{var}'"
        raise Exception(msg)


if __name__ == '__main__':
    t = Template("Hello {{ env_var('ENV_VAR') }}!")
    print(t.render(env_var=env_var))```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-11 12:57:42
-
-

*Thread Reply:* works for me:
mobuchowski@thinkpad [18:57:14] [~]
-> % ENV_VAR=world python jinja_example.py
Hello world!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-11 16:59:13
-
-

*Thread Reply:* Finally 😅
https://github.com/OpenLineage/OpenLineage/pull/328

- -

There are minimal tests for Redshift and env vars.
Feedback and suggestions are welcome!

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 03:10:45
-
-

*Thread Reply:* Hi @Maciej Obuchowski 😊
Regarding this comment https://github.com/OpenLineage/OpenLineage/pull/328#discussion_r726586564

- -

How can we distinguish between snowflake, bigquery and redshift in this method?

- -

A simple, but not very clean, solution would be to split this
bytes = get_from_multiple_chains(
    node.catalog_node,
    [
        ['stats', 'num_bytes', 'value'],  # bigquery
        ['stats', 'bytes', 'value'],      # snowflake
        ['stats', 'size', 'value']        # redshift (Note: size = count of 1MB blocks)
    ]
)
into two pieces, one checking for snowflake and bigquery and the other checking for redshift.

- -

A better solution would be to have the profile type inside the method node_to_output_dataset, but I’m struggling to understand how to do that.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 05:35:00
-
-

*Thread Reply:* Well, why not do something like

- -

```bytes = get_from_multiple_chains(... rest of stuff)

if adapter == 'redshift':
    bytes = bytes * 1024 * 1024  # size is a count of 1MB blocks```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 05:36:49
-
-

*Thread Reply:* we can store adapter type in the class

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 05:38:47
-
-

*Thread Reply:* well, I've looked at last commit and that's exactly what you did 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 05:40:35
-
-

*Thread Reply:* Now, have you tested your branch on a real redshift cluster? I don't think we 100% need automated tests for that now, but it would be nice to have confirmation that it works.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 06:35:04
-
-

*Thread Reply:* Not yet, but I'll try to do that this afternoon.
Need to figure out how to build the lib locally, then I can use it to test with Redshift.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 06:40:58
-
-

*Thread Reply:* I think pip install -e .[dbt] in common directory should be enough

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 09:29:13
-
-

*Thread Reply:* I was able to run my local branch with my Redshift cluster and metadata is pushed to Marquez.
However, I’m not sure about the namespace.
I also see exceptions in Marquez logs.

- -
- - - - - - - -
-
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 09:33:26
-
-

*Thread Reply:* namespace: well, if it matches what you put into your profile, there's not much we can do. I don't understand why you connect to redshift via host, maybe this is related to IAM?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 09:44:17
-
-

*Thread Reply:* I think the marquez error is because we don't send SourceCodeLocationJobFacet

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 09:46:17
-
-

*Thread Reply:* Regarding the namespace, I will check it and figure it out 😊
Regarding the error: in the context of this PR, is it something I should worry about or not?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 09:54:17
-
-

*Thread Reply:* I think not in the context of the PR. It certainly deserves a separate issue in the Marquez repository.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 10:24:38
-
-

*Thread Reply:* 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 10:24:51
-
-

*Thread Reply:* Is there anything else I can do to improve the PR?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 10:27:44
-
-

*Thread Reply:* did you figure out the namespace stuff?
I think it's ready to be merged outside of that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 10:49:06
-
-

*Thread Reply:* Not yet

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 10:58:07
-
-

*Thread Reply:* Ok, I figured it out.
When running dbt locally, we connect to Redshift using an SSH tunnel.
dbt runs on Docker, hence it can access the tunnel using host.docker.internal

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 10:58:16
-
-

*Thread Reply:* So the namespace is correct

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 11:04:12
-
-

*Thread Reply:* Makes sense. So, let's merge it, after DCO bot gets up again.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 11:04:37
-
-

*Thread Reply:* 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-13 05:29:48
-
-

*Thread Reply:* merged your PR 🙌

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-13 10:54:09
-
-

*Thread Reply:* 🎉

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-13 12:01:20
-
-

*Thread Reply:* I think I'm going to change it up a bit.
The problem is that we can try to render jinja everywhere, including comments.
I tried to make it skip unknown methods and values here, but I think the right solution is to load the yaml, and then try to render jinja for values.
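A sketch of that revised approach - yaml-load first, then render jinja only on string values, so comments never reach the renderer (names are illustrative; env_var is the same helper as in the earlier snippets):
```
# Recursively render jinja on string values of an already-loaded yaml document.
import yaml
from jinja2 import Template

def render_values(node, env_var):
    if isinstance(node, dict):
        return {key: render_values(value, env_var) for key, value in node.items()}
    if isinstance(node, list):
        return [render_values(value, env_var) for value in node]
    if isinstance(node, str):
        return Template(node).render(env_var=env_var)
    return node

def load_profile(path: str, env_var) -> dict:
    with open(path) as f:
        return render_values(yaml.safe_load(f), env_var)
```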

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-13 14:27:37
-
-

*Thread Reply:* Ok sounds good to me!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-06 10:50:43
-
-

Hey there, I’m not sure why I’m getting the below error after I ran OPENLINEAGE_URL=http://localhost:5000 dbt-ol run, although running dbt debug doesn’t show any error. Pls help.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-06 10:54:32
-
-

*Thread Reply:* Does it work with simply dbt run?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-06 10:55:51
-
-

*Thread Reply:* also, do you have dbt-snowflake installed?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-06 11:00:42
-
-

*Thread Reply:* it works with dbt run

- - - -
- 👀 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-06 11:01:22
-
-

*Thread Reply:* no i haven’t installed dbt-snowflake

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-06 12:04:19
-
-

*Thread Reply:* what the dbt says - the snowflake profile with dev target - is that what you meant to run or was it something else?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-06 12:04:46
-
-

*Thread Reply:* it feels very weird to me, since the dbt-ol script just runs dbt run underneath

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-06 12:19:27
-
-

*Thread Reply:* this is my profiles.yml file:
```snowflake:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: xxxxxxx

      # User/password auth
      user: xxxxxx
      password: xxxxx

      role: poc_db_temp_fullaccess
      database: POC_DB
      warehouse: poc_wh
      schema: temp
      threads: 2
      client_session_keep_alive: False
      query_tag: dbt_ol```
-
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-06 12:26:39
-
-

*Thread Reply:* Yes, it looks like everything is okay on your side...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-06 12:28:19
-
-

*Thread Reply:* may be I’ll restart my machine and try again

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-06 12:30:25
-
-

*Thread Reply:* can you try
OPENLINEAGE_URL=http://localhost:5000 dbt-ol debug

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-07 05:59:03
-
-

*Thread Reply:* Actually I had to use a venv - that fixed the above issue. However, I ran into another problem, which is no jobs / datasets found in Marquez:

- -
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-07 06:00:28
-
-

*Thread Reply:* Good that you fixed that one 🙂 Regarding the last one, I found it independently yesterday and a PR fixing it is already waiting for review: https://github.com/OpenLineage/OpenLineage/pull/322

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-07 06:00:46
-
-

*Thread Reply:* oh, thanks a lot

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 14:50:01
-
-

*Thread Reply:* There will be a release soon: https://openlineage.slack.com/archives/C01CK9T7HKR/p1633631825147900

-
- - - -
- 👍 SAM -
- -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-07 23:23:26
-
-

*Thread Reply:* Hi,
openlineage-dbt==0.2.3 worked, thanks a lot for the quick fix.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Alex P - (alexander.pelivan@scout24.com) -
-
2021-10-07 07:46:16
-
-

Hi, I just started playing around with Marquez. When submitting some lineage data, after some experimenting, the visualisation becomes a bit cluttered with all the naive attempts of building a meaningful graph. Can I clear this up somehow? Or is there a tip, how to hide certain information?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Alex P - (alexander.pelivan@scout24.com) -
-
2021-10-07 07:46:59
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Alex P - (alexander.pelivan@scout24.com) -
-
2021-10-07 09:51:40
-
-

*Thread Reply:* So, as a quick fix, shutting down and re-starting the docker container resets everything.
./docker/up.sh

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-07 12:28:25
-
-

*Thread Reply:* I guess that it's the easiest way now. There should be API for that.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-07 14:09:50
-
-

*Thread Reply:* @Alex P Yeah, we're realizing that being able to delete metadata is becoming very important. And, as @Maciej Obuchowski mentioned, dropping your entire database is the only way currently (not ideal!). We do have an issue in the Marquez backlog to expose delete APIs: https://github.com/MarquezProject/marquez/issues/754

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-07 14:10:36
-
-

*Thread Reply:* A bit more discussion is needed though. Like what if a dataset is deleted, but you still want to keep track that it existed at some point? (i.e. soft vs hard deletes). But, for the case that you just want to clear metadata because you were testing things out, then yeah, that's more obvious and requires little discussion of the API upfront.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-07 14:12:52
-
-

*Thread Reply:* @Alex P I moved the delete APIs to the Marquez 0.20.0 release

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 14:39:03
-
-

*Thread Reply:* Thanks Willy.

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 14:48:37
-
-

*Thread Reply:* I have also updated a corresponding issue to track this in OpenLineage: https://github.com/OpenLineage/OpenLineage/issues/323

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 13:36:48
-
-

The next OpenLineage monthly meeting is on the 13th. https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting
Please chime in here if you’d like a topic to be added to the agenda.

- - - -
- 🙌 Willy Lulciuc, Maciej Obuchowski, Peter Hicks -
- -
- ❤️ Willy Lulciuc, Maciej Obuchowski, Peter Hicks -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-13 10:47:49
-
-

*Thread Reply:* Reminder that the meeting is today. See you soon

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-13 19:49:21
-
-

*Thread Reply:* The recording and notes of the meeting are now available:
https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting#MonthlyTSCmeeting-Oct13th2021

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-07 14:37:05
-
-

@channel: We’ve recently become aware that our integration with dbt no longer works with the latest dbt manifest version (v3), see original discussion. The manifest version change was introduced in dbt 0.21, see diff. That said, we do have a fix: PR #322 contributed by @Maciej Obuchowski! Here’s our plan to roll out the openlineage-dbt hotfix for those using the latest version of dbt (NOTE: for those using an older dbt version, you will NOT be affected by this bug):

- -

Releasing OpenLineage 0.2.3 with dbt v3 manifest support:

- -
  1. Branch off the 0.2.2 tagged commit, and create an openlineage-0.2.x branch
  2. Cherry-pick the commit with the dbt manifest v3 fix
  3. Release 0.2.3 as a batch release
We will be releasing 0.2.3 today. Please reach out to us with any questions!
-
-
- - - - - - - - - - - - - - - - -
- - - -
- 🙌 Mario Measic, Minkyu Park, Peter Hicks -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 14:55:35
-
-

*Thread Reply:* For people following along, dbt changed the schema of its metadata, which broke the openlineage integration. However, we were a bit too stringent on validating the schema version (they increment it every time, even if it’s backwards compatible, which it is in this case). We will fix that so that future compatible changes don’t prevent the OL integration from working.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mario Measic - (mario.measic.gavran@gmail.com) -
-
2021-10-07 16:44:28
-
-

*Thread Reply:* As one of the main integrations, would be good to connect more within the dbt community for the next releases, by testing the release candidates 👍

- -

Thanks for the PR

- - - -
- 💯 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-07 16:46:40
-
-

*Thread Reply:* Yeah, I totally agree with you. We should also be more proactive and more aware of what’s coming in future dbt releases. Sorry if you were affected by this bug :ladybug:

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-07 18:12:22
-
-

*Thread Reply:* We’ve released OpenLineage 0.2.3 with the hotfix for adding dbt v3 manifest support, see https://github.com/OpenLineage/OpenLineage/releases/tag/0.2.3

- -

You can download and install openlineage-dbt 0.2.3 with the fix using:

- -

$ pip3 install openlineage-dbt==0.2.3

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Drew Bittenbender - (drew@salt.io) -
-
2021-10-07 19:02:37
-
-

Hello. I have a question about dbt-ol. I run dbt in a docker container and alias the dbt command to execute in that docker container. dbt-ol doesn't seem to use that alias. Do you know of a way to force it to use the alias?... or is there an alternative for getting the lineage into Marquez?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-07 21:10:36
-
-

*Thread Reply:* @Maciej Obuchowski might know

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 04:23:17
-
-

*Thread Reply:* @Drew Bittenbender dbt-ol always calls the dbt command now, without spawning a shell - so it does not have access to bash aliases.

- -

Can you elaborate on your use case? Do you mean that dbt in your path does docker run or something like this? It still might be a problem if we don't have access to the artifacts generated by dbt in the target directory.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Drew Bittenbender - (drew@salt.io) -
-
2021-10-08 10:59:32
-
-

*Thread Reply:* I am running on a mac and I have aliased (.zshrc) dbt to execute docker run against the fishtownanalytics docker image rather than installing dbt natively (homebrew, etc). I am doing this so that the dbt configuration is portable and reusable by others.

- -

It seems that by installing openlineage-dbt in a virtual environment, it pulls down its own version of dbt, which it calls inline rather than shelling out and executing the dbt setup resident on the host system. I understand that opening a shell is a security risk, so that is understandable.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 11:05:00
-
-

*Thread Reply:* It does not pull down, it just assumes that it's in the system. It would fail if it isn't.

- -

For now I think you could build your own image based on the official one, and install openlineage-dbt inside, something like:

- -

FROM fishtownanalytics/dbt:0.21.0
RUN pip install openlineage-dbt
ENTRYPOINT ["dbt-ol"]

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 11:05:15
-
-

*Thread Reply:* and then pass OPENLINEAGE_URL in env while doing docker run

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 11:06:55
-
-

*Thread Reply:* Also, to make sure that using a shell would help in your case: do you bind mount your dbt directory to home? dbt-ol can't run without access to dbt's target directory, so if it's not visible on the host, the only option is to have dbt-ol in the container.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-08 07:00:43
-
-

Hi, I found the below issues, not sure what the root cause is:

- -
  1. The Marquez UI does not show any jobs/datasets, but if I search my table name, only then does it show in the search result section.
  2. After running dbt docs generate there is no schema information available in Marquez?
- -
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 08:16:37
-
-

*Thread Reply:* Regarding 2), the data is only visible after the next dbt-ol run - dbt docs generate does not emit events itself, but generates data that the run takes into account.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-08 08:24:57
-
-

*Thread Reply:* Oh got it - since it's in default, I need to click on it and choose my dbt profile's account name. Thanks!

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-08 11:25:22
-
-

*Thread Reply:* May I know why these highlighted ones don't have a schema? FYI, I used sources in dbt.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-08 11:26:18
-
-

*Thread Reply:* Do they have it in dbt docs?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2021-10-08 11:33:59
-
-

*Thread Reply:* I prepared this yaml file, not sure if this is what you asked for.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
ale - (alessandro.lollo@gmail.com) -
-
2021-10-12 04:14:08
-
-

Hey folks 😊
DCO checks on this PR https://github.com/OpenLineage/OpenLineage/pull/328 seem to be stuck.
Any suggestions on how to unblock it?

- -

Thanks!

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-12 07:21:33
-
-

*Thread Reply:* I don't think anything is wrong with your branch. It's also not working on mine. Maybe it's globally stuck?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mark Taylor - (marktayl@microsoft.com) -
-
2021-10-12 15:17:02
-
-

We are working on the hackathon and have a couple of questions about generating lineage information. @Willy Lulciuc would you have time to help answer a couple of questions?

- -

• Is there a way to generate OpenLineage output that contains a mapping between input and output fields?
• In Azure Databricks sources often map to ADB mount points. We are looking for a way to translate this into source metadata in the OL output. Is there some configuration that would make this possible, or any other suggestions?

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-12 15:50:20
-
-

*Thread Reply:* > Is there a way to generate OpenLineage output that contains a mapping between input and output fields? -OpenLineage defines discrete classes for both OpenLineage.InputDataset and OpenLineage.OutputDataset datasets. But, for clarification, are you asking:

- -
  1. If a job reads / writes to the same dataset, how can OpenLineage track which fields were used in the job’s logic as input and which fields were used to write back to the resulting output?
  2. Or, if a job reads / writes two different datasets, how can OpenLineage track which input fields were used in the job’s logic for the resulting output dataset? (i.e. column-level lineage)
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-12 15:56:18
-
-

*Thread Reply:* > In Azure Databricks sources often map to ADB mount points.  We are looking for a way to translate this into source metadata in the OL output.  Is there some configuration that would make this possible, or any other suggestions? -I would look into our OutputDatasetVisitors class (as a starting point) that extracts metadata from the spark logical plan to construct a mapping between a logic plan to one or more OpenLineage.Dataset for the spark job. But, I think @Michael Collado will have a more detailed suggestion / approach to what you’re asking

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-10-12 15:59:41
-
-

*Thread Reply:* are the sources mounted like local filesystem mounts? are you ending up with datasources that point to the local filesystem rather than some dbfs url? (sorry, I'm not familiar with databricks or azure at this point)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mark Taylor - (marktayl@microsoft.com) -
-
2021-10-12 16:59:38
-
-

*Thread Reply:* I think under the covers they are an OS-level fs mount, but it is using an ADB-specific API, dbutils.fs.mount. It is using the ADB filesystem.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-10-12 17:01:23
-
-

*Thread Reply:* Do you use the dbfs scheme to access the files from Spark as in the example on that page? -df = spark.read.text("dbfs:/mymount/my_file.txt")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mark Taylor - (marktayl@microsoft.com) -
-
2021-10-12 17:04:52
-
-

*Thread Reply:* @Willy Lulciuc In our project, @Will Johnson had generated some sample OL output from just reading in and writing out a dataset to blob storage. In the resulting output, I see the columns represented as fields under the schema element, with one set for output and another for input. I would need the mapping of input and output columns to generate column-level lineage, so I'm wondering if it is possible to get, or am I just missing it somewhere? Thanks for your help!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-12 17:26:35
-
-

*Thread Reply:* Ahh, well currently, no, but it has been discussed and is on the OpenLineage roadmap. Here’s a proposal opened by @Julien Le Dem, column level lineage facet, that starts the discussion to add the columnLineage facet to the dataset model in order to support column-level lineage. Would be great to get your thoughts!
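As a sketch only (the facet was still a proposal at this point, so the final shape may differ), a column-level lineage facet on an output dataset could look something like the following; namespaces, dataset names, and the schema URL are placeholders:
```
# Hypothetical shape of a columnLineage dataset facet, per the proposal above.
column_lineage_facet = {
    "columnLineage": {
        "_producer": "https://example.com/my-producer",
        "_schemaURL": "https://example.com/schemas/ColumnLineageFacet.json",
        "fields": {
            # output column -> the input fields it was derived from
            "total_amount": {
                "inputFields": [
                    {"namespace": "my-namespace", "name": "orders", "field": "amount"},
                    {"namespace": "my-namespace", "name": "orders", "field": "tax"},
                ]
            }
        },
    }
}
```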

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2021-10-12 17:41:41
-
-

*Thread Reply:* @Michael Collado - Databricks allows you to reference a file called /mnt/someMount/some/file/path The way you have referenced it would let you hit the file with local file system stuff like pandas / local python.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 17:49:37
-
-

*Thread Reply:* For column level lineage, you can add your own custom facets. Here’s an example in the Spark integration (LogicalPlanFacet): https://github.com/OpenLineage/OpenLineage/blob/5f189a94990dad715745506c0282e16fd8[…]openlineage/spark/agent/lifecycle/SparkSQLExecutionContext.java
Here is the paragraph about this in the spec: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#custom-facet-naming
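For a sense of what that naming guidance implies, a custom run facet might look like this sketch (the vendor prefix, _producer, and _schemaURL are all placeholders, and the payload is hypothetical - e.g. mount-point info for the Databricks case above):
```
# Hypothetical custom run facet keyed with a vendor prefix, as the spec's
# custom-facet naming section suggests.
custom_run_facets = {
    "myVendor_environment": {
        "_producer": "https://example.com/my-producer",
        "_schemaURL": "https://example.com/schemas/MyEnvironmentRunFacet.json",
        "mountPoints": ["/mnt/raw", "/mnt/curated"],
    }
}
```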

- -
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 17:51:24
-
-

*Thread Reply:* This example adds facets to the run, but you can also add them to the job

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-10-12 17:52:46
-
-

*Thread Reply:* unfortunately, there's not yet a way to add your own custom facets to the spark integration- there's some work on extensibility to be done

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-10-12 17:54:07
-
-

*Thread Reply:* for the hackathon's sake, you can check out the package and just add in whatever you want

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2021-10-12 18:26:44
-
-

*Thread Reply:* Thank you guys!!

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2021-10-12 20:42:20
-
-

Question on the Spark integration and its SPARK_CONF_URL_KEY configuration variable.

- -

https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fe[…]rk/src/main/java/io/openlineage/spark/agent/ArgumentParser.java

- -

It looks like I can pass in any url but I'm not sure if I can pass in query parameters along with that URL. For example, if I had https://localhost/myendpoint?secret_code=123 I THINK that is used for the endpoint and it does not append /lineage to the end of the url. Is that a fair assessment of what happens when the url is provided?

- -

Thank you for any guidance!

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 21:46:12
-
-

*Thread Reply:* You can also pass the settings independently if you want something more flexible: https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fe[…]n/java/io/openlineage/spark/agent/OpenLineageSparkListener.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 21:47:36
-
-

*Thread Reply:* SparkSession.builder()
    .config("spark.jars.packages", "io.openlineage:openlineage_spark:0.2.+")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.host", "https://localhost")
    .config("spark.openlineage.apiKey", "your api key")
    .config("spark.openlineage.namespace", "<NAMESPACE_NAME>") // Replace with the name of your Spark cluster.
    .getOrCreate()
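The same configuration in PySpark, for reference (a sketch; the placeholder values mirror the snippet above):
```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.openlineage:openlineage_spark:0.2.+")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.host", "https://localhost")
    .config("spark.openlineage.apiKey", "your api key")
    .config("spark.openlineage.namespace", "my-spark-cluster")  # name of your Spark cluster
    .getOrCreate()
)
```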

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 21:48:57
-
-

*Thread Reply:* It is going to add /lineage in the end: https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fe[…]rc/main/java/io/openlineage/spark/agent/OpenLineageContext.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 21:49:37
-
-

*Thread Reply:* the apiKey setting is sent in an “Authorization” header

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 21:49:55
-
-

*Thread Reply:* “Bearer $KEY”

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-12 21:51:09
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/a6eea7a55fef444b6561005164869a9082[…]n/java/io/openlineage/spark/agent/client/OpenLineageClient.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2021-10-12 22:54:22
-
-

*Thread Reply:* Thank you @Julien Le Dem it seems in both cases (defining the url endpoint with spark.openlineage.url and with the components: spark.openlineage.host / openlineage.version / openlineage.namespace / etc.) OpenLineage will strip out url parameters and rebuild the url endpoint with /lineage.

- -

I think we might need to add in a url parameter configuration for our hackathon. We're using a bit of serverless code to shuttle open lineage events to a queue so that another job and/or serverless application can read that queue at its leisure.

- -

Using the apiKey that feeds into the Authorization header as a Bearer token is great and would suffice, but our services use OAuth tokens that expire after two hours, and most of our customers wouldn't want to generate an access token themselves and feed it to Spark. ☹️

- -

Would you guys entertain a proposal to support a spark.openlineage.urlParams configuration variable that lets you add url parameters to the derived lineage url?

- -

Thank you for the detailed replies and deep links!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-13 10:46:22
-
-

*Thread Reply:* Yes, please open an issue detailing the use case.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2021-10-13 13:02:06
-
-

Quick question: is it expected, when using Spark SQL and the Spark integration for Spark 3, that we receive an INPUT but no OUTPUTS when doing a CREATE TABLE ... AS SELECT ...?

- -

I'm reading from a Spark SQL table (underlying CSV) and then writing it to a DELTA lake table.

- -

I get a COMPLETE event type with an INPUT but no OUTPUT, and then I get an exception from the AsyncEventQueue, but I'm guessing it's unrelated 😅

- -

21/10/13 15:38:15 INFO OpenLineageContext: Lineage completed successfully: ResponseMessage(responseCode=200, body=null, error=null) {"eventType":"COMPLETE","eventTime":"2021-10-13T15:38:15.878Z","run":{"runId":"2cfe52b3-e08f-4888-8813-ffcdd2b27c89","facets":{"spark_unknown":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.2.3-SNAPSHOT/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","output":{"description":{"@class":"org.apache.spark.sql.catalyst.plans.logical.Project","traceEnabled":false,"streaming":false,"cacheId":null,"canonicalizedPlan":false},"inputAttributes":[{"name":"id","type":"long","metadata":{}}],"outputAttributes":[{"name":"id","type":"long","metadata":{}},{"name":"action_date","type":"date","metadata":{}}]},"inputs":[{"description":{"@class":"org.apache.spark.sql.catalyst.plans.logical.Range","streaming":false,"traceEnabled":false,"cacheId":null,"canonicalizedPlan":false},"inputAttributes":[],"outputAttributes":[{"name":"id","type":"long","metadata":{}}]}]},"spark.logicalPlan":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.2.3-SNAPSHOT/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.Project","num-children":1,"projectList":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num_children":0,"name":"id","dataType":"long","nullable":false,"metadata":{},"exprId":{"product_class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":111,"jvmId":"4bdfd808-97d5-455f-ad6a-a3b29855e85b"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.Alias","num-children":1,"child":0,"name":"action_date","exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":113,"jvmId":"4bdfd808_97d5_455f_ad6a_a3b29855e85b"},"qualifier":[],"explicitMetadata":{},"nonInheritableMetadataKeys":"[__dataset_id, __col_position]"},{"class":"org.apache.spark.sql.catalyst.expressions.CurrentDate","num_children":0,"timeZoneId":"Etc/UTC"}]],"child":0},{"class":"org.apache.spark.sql.catalyst.plans.logical.Range","num-children":0,"start":0,"end":5,"step":1,"numSlices":8,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num_children":0,"name":"id","dataType":"long","nullable":false,"metadata":{},"exprId":{"product_class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":111,"jvmId":"4bdfd808-97d5-455f-ad6a-a3b29855e85b"},"qualifier":[]}]],"isStreaming":false}]}}},"job":{"namespace":"sparknamespace","name":"databricks_shell.project"},"inputs":[],"outputs":[],"producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.2.3-SNAPSHOT/integration/spark>","schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent>"} -21/10/13 15:38:16 INFO FileSizeAutoTuner: File size tuning result: {"tuningType":"autoTuned","tunedConfs":{"spark.databricks.delta.optimize.minFileSize":"268435456","spark.databricks.delta.optimize.maxFileSize":"268435456"}} -21/10/13 15:38:16 INFO FileFormatWriter: Write Job e062f36c-8b9d-4252-8db9-73b58bd67b15 committed. -21/10/13 15:38:16 INFO FileFormatWriter: Finished processing stats for write job e062f36c-8b9d-4252-8db9-73b58bd67b15. 
21/10/13 15:38:18 INFO CodeGenerator: Code generated in 253.294028 ms
21/10/13 15:38:18 INFO SparkContext: Starting job: collect at DataSkippingReader.scala:430
21/10/13 15:38:18 INFO DAGScheduler: Job 1 finished: collect at DataSkippingReader.scala:430, took 0.000333 s
21/10/13 15:38:18 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
    at io.openlineage.spark.agent.OpenLineageSparkListener.onJobEnd(OpenLineageSparkListener.java:167)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:39)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1547)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)


Julien Le Dem - (julien@apache.org)
2021-10-13 17:54:22
*Thread Reply:* This is because this specific action is not covered yet. You can see the "spark_unknown" facet is describing things that are not understood yet:

run": {
...
  "facets": {
    "spark_unknown": {
...
      "output": {
        "description": {
          "@class": "org.apache.spark.sql.catalyst.plans.logical.Project",
          "traceEnabled": false,
          "streaming": false,
          "cacheId": null,
          "canonicalizedPlan": false
        },

Julien Le Dem - (julien@apache.org)
2021-10-13 17:54:43
*Thread Reply:* I think this is part of the Spark 3 gap

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-13 17:55:46
-
-

*Thread Reply:* an unknown output will cause missing output lineage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-13 18:05:57
-
-

*Thread Reply:* Output handling is here: https://github.com/OpenLineage/OpenLineage/blob/e0f1852422f325dc019b0eab0e466dc905[…]io/openlineage/spark/agent/lifecycle/OutputDatasetVisitors.java

-
- - - - - - - - - - - - - - - - -
- - - -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2021-10-13 22:49:08
-
-

*Thread Reply:* Ah! Thank you so much, Julien! This is very helpful to understand where that is set. This is a big gap that we want to help address after our hackathon. Thank you!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-13 20:09:17
-
-

Following up on the meeting this morning, I have created an issue to formalize a design doc review process: https://github.com/OpenLineage/OpenLineage/issues/336
If that sounds good I'll create the first doc to describe this as a PR. (how meta!)
Labels: proposal

Julien Le Dem - (julien@apache.org)
2021-10-13 20:13:02
*Thread Reply:* the github wiki is backed by a git repo but it does not allow PRs. (people do hacks but I’d rather avoid those)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-18 10:24:25
-
-

We're discussing creating a Transport abstraction for OpenLineage clients, which would allow us to create a better experience for people who expect to be able to emit their events using something other than the HTTP interface. Please tell us what you think of the proposed mechanism - encouraging emojis are helpful too 😉
https://github.com/OpenLineage/OpenLineage/pull/344

Julien Le Dem - (julien@apache.org)
2021-10-18 20:57:04
OpenLineage release 0.3 is coming. Please chime in if there's anything blocking that should go in the release: https://github.com/OpenLineage/OpenLineage/projects/4
❤️ Willy Lulciuc

Carlos Quintas - (cdquintas@gmail.com)
2021-10-19 06:36:05
👋 Hi everyone!

- - - -
- 👋 Ross Turk, Willy Lulciuc, Michael Collado -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Carlos Quintas - (cdquintas@gmail.com) -
-
2021-10-22 05:38:14
-
-

OpenLineage with dbt and Trino - is there any forecast?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-22 05:44:17
*Thread Reply:* Maybe you want to contribute it? It's not that hard - mostly testing, and figuring out what the naming of the OpenLineage namespace for Trino would be, and how some additional statistics work.

For example, recently we had support for Redshift added by community member @ale

https://github.com/OpenLineage/OpenLineage/pull/328
Comments: 1

Carlos Quintas - (cdquintas@gmail.com)
2021-10-22 05:42:52
Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5

Traceback (most recent call last):
  File "/home/labuser/.local/bin/dbt-ol", line 61, in <module>
    main()
  File "/home/labuser/.local/bin/dbt-ol", line 54, in main
    events = processor.parse().events()
  File "/home/labuser/.local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 98, in parse
    self.extract_dataset_namespace(profile)
  File "/home/labuser/.local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 377, in extract_dataset_namespace
    self.dataset_namespace = self.extract_namespace(profile)
  File "/home/labuser/.local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 391, in extract_namespace
    raise NotImplementedError(
NotImplementedError: Only 'snowflake' and 'bigquery' adapters are supported right now. Passed trino

Michael Collado - (collado.mike@gmail.com)
2021-10-22 12:41:08
Hey folks, we've released OpenLineage 0.3.1. There are quite a few changes, including doc improvements, Redshift support in dbt, bugfixes, and a new server-side client code base, but the real highlights are:

1. Official Spark 3 support - this is still a work in progress (the whole Spark integration is), but the big deal is we've split the source tree to support both Spark 2 and Spark 3 specific plan visitors. This will enable us to work with the Spark 3 API explicitly and to add support for those interfaces and classes that didn't exist in Spark 2. We're also running all integration tests against both Spark 2.4.7 and Spark 3.1.0.
2. Airflow 2 support - also a work in progress, but we have a new LineageBackend implementation that allows us to begin tracking lineage for successful Airflow 2 DAGs. We're working to support failure notifications so we can also trace failed jobs. The LineageBackend can also be enabled in Airflow 1.10.X to improve the reporting of task completion times.

Check the READMEs for more details and to get started with the new features. Thanks to @Maciej Obuchowski, @Oleksandr Dvornik, @ale, and @Willy Lulciuc for their contributions. See the full changelog
🎉 Willy Lulciuc, Maciej Obuchowski, Minkyu Park, Ross Turk, Peter Hicks, RamanD, Ry Walker
🙌 Willy Lulciuc, Maciej Obuchowski, Minkyu Park, Will Johnson, Ross Turk, Peter Hicks, Ry Walker
🔥 Ry Walker

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-10-28 07:27:12
Hello community. I am starting to use Marquez. I tried to connect dbt with Marquez, but the Spark adapter is not yet available.

Are you planning to implement this Spark dbt adapter in the next OpenLineage versions?

NotImplementedError: Only 'snowflake', 'bigquery', and 'redshift' adapters are supported right now. Passed spark

In my company we are also starting to use the Athena dbt adapter. Are you planning to implement this integration? Thanks a lot, community.

Julien Le Dem - (julien@apache.org)
2021-10-28 12:20:27
*Thread Reply:* That would make sense. I think you are the first person to request this. Is this something you would want to contribute to the project?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-10-28 17:37:53
-
-

*Thread Reply:* I would like to, Julien, but I'm not sure how to do it. Could you guide me on how to start, or show me another integration?

Matthew Mullins - (mmullins@aginity.com)
2021-10-31 07:57:55
*Thread Reply:* @David Virgil look at the pull request for the addition of Redshift as a starting guide. https://github.com/OpenLineage/OpenLineage/pull/328

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-01 12:01:41
-
-

*Thread Reply:* Thanks @Matthew Mullins, I'll try to add the dbt Spark integration.

Mario Measic - (mario.measic.gavran@gmail.com)
2021-10-28 09:31:01
Hey folks, quick question, are we able to run dbt-ol without providing OPENLINEAGE_URL? I find it quite limiting that I need to have a service set up in order to emit/generate OL events/messages. Is there a way to just output them to the console?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mario Measic - (mario.measic.gavran@gmail.com) -
-
2021-10-28 10:05:09
-
-

*Thread Reply:* OK, it was changed here: https://github.com/OpenLineage/OpenLineage/pull/286

Did you think about this?
Comments: 1

Julien Le Dem - (julien@apache.org)
2021-10-28 12:19:27
*Thread Reply:* In Marquez there was a mechanism to do that. Something like OPENLINEAGE_BACKEND=HTTP|LOG

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-28 13:56:42
-
-

*Thread Reply:* @Mario Measic We're going to add a Transport mechanism that will address use cases like yours. Please comment on this PR about what you would expect: https://github.com/OpenLineage/OpenLineage/pull/344
Comments: 1
👀 Mario Measic

Mario Measic - (mario.measic.gavran@gmail.com)
2021-10-28 15:29:50
*Thread Reply:* Nice, thanks @Julien Le Dem and @Maciej Obuchowski.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mario Measic - (mario.measic.gavran@gmail.com) -
-
2021-10-28 15:46:45
-
-

*Thread Reply:* Also, dbt build is not working, which is kind of the biggest feature of version 0.21.0. I will try testing the code with modifications to https://github.com/OpenLineage/OpenLineage/blob/c3aa70e161244091969951d0da4f37619bcbe36f/integration/dbt/scripts/dbt-ol#L141

I guess there's a reason for it that I didn't see, since you support v3 of the manifest.

Mario Measic - (mario.measic.gavran@gmail.com)
2021-10-29 03:45:27
*Thread Reply:* Also, is it normal not to see the column descriptions for the model/table even though these are provided in the YAML file, persisted in Redshift and also dbt docs generate has been run before dbt-ol run?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mario Measic - (mario.measic.gavran@gmail.com) -
-
2021-10-29 04:26:22
-
-

*Thread Reply:* Tried with dbt versions 0.20.2 and 0.21.0, openlineage-dbt==0.3.1

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-10-29 10:39:10
-
-

*Thread Reply:* I'll take a look at that. Supporting descriptions might be simple, but dbt build might be a little larger task.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-01 19:12:01
-
-

*Thread Reply:* I opened a ticket to track this: https://github.com/OpenLineage/OpenLineage/issues/376

-
- - - - - - - - - - - - - - - - -
- - - -
- 👀 Mario Measic -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-11-02 05:48:06
-
-

*Thread Reply:* The column description issue should be fixed here: https://github.com/OpenLineage/OpenLineage/pull/383

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-10-28 12:27:17
-
-

I’m looking for feedback on my proposal to improve the proposal process! https://github.com/OpenLineage/OpenLineage/issues/336
Assignees: wslulciuc, mobuchowski, mandy-chessell, collado-mike
Labels: proposal

Brad - (bradley.mcelroy@live.com)
2021-10-28 18:49:12
Hey guys - just an update on my Prefect PR (https://github.com/OpenLineage/OpenLineage/pull/293) - there's a little spiel on the ticket, but I've closed that PR in favour of opening a new one. Prefect have just released a 2.0a technical preview, which they would like to make stable near the start of next year. I think it makes sense to target this release, and one of the Prefect team has reached out and is keen to get some sort of lineage implemented in Prefect.
👍 Kevin Kho, Maciej Obuchowski, Willy Lulciuc, Michael Collado, Julien Le Dem, Thomas Fredriksen

Brad - (bradley.mcelroy@live.com)
2021-10-28 18:51:10
*Thread Reply:* If anyone has any questions or comments - happy to discuss here

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad - (bradley.mcelroy@live.com) -
-
2021-10-28 18:51:15
-
-

*Thread Reply:* @davzucky

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-28 23:01:29
-
-

*Thread Reply:* Thanks for updating the community, Brad!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
davzucky - (davzucky@hotmail.com) -
-
2021-10-28 23:47:02
-
-

*Thread Reply:* Thank you Brad. Looking forward to seeing how to integrate that with v2.

Kevin Kho - (kdykho@gmail.com)
2021-10-28 18:53:23
Hello, joining here from Prefect. Because of community requests from users like Brad above, we are looking to implement lineage for Prefect this quarter. Good to meet you all!

- - - -
- ❤️ Minkyu Park, Faouzi, John Thomas, Maciej Obuchowski, Kevin Mellott, Thomas Fredriksen -
- -
- 👍 Minkyu Park, Faouzi, John Thomas -
- -
- 🙌 Michael Collado, Faouzi, John Thomas -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-10-28 18:54:56
-
-

*Thread Reply:* Welcome, @Kevin Kho 👋. Really excited to see this integration kick off! 💯🚀

- - - -
- 👍 Kevin Kho, Maciej Obuchowski, Peter Hicks, Faouzi -
- -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-01 12:03:14
-
-

Hello,

I am integrating OpenLineage with Airflow 2.2.0.

Do you plan to take Airflow manual inlets and outlets into account in the future?

From the documentation I can see that it is not currently possible:

OpenLineageBackend does not take into account manually configured inlets and outlets.

Thanks

John Thomas - (john@datakin.com)
2021-11-01 12:23:11
*Thread Reply:* While it’s not something we’re supporting at the moment, it’s definitely something that we’re considering!

If you can give me a little more detail on what your system infrastructure is like, it’ll help us set priority and design.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-01 13:57:34
*Thread Reply:* So, the basic architecture of a data lake. We are using Airflow to trigger jobs. Every job is a pipeline that runs a Spark job (in our case it spins up an EMR cluster). So the idea would be to define inlets and outlets in the DAGs, based on Airflow lineage:

https://airflow.apache.org/docs/apache-airflow/stable/lineage.html

I think you need to be able to include these inlets and outlets in the picture of OpenLineage.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-01 14:01:24
*Thread Reply:* Why not use spark integration? https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-01 14:05:02
-
-

*Thread Reply:* because there are some other jobs that are not Spark; some jobs run in dbt, others run in Redshift @Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-01 14:08:58
*Thread Reply:* So, a combo of https://github.com/OpenLineage/OpenLineage/tree/main/integration/dbt and PostgresExtractor from the Airflow integration should cover Redshift if you're using it from PostgresOperator 🙂

It's definitely an interesting use case - you'd be using most of the existing integrations we have.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-01 15:04:44
*Thread Reply:* @Maciej Obuchowski Do i need to define any extractor in the airflow startup?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Francis McGregor-Macdonald - (francis@mc-mac.com) -
-
2021-11-05 23:48:21
-
-

*Thread Reply:* I am using Redshift with PostgresOperator and it is returning…

[2021-11-06 03:43:06,541] {{__init__.py:92}} ERROR - Failed to extract metadata 'NoneType' object has no attribute 'host' task_type=PostgresOperator airflow_dag_id=counter task_id=inc airflow_run_id=scheduled__2021-11-06T03:42:00+00:00
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/lineage_backend/__init__.py", line 83, in _extract_metadata
    task_metadata = self._extract(extractor, task_instance)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/lineage_backend/__init__.py", line 104, in _extract
    task_metadata = extractor.extract_on_complete(task_instance)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/base.py", line 61, in extract_on_complete
    return self.extract()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/postgres_extractor.py", line 65, in extract
    authority=self._get_authority(),
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/postgres_extractor.py", line 120, in _get_authority
    if self.conn.host and self.conn.port:
AttributeError: 'NoneType' object has no attribute 'host'

I can’t see this raised as an issue.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-01 13:57:54
Hello, I am trying to integrate Airflow with OpenLineage.

It is not working for me.

What I tried:

1. Adding openlineage-airflow to requirements.txt
2. Adding AIRFLOW__LINEAGE__BACKEND=openlineage.airflow.backend.OpenLineageBackend

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 47, in command
    func = import_string(import_path)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/module_loading.py", line 32, in import_string
    module = import_module(module_path)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/db_command.py", line 24, in <module>
    from airflow.utils import cli as cli_utils, db
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/db.py", line 26, in <module>
    from airflow.jobs.base_job import BaseJob  # noqa: F401
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/__init__.py", line 19, in <module>
    import airflow.jobs.backfill_job
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/backfill_job.py", line 29, in <module>
    from airflow import models
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/__init__.py", line 20, in <module>
    from airflow.models.baseoperator import BaseOperator, BaseOperatorLink
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 196, in <module>
    class BaseOperator(Operator, LoggingMixin, TaskMixin, metaclass=BaseOperatorMeta):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 941, in BaseOperator
    def post_execute(self, context: Any, result: Any = None):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/lineage/__init__.py", line 103, in apply_lineage
    _backend = get_backend()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/lineage/__init__.py", line 52, in get_backend
    clazz = conf.getimport("lineage", "backend", fallback=None)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py", line 469, in getimport
    raise AirflowConfigException(
airflow.exceptions.AirflowConfigException: The object could not be loaded. Please check "backend" key in "lineage" section. Current value: "openlineage.airflow.backend.OpenLineageBackend".

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-01 14:06:12
*Thread Reply:*
1. Please use openlineage.lineage_backend.OpenLineageBackend as AIRFLOW__LINEAGE__BACKEND
2. Please tell us where you've seen openlineage.airflow.backend.OpenLineageBackend, so we can fix the documentation 🙂

Julien Le Dem - (julien@apache.org)
2021-11-01 19:07:21
*Thread Reply:* https://pypi.org/project/openlineage-airflow/

-
-
PyPI
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-01 19:08:03
-
-

*Thread Reply:* (I googled it and found that page that seems to have an outdated doc)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 02:38:59
-
-

*Thread Reply:* @Maciej Obuchowski @Julien Le Dem that's the page I followed. Please revise the documentation, guys, as it is very important.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-02 04:34:14
*Thread Reply:* It should just copy the actual README

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2021-11-03 16:30:00
-
-

*Thread Reply:* PyPi is using the README at the time of the release 0.3.1, rather than the current README, which is 0.4.0. If we send the new release to PyPi it should also update the README

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-01 15:09:54
-
-

Regarding the Airflow integration: is it required to install openlineage-airflow and set up the environment variables in both the scheduler and the webserver, or just in the scheduler?

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-01 15:19:18
*Thread Reply:* I set it up in the scheduler and it starts to log data to Marquez. But it fails with this error:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/openlineage/client/client.py", line 49, in __init__
    raise ValueError(f"Need valid url for OpenLineageClient, passed {url}")
ValueError: Need valid url for OpenLineageClient, passed "http://marquez-internal-eks.eu-west-1.dev.hbi.systems"

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-01 15:19:26
*Thread Reply:* why is it not a valid URL?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2021-11-01 18:39:58
-
-

*Thread Reply:* Which version of the OpenLineage client are you using? On first check it should be fine

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 05:14:30
-
-

*Thread Reply:* @John Thomas I was appending double quotes as part of the url. Forget about this error

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2021-11-02 10:35:28
-
-

*Thread Reply:* aaaah, gotcha, good catch!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 05:15:52
-
-

Hello, I am receiving this error today when I deployed OpenLineage in the development environment (not using docker-compose locally).

I am running with KubernetesExecutor.

airflow.exceptions.AirflowConfigException: The object could not be loaded. Please check "backend" key in "lineage" section. Current value: "openlineage.lineage_backend.OpenLineageBackend".

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-11-02 05:18:18
-
-

*Thread Reply:* Are you sure that openlineage-airflow is present in the container?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 05:23:09
-
-

So in this case in my template I am adding:

```
env:
  ADDITIONAL_PYTHON_DEPS: "openpyxl==3.0.3 smart_open==2.0.0 apache-airflow-providers-http apache-airflow-providers-cncf-kubernetes apache-airflow-providers-amazon openlineage-airflow"
  OPENLINEAGE_URL: https://marquez-internal-eks.eu-west-1.dev.hbi.systems
  OPENLINEAGE_NAMESPACE: dns_airflow
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_URL: https://marquez-internal-eks.eu-west-1.dev.hbi.systems
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_NAMESPACE: dns_airflow

configmap:
  mountPath: /var/airflow/config  # mount path of the configmap
  data:
    airflow.cfg: |
      [lineage]
      backend = openlineage.lineage_backend.OpenLineageBackend

pod_template_file.yaml: |
    containers:
      - args: []
        command: []
        env:
          - name: AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_URL
            value: https://marquez-internal-eks.eu-west-1.dev.hbi.systems
          - name: AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_NAMESPACE
            value: dns_airflow
          - name: AIRFLOW__LINEAGE__BACKEND
            value: openlineage.lineage_backend.OpenLineageBackend
```

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-02 05:23:31
I am installing openlineage in the ADDITIONAL_PYTHON_DEPS

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-11-02 05:25:43
-
-

*Thread Reply:* Maybe ADDITIONAL_PYTHON_DEPS are dependencies needed by the tasks, and are installed after Airflow tries to initialize LineageBackend?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 06:34:11
-
-

*Thread Reply:* I am checking this accessing the Kubernetes pod

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 06:34:54
-
-

I have a question related to Airflow and OpenLineage: I have a DAG that contains 2 tasks:

[screenshot attached]

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-02 06:35:34
I see that every task is displayed as a different job. I was expecting to see one job per dag.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 07:29:43
-
-

Is this the expected behaviour??

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-11-02 07:34:47
-
-

*Thread Reply:* Yes

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-11-02 07:35:53
-
-

*Thread Reply:* Probably what you want is job hierarchy: https://github.com/MarquezProject/marquez/issues/1737

-
- - - - - - - -
-
Assignees
- collado-mike -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 07:46:02
-
-

*Thread Reply:* I do not see any benefit in just having some Airflow task metadata. I do not see the relationships between tasks. Every task is a job. When I started working on my company's integration with OpenLineage, I thought that OpenLineage would give me relationships between tasks or datasets, and the only thing I see is some metadata on the history of Airflow runs that is already provided by Airflow.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-02 07:46:20
*Thread Reply:* I was expecting to see a nice graph. I think it is missing some features

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-02 07:46:25
*Thread Reply:* at this early stage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-11-02 07:50:10
-
-

*Thread Reply:* It probably depends on whether those tasks are covered by the extractors: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 07:55:50
-
-

*Thread Reply:* We are not using any of those operators: BigQuery, Postgres or Snowflake.

And what does the GreatExpectations extractor do?

It would be good if there were one extractor that relies on the inlets and outlets that you can define in any Airflow task, so that that could be the general way to make relationships between datasets.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-02 07:56:30
*Thread Reply:* And then the same DAG graph could be seen in Marquez, and not one job per task.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-02 08:07:06
*Thread Reply:* > It would be good if there is one extractor that relies in the inlets and outlets that you can define in any Airflow task
I think this is a good idea. Overall, OpenLineage strongly focuses on automatic metadata collection. However, using them would be a nice fallback for not-covered-yet cases.

> And that the same dag graph can be seen in marquez, and not one job per task.
This currently depends on dataset hierarchy. If you're not using any of the covered extractors, then Marquez can't build a dataset graph like in the demo: https://raw.githubusercontent.com/MarquezProject/marquez/main/web/docs/demo.gif

With the job hierarchy ticket, probably some graph could be generated using just the job data though.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-02 08:09:55
*Thread Reply:* Created issue for the manual fallback: https://github.com/OpenLineage/OpenLineage/issues/384

-
- - - - - - - -
-
Assignees
- mobuchowski -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 08:28:29
-
-

*Thread Reply:* @Maciej Obuchowski how many people are working full time on this library? I really would like to adopt it in my company, as we use Airflow and Spark, but I see that it does not yet have the features we would like.

At the moment, the same info we have in Marquez about tasks is available in the Airflow UI or via the Airflow API.

The game changer for us would be if it could give us features/metadata that we cannot query directly from Airflow. That's why, if the Airflow inlets/outlets could be used, it really would make much more sense for us to adopt it.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-02 09:33:31
*Thread Reply:* > how many people are working full time in this library?
On Airflow integration or on OpenLineage overall? 🙂

> The game changer for us would be that it could give us features/metadata that we cannot query directly from airflow.
I think there are three options there:

1. Contribute relevant extractors for Airflow operators that you use
2. Use those extractors as custom extractors: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#custom-extractors
3. Create that manual fallback mechanism with Airflow inlets/outlets: https://github.com/OpenLineage/OpenLineage/issues/384

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-02 09:35:10
*Thread Reply:* But first, before implementing last option, I'd like to get consensus about it - so feel free to comment there about your use case

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-02 09:19:14
-
-

@Maciej Obuchowski I can even contribute or help with my ideas (about what I consider lineage should look like from a client's perspective)

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-03 07:58:56
@Maciej Obuchowski I was able to get Airflow in Kubernetes working, pointing to Marquez using the openlineage library. I found a few problems that would be good to discuss.

I see a warning:
[2021-11-03 11:47:04,309] {great_expectations_extractor.py:27} WARNING - Did not find great_expectations_provider library or failed to import it
I couldn't find any information about GreatExpectationsExtractor. Could you tell me what this extractor is about?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-03 08:00:34
*Thread Reply:* It should only affect you if you're using https://greatexpectations.io/

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Francis McGregor-Macdonald - (francis@mc-mac.com) -
-
2021-11-03 15:57:02
-
-

*Thread Reply:* I have a similar message after installing openlineage into Amazon MWAA, from the scheduler logs:

WARNING:/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/great_expectations_extractor.py:Did not find great_expectations_provider library or failed to import it

I am not using great expectations in the DAG.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-03 08:00:52
I see a few priorities for the Airflow integration:

1. A direct 1-1 relationship between DAG and Job. At the moment every task is a different job in Marquez, which I consider wrong.
2. Airflow inlets/outlets integration with Marquez.

When do you think you can have this? If you need any help I can happily contribute, but I would need some guidance.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-03 08:08:21
*Thread Reply:* I don't think 1) is a good idea. You can have multiple tasks in one DAG, processing different datasets and producing different datasets. If you want visual linking of jobs that produce disjoint datasets, then I think you want this: https://github.com/MarquezProject/marquez/issues/1737 - which will affect the visual layer.

Regarding 2), I think we need to get along with the Airflow maintainers regarding the long-term mechanism on which OL will work: https://github.com/apache/airflow/issues/17984

I think using inlets/outlets as a fallback mechanism when we're not doing automatic metadata extraction is a good idea, but we don't know if the hypothetical future mechanism will have access to these. It's hard to commit to a mechanism which might disappear soon.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-03 08:13:28
Another option is that I build my own extractor. Do you have any example of how to create a custom extractor? How can I apply that custom extractor to specific operators? Is there a way to link an extractor with an operator, so that at runtime Airflow knows which extractor to run?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-03 08:19:00
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#custom-extractors

I think you can base your code on any existing extractor, like PostgresExtractor: https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/postgres_extractor.py#L53

Custom extractors work just like built-in ones; you just need to add a bit of mapping between operator and extractor, like OPENLINEAGE_EXTRACTOR_PostgresOperator=openlineage.airflow.extractors.postgres_extractor.PostgresExtractor

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Francis McGregor-Macdonald -
- -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-03 08:35:59
-
-

*Thread Reply:* Thank you very much @Maciej Obuchowski

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-03 08:36:52
-
-

Last question of the morning: running one task that failed, I could see that no information appeared in Marquez. Is this expected to happen? I would like to see the whole history of runs in Marquez, successful and unsuccessful.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-03 08:41:14
*Thread Reply:* It worked like that in Airflow 1.10.

This is an unfortunate limitation of the LineageBackend API that we're using for Airflow 2. We're trying to work out a solution for this with the Airflow maintainers: https://github.com/apache/airflow/issues/17984

-
- - - - - - - -
-
Labels
- kind:feature, area:lineage -
- -
-
Comments
- 23 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-04 03:41:38
-
-

Hello openlineage community.

Yesterday I tried the integration with Spark.

The result was not satisfactory. This is what I did:

1. Add the openlineage-spark dependency
2. Add these lines:
   .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.3.1")
   .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
   .config("spark.openlineage.url", "https://marquez-internal-eks.eu-west-1.dev.hbi.systems/api/v1/namespaces/spark_integration/")

This job was doing a spark.read from 2 different JSON locations and a Spark write to 5 different Parquet locations in S3. The job finished successfully and the result in Marquez is:

[screenshot attached]

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-04 03:43:40
It created 3 namespaces. One was the one that I point in the spark config property. The other 2 are the bucket that we are writing to () and the bucket where we are reading from ()

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-04 03:44:00
-
-

If I enter the bucket namespaces I see nothing inside

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-04 03:48:35
This is what I can see if I enter one of the weird jobs generated:

[screenshot attached]

Julien Le Dem - (julien@apache.org)
2021-11-04 18:47:41
*Thread Reply:* This job with no output is a symptom of the output not being understood. you should be able to see the facets for that job. There will be a spark_unknown facet with more information about the problem. If you put that into an issue with some more details about this job we should be able to help.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2021-11-05 04:36:30
-
-

*Thread Reply:* I'll try to put all the info in a ticket, as it is not working as I would expect.

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-04 03:52:24
And I am seeing this as well.

If I check the logs of marquez-web and marquez I can't see any error there.

[screenshot attached]

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-04 03:54:38
When I try to open the job fulfilments.execute_insert_into_hadoop_fs_relation_command I see this window:

[screenshot attached]

David Virgil - (david.virgil.naranjo@googlemail.com)
2021-11-04 04:06:29
The page froze and no link from the menu works. Apart from that, I see that there are no messages in the logs.

Julien Le Dem - (julien@apache.org)
2021-11-04 18:49:31
*Thread Reply:* Is there an error in the browser javascript console? (example on chrome: View -> Developer -> Javascript console)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Alessandro Rizzo - (l.alessandrorizzo@gmail.com) -
-
2021-11-04 17:22:29
-
-

Hi #general, I'm a data engineer for a UK-based insuretech (part of one of the biggest UK retail insurers). We run a series of tech meetups and we'd love to have someone from the OpenLineage project give us a demo of the tool. Would anyone be interested? (DM me if so 🙂)
👍 Ross Turk

Taleb Zeghmi - (talebz@zillowgroup.com)
2021-11-04 21:30:24
Hi! Is there an example of tracking lineage when using Pandas to read/write and transform data?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2021-11-04 21:35:16
-
-

*Thread Reply:* Hi Taleb - I don’t know of a generalized example of lineage tracking with Pandas, but you should be able to accomplish this by sending the runEvents manually to the OpenLineage API in your code: -https://openlineage.io/docs/openapi/
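A hand-rolled sketch of that approach - wrapping a Pandas step in manually emitted runEvents via the openlineage-python client (the dataset names, namespaces, file paths, and Marquez URL are illustrative):

```python
# Illustrative only: emit OpenLineage runEvents by hand around a Pandas step.
from datetime import datetime, timezone
from uuid import uuid4

import pandas as pd
from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
job = Job(namespace="pandas_jobs", name="clean_orders")
run = Run(runId=str(uuid4()))

def now() -> str:
    return datetime.now(timezone.utc).isoformat()

client.emit(RunEvent(RunState.START, now(), run, job, producer="manual-pandas"))

df = pd.read_csv("orders.csv")                           # the tracked input
df[df["amount"] > 0].to_parquet("orders_clean.parquet")  # the tracked output

client.emit(RunEvent(
    RunState.COMPLETE, now(), run, job, producer="manual-pandas",
    inputs=[Dataset(namespace="file", name="orders.csv")],
    outputs=[Dataset(namespace="file", name="orders_clean.parquet")],
))
```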

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Taleb Zeghmi - (talebz@zillowgroup.com) -
-
2021-11-04 21:38:25
-
-

*Thread Reply:* Is this a work in progress, that we can investigate? Because I see it in this image https://github.com/OpenLineage/OpenLineage/blob/main/doc/Scope.png

-
- - - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2021-11-04 21:54:51
-
-

*Thread Reply:* To my knowledge, while there are a few proposals around adding a wrapper on some Pandas methods to output runEvents, it’s not something that’s had work started on it yet

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2021-11-04 21:56:26
-
-

*Thread Reply:* I sent some feelers out to get a little more context from folks who are more informed about this than I am, so I’ll get you more info about potential future plans and the considerations around them when I know more

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2021-11-04 23:04:47
-
-

*Thread Reply:* So, Pandas is tricky because unlike Airflow, DBT, or Spark, Pandas doesn’t own the whole flow, and you might dip in and out of it to use other Python packages (at least I did when I was doing more data science).

We have this issue open in OpenLineage that you should go +1 to help with our planning 🙂
Labels: proposal

Taleb Zeghmi - (talebz@zillowgroup.com)
2021-11-05 15:08:09
*Thread Reply:* interesting... what if it were instead on all the read_* / to_* functions?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lyndon Armitage - (lyndon.armitage@gmail.com) -
-
2021-11-05 12:00:57
-
-

Hi! I am working alongside David on integrating OpenLineage into our data pipelines. I have a question about Marquez and OpenLineage's divergent APIs. That is to say, these 2 APIs differ:
https://openlineage.io/docs/openapi/
https://marquezproject.github.io/marquez/openapi.html
This makes sense since they are at different layers of abstraction, but Marquez requires a few things that are absent from OpenLineage's API, for example the type in a data source, and the distinction between physicalName and sourceName in Datasets. Is that intentional? And can these be set using the OpenLineage API as some additional facets or keys? I noticed that the DatasourceDatasetFacet has a map of additionalProperties.

John Thomas - (john@datakin.com)
2021-11-05 12:59:49
*Thread Reply:* The Marquez write APIs are artifacts from before OpenLineage existed, and they’re already slated for deprecation soon.

If you POST an OpenLineage runEvent to the /lineage endpoint in Marquez, it’ll create any missing jobs or datasets that are relevant.

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-05 13:06:06
*Thread Reply:* Thanks for the response. That sounds good. Does this include the query interface, e.g. http://localhost:5000/api/v1/namespaces/testing_java/datasets/incremental_data, as that currently returns the Marquez version of a dataset, including default-set fields for type and the above-mentioned properties?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-11-05 17:01:55
-
-

*Thread Reply:* I believe the intention for type is to support a new facet - TBH, it hasn't been the most pressing concern for most users, as most people are only recording tables, not streams. However, there's been some recent work to support Kafka in Spark - maybe it's time to address that deficiency.

I don't actually know what happened to the datasource type field - maybe @Julien Le Dem can comment on whether that field was dropped intentionally or whether it was an oversight.

Julien Le Dem - (julien@apache.org)
2021-11-05 18:18:06
*Thread Reply:* It looks like an oversight; currently Marquez hard-codes it to POSTGRESQL: https://github.com/MarquezProject/marquez/blob/734bfd691636cb00212d7d22b1a489bd4870fb04/api/src/main/java/marquez/db/OpenLineageDao.java#L438

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-05 18:18:25
-
-

*Thread Reply:* https://github.com/MarquezProject/marquez/blob/734bfd691636cb00212d7d22b1a489bd4870fb04/api/src/main/java/marquez/db/OpenLineageDao.java#L438-L440

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-05 18:20:25
-
-

*Thread Reply:* The source has a name though: https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fea2151e/spec/facets/DatasourceDatasetFacet.json#L12

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-05 18:07:16
-
-

The next OpenLineage monthly meeting is this coming Wednesday at 9am PT.
The tentative agenda is:
• OL Client use cases for Apache Iceberg [Ryan]
• OpenLineage and Azure Purview [Shrikanth]
• Proxy Backend and Egeria integration progress update (Issue #152) [Mandy]
• OpenLineage last release overview (0.3.1)
  ◦ Facet versioning
  ◦ Airflow 2 / Spark 3 support, dbt improvements
• OpenLineage 0.4 scope review
  ◦ Proxy Backend (Issue #152)
  ◦ Spark, Airflow, dbt improvements (documentation, coverage, ...)
  ◦ Improvements to the OpenLineage model
• Open discussion
Assignees: mandy-chessell
Comments: 3
🙌 Maciej Obuchowski, Peter Hicks

Julien Le Dem - (julien@apache.org)
2021-11-05 18:07:57
*Thread Reply:* If you want to add something please chime in this thread

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-05 19:27:44
- -
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-09 19:47:26
-
-

*Thread Reply:* The monthly meeting is happening tomorrow.
The Purview team will present at the December meeting instead.
See the full agenda here: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting
You are welcome to contribute

Julien Le Dem - (julien@apache.org)
2021-11-10 11:10:17
*Thread Reply:* The slides for the meeting later today: https://docs.google.com/presentation/d/1z2NTkkL8hg_2typHRYhcFPyD5az-5-tl/edit#slide=id.ge7d4b64ef4_0_0

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-10 12:02:23
-
-

*Thread Reply:* It’s happening now ^

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-16 19:57:23
-
-

*Thread Reply:* I have posted the notes and the recording from the last instance of our monthly meeting: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting#MonthlyTSCmeeting-Nov10th2021(9amPT)
I have a few TODOs to follow up on tickets.

Julien Le Dem - (julien@apache.org)
2021-11-05 18:09:10
The next release of OpenLineage is being scoped: https://github.com/OpenLineage/OpenLineage/projects/6
Please chime in if you want to raise the priority of something or are planning to contribute.

Anthony Ivanov - (anthvt@gmail.com)
2021-11-09 08:18:11
Hi, I have been looking at OpenLineage for some time, and I really like it. It is a very simple specification that covers a lot of use cases. You can create any provider or consumer in a very simple way, so that's pretty powerful.
I have some questions about things that are not clear to me. I am not sure if this is the best place to ask; please refer me elsewhere if this is not appropriate.

Anthony Ivanov - (anthvt@gmail.com)
2021-11-09 08:18:58
*Thread Reply:* How do you model a continuous process (not a batch process)? For example, a Flume or Spark job that does some real-time processing on data.

Maybe it's simply a "Job". But then what is a Run?

Anthony Ivanov - (anthvt@gmail.com)
2021-11-09 08:19:44
*Thread Reply:* How do you model consumers at the end? They can be reports, data applications, ML model deployments, APIs, GUIs consumed by end users?

Have you considered having some examples of different use cases like those?

Anthony Ivanov - (anthvt@gmail.com)
2021-11-09 08:21:43
*Thread Reply:* By definition, a Job is a process definition that consumes and produces datasets. It is a many-to-many relation? I've been wondering about that. Shouldn't it be more restrictive?
For example, an important use case for lineage is troubleshooting or error notifications (e.g. mark a report or job as temporarily in a bad state if an upstream data integration is broken).
In order to do that you need to be able to traverse the graph to find the original error. So having multiple inputs produce a single output makes sense (e.g. insert into output_1 select * from x,y group by a,b). But what are the cases where you'd want to see multiple outputs? You can have a single process produce multiple tables (as in the above example), but they'd always be separate queries; the actual inputs for each output would be different.

But having multiple outputs creates ambiguity: if x or y is broken and there are multiple outputs, I do not know which is really impacted.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-09 08:34:01
*Thread Reply:* > How do you model continuous process (not batch processes). For example a flume or spark job that does some real time processing on data.
> Maybe it's simply a "Job". But then what is a run?
Every continuous process eventually has an end - for example, you can deploy a new version of your Flink pipeline. The new version would be the next Run for the same Job.

Moreover, the OTHER event type is useful to update metadata like the amount of processed records. In this Flink example, it could be emitted per checkpoint.

I think more attention will be given to streaming use cases soon.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-09 08:43:09
*Thread Reply:* > How do you model consumers at the end - they can be reports? Data applications, ML model deployments, APIs, GUI consumed by end users?
Our reference implementation is a web application: https://marquezproject.github.io/marquez/

We definitely do not exclude any of the things you're talking about - and it would make a lot of sense to talk more about potential usages.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-09 08:45:47
*Thread Reply:* > By definition, Job is a process definition that consumes and produces datasets. It is many to many relations? I've been wondering about that. Shouldn't it be more restrictive?
I think this is too SQL-centric a view 🙂

Not everything is a query. For example, those Flink streaming jobs can produce side outputs, or even push data to multiple sinks. We need to model those types of jobs too.

If your application does not produce multiple outputs, then I don't see how a specification allowing them would impact you.

Anthony Ivanov - (anthvt@gmail.com)
2021-11-17 12:11:37
*Thread Reply:* > We definitely do not exclude any of the things you're talking about - and it would make a lot of sense to talk more about potential usages.
Yes, I think it would be great to expand on potential usages - if the OpenLineage documentation (perhaps) had all kinds of examples for different use cases, or case studies: a financial or healthcare industry case study, say, and how someone would do an integration with OpenLineage. It would be easier to understand the concepts and make sure things are modeled consistently.

Anthony Ivanov - (anthvt@gmail.com)
2021-11-17 14:19:19
*Thread Reply:* > I think this is too SQL-centric view 🙂
> Not everything is a query. For example, those Flink streaming jobs can produce side outputs, or even push data to multiple sinks. We need to model those types of jobs too.
Thanks for answering @Maciej Obuchowski

Even in SQL you can have multiple outputs if you look at things at the transaction level. I was simply using it as an example.

Maybe another example will make clearer what I mean. Let's say we have these phases:

1. Ingest from sources
2. Process/transform
3. Export to somewhere

(image/diagram)
https://mermaid.ink/img/eyJjb2RlIjoiXG5ncmFwaCBMUlxuICAgIHN1YmdyYXBoIFNvdXJjZXNcbi[…]yIjpmYWxzZSwiYXV0b1N5bmMiOnRydWUsInVwZGF0ZURpYWdyYW0iOmZhbHNlfQ

Let's look at these two cases:

1. Within a single Flink job, and even a single task: Inventory & UI are both written to both S3 and the DB.
2. Within a single Flink job, and even a single task: Inventory is written only to S3, UI is written only to the DB.

In 1. the OpenLineage run event could look like {inputs: [ui, inventory], outputs: [s3, db] }

In 2. the user can either do the same as in 1. (because the data changes, or because of copy-paste), which would be an error since both do not go to both. The accurate events would likely be
{inputs: [ui], outputs: [s3] } {inputs: [ui], outputs: [db] }

If the specification standard required a single output then:

1. would be modelled like run events {inputs: [ui, inventory], outputs: [s3] } ; {inputs: [ui, inventory], outputs: [db] }, which is still correct, if more verbose.
2. could only be modelled this way: {inputs: [ui], outputs: [s3] }; {inputs: [ui], outputs: [db] }

The more restrictive specification seems to lower the chance for an error, doesn't it?

Also, if tools knew the spec guarantees a single output, they'd be able to provide tracing capabilities which are more precise, because the structure would allow for less ambiguity. Storage backends that implement the spec could perhaps also be written in more optimal ways. I have not looked into the accuracy of those hypotheses though.

Those were the thoughts I was thinking when asking about that. I'd be curious if there's a document on the research of pros/cons and alternatives for the design of the current specification.

-
- - - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-11-23 05:38:11
-
-

*Thread Reply:* @Anthony Ivanov I see what you're trying to model. I think this could be solved by column level lineage though - once we have it. An OL consumer could look at particular columns and derive which table contained a particular error.

> 2. Within a single flink job and even task: Inventory is written only to S3, UI is written only to DB
Does that actually happen? I understand this in the case of a job, but having a single operator write to two different systems seems like bad design. Wouldn't that leave the possibility of breaking exactly-once unless you're going fully into two-phase commit?

Anthony Ivanov - (anthvt@gmail.com)
2021-11-23 17:02:36

*Thread Reply:* > Does that actually happen? I understand this in the case of a job, but having a single operator write to two different systems seems like bad design
In a Spark or Flink job it is less likely, now that you mention it. But in a batch job (an Airflow Python or Kubernetes operator, for example) users could do anything, and then they'd need lineage to figure out what is wrong, even if what they did is suboptimal 🙂

> I see what you're trying to model.
I am not trying to model something specific. I am trying to understand how OpenLineage would be used in different organisations/companies and use cases.

> I think this could be solved by column level lineage though
Is there something specific planned? I could not find a ticket in GitHub. I thought you could use Dataset Facets - the Schema facet, for example, could be a subset of columns for a table …

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-24 04:55:41

*Thread Reply:* @Anthony Ivanov take a look at this: https://github.com/OpenLineage/OpenLineage/issues/148

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-10 13:21:23

How do you delete jobs/runs from Marquez/OpenLineage?

Willy Lulciuc - (willy@datakin.com)
2021-11-10 16:17:10

*Thread Reply:* We’re adding APIs to delete metadata in Marquez 0.20.0. Here’s the related issue, https://github.com/MarquezProject/marquez/issues/1736

Willy Lulciuc - (willy@datakin.com)
2021-11-10 16:17:37

*Thread Reply:* Until then, you can connect to the DB directly and drop the rows from both the datasets and jobs tables (I know, not ideal)

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 05:03:50

*Thread Reply:* Thanks! I assume deleting information will remain a Marquez-only feature rather than becoming part of OpenLineage itself?

Willy Lulciuc - (willy@datakin.com)
2021-12-10 14:07:57

*Thread Reply:* Yes! Delete operations will be an action supported by consumers of OpenLineage events

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 05:13:31

Am I understanding namespaces correctly? A job namespace is different from a Dataset namespace.
And job namespaces define a job environment, like Airflow, Spark, or some other system that executes jobs, while Dataset namespaces define data locations, like an S3 bucket, local file system, or schema in a database?

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 05:14:39

*Thread Reply:* I've been skimming this page: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
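To illustrate the distinction that page draws, a few made-up values in its naming style (the hosts and bucket names here are invented):

```
# Made-up values in the style of spec/Naming.md
dataset_namespaces = [
    "postgres://db.example.com:5432",  # dataset namespace: where the data physically lives
    "s3://my-bucket",
]
job_namespace = "my-airflow-instance"  # job namespace: the environment that executes jobs
```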

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-11 05:46:06

*Thread Reply:* Yes!

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 06:17:01

*Thread Reply:* Excellent, I think I had mistakenly conflated the two originally. This document makes it a little clearer.
As an additional question:
When viewing a Dataset in Marquez will it cross the job namespace bounds? As in, will I see jobs from different job namespaces?

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 09:20:14

*Thread Reply:* In this example I have 1 job namespace and 2 dataset namespaces:
sql-runner-dev is the job namespace.
I cannot see a graph of my job now. Is this something to do with the namespace names?

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 09:21:46

*Thread Reply:* The above document seems to have implied a namespace could be like a connection string for a database

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 09:22:25

*Thread Reply:* Wait, it does work? Marquez was being temperamental

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 09:24:01

*Thread Reply:* Yes, Marquez is unable to fetch lineage for either dataset

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 09:32:19

*Thread Reply:* Here's what I mean:

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-11 09:59:24

*Thread Reply:* I think you might have hit this issue: https://github.com/MarquezProject/marquez/issues/1744

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-11 10:00:29

*Thread Reply:* or, maybe not? It was released already.

Can you create an issue on GitHub with those helpful gifs? @Lyndon Armitage

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 10:58:25

*Thread Reply:* I think you are right Maciej

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 10:58:52

*Thread Reply:* Was that patched in 0.19.1?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-11 11:06:06

*Thread Reply:* As far as I see yes: https://github.com/MarquezProject/marquez/releases/tag/0.19.1


Haven't tested this myself unfortunately.

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 11:07:07

*Thread Reply:* Perhaps not. It is urlencoding them:
<http://localhost:3000/lineage/dataset/jdbc%3Ah2%3Amem%3Asql_tests_like/HBMOFA.ORDDETP>
But the error seems to be in Marquez getting them.

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 11:09:23

*Thread Reply:* This is an example Lineage event JSON I am sending.

👀 Maciej Obuchowski

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 11:11:29

*Thread Reply:* I did run into another issue with really long names not being supported due to Marquez's DB using a fixed size string for a column, but that is understandable and probably a non-issue (my test code was generating temporary folders with long names).

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 11:22:00

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-11 11:36:01

*Thread Reply:* @Lyndon Armitage can you create an issue on the Marquez repo? https://github.com/MarquezProject/marquez/issues

Lyndon Armitage - (lyndon.armitage@gmail.com)
2021-11-11 11:52:36

*Thread Reply:* https://github.com/MarquezProject/marquez/issues/1761 Is this sufficient?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-11 11:54:41

*Thread Reply:* Yup, thanks!

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-15 13:00:39

I am looking at an AWS Glue Crawler lineage event. The glue crawler creates or updates a table schema, and I have a few questions on aligning to best practice.

  1. Is this a dataset create/update or…
  2. … a job with no dataset inputs and only dataset outputs or
  3. … is the path in S3 the input and the Glue table the output?
  4. Is there an example of the lineage event here I can clone or work from?
Thanks.
🚀 Willy Lulciuc

John Thomas - (john@datakin.com)
2021-11-15 13:04:19

*Thread Reply:* Hi Francis, for the event, is it creating a new table with new data in Glue / adding new data to an existing one, or is it simply reformatting an existing table or making an empty one?

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-15 13:35:00

*Thread Reply:* The table does not exist in the Glue catalog until …

A Glue crawler connects to one or more data stores (in this case S3), determines the data structures, and writes tables into the Data Catalog.

The data/objects are in S3; the Glue catalog is a metadata representation (Hive) as a table.

John Thomas - (john@datakin.com)
2021-11-15 13:41:14

*Thread Reply:* Hmm, interesting, so the lineage of interest here would be of the metadata flow, not of the data itself?

In that case I'd say that the Glue crawler is a job that outputs a dataset.

Michael Collado - (collado.mike@gmail.com)
2021-11-15 15:03:36

*Thread Reply:* The crawler is a job that discovers a dataset. It doesn't create it. If you're posting lineage yourself, I'd post it as an input event, not an output. The thing that actually wrote the data - generated the records and stored them in S3 - is the thing that would be outputting the dataset

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-15 15:23:23

*Thread Reply:* @Michael Collado I agree the crawler discovers the S3 dataset. It also creates an event which creates/updates the HIVE/Glue table.

If the Glue table isn't a distinct dataset from the S3 data, how does this compare to a view in a database on top of a table? Are they 2 datasets or just one?

Glue can discover data in remote databases too; in those cases, does it make sense to have only the source dataset?

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-15 15:24:39

*Thread Reply:* @John Thomas yes, it's the metadata flow.

Michael Collado - (collado.mike@gmail.com)
2021-11-15 15:24:52

*Thread Reply:* that's how the Spark integration currently treats Hive datasets - I'd like to add a facet to attach that indicates that it is being read as a Hive table, and include all the appropriate metadata, but it uses the dataset's location in S3 as the canonical dataset identifier

John Thomas - (john@datakin.com)
2021-11-15 15:29:22

*Thread Reply:* @Francis McGregor-Macdonald I think the way to represent this is predicated on what you’re looking to accomplish by sending a runEvent for the Glue crawler. What are your broader objectives in adding this?

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-15 15:50:37

*Thread Reply:* I am working through AWS native services seeing how they could, can, or do best integrate with OpenLineage (I'm an AWS SA). Hence the questions on best practice.


Aligning with the Spark integration sounds like it might make sense then. Is there an example I could build from?

Michael Collado - (collado.mike@gmail.com)
2021-11-15 17:56:17

*Thread Reply:* an example of reporting lineage? you can look at the Spark integration here - https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/

John Thomas - (john@datakin.com)
2021-11-15 17:59:14

*Thread Reply:* Ahh, in that case I would have to agree with Michael’s approach to things!

✅ Diogo

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-19 03:30:03

*Thread Reply:* @Michael Collado I am following the Spark integration you recommended (for a Glue job) and while everything appears to be set up correctly, I am getting no lineage appearing in Marquez (a request.get from the pyspark script can reach the endpoint). Is there a way to enable a debug log so I can look to identify where the issue is?
Is there a specific place to look in the regular logs?

Michael Collado - (collado.mike@gmail.com)
2021-11-19 13:39:01

*Thread Reply:* listener output should be present in the driver logs. you can turn on debug logging in your log4j config (or whatever logging tool you use) for the package io.openlineage.spark.agent
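(With a classic log4j 1.x properties setup, for example, that would be a line like log4j.logger.io.openlineage.spark.agent=DEBUG in the cluster's log4j configuration; the exact file name and mechanism depend on how your environment configures logging.)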

✅ Francis McGregor-Macdonald

Michael Collado - (collado.mike@gmail.com)
2021-11-19 19:44:06

Woo hoo! Initial Spark <-> Kafka support has been merged 🙂 https://github.com/OpenLineage/OpenLineage/pull/387

🎉 Willy Lulciuc, John Thomas, Peter Hicks, Maciej Obuchowski
🙌 Willy Lulciuc, John Thomas, Francis McGregor-Macdonald, Peter Hicks, Maciej Obuchowski
🚀 Willy Lulciuc, John Thomas, Peter Hicks, Francis McGregor-Macdonald, Maciej Obuchowski

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 13:32:57

I am “successfully” exporting lineage to OpenLineage from AWS Glue using the listener. Only the source load is showing, not the transforms or the sink.

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 13:34:15

*Thread Reply:* Output event:

```
2021-11-22 08:12:15,513 INFO [spark-listener-group-shared] agent.OpenLineageContext (OpenLineageContext.java:emit(50)): Lineage completed successfully: ResponseMessage(responseCode=201, body=, error=null)
{
  "eventType": "COMPLETE",
  "eventTime": "2021-11-22T08:12:15.478Z",
  "run": {
    "runId": "03bfc770-2151-499e-9265-8457a38ceec3",
    "facets": {
      "spark_version": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet",
        "spark-version": "3.1.1-amzn-0",
        "openlineage-spark-version": "0.3.1"
      }
    }
  },
  "job": {
    "namespace": "spark_integration",
    "name": "nyctaxirawstage.mappartitionsunionmappartitionsnew_hadoop"
  },
  "inputs": [
    {
      "namespace": "s3.cdkdl-dev-foundationstoragef3787fa8-raw1d6fb60a-171gwxf2sixt9",
      "name": ""
    }
  ],
  "outputs": [],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark",
  "schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"
}
```

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 13:34:59

*Thread Reply:* This sink record is missing details …


2021-11-22 08:12:15,481 INFO [Thread-7] sinks.HadoopDataSink (HadoopDataSink.scala:$anonfun$writeDynamicFrame$1(275)): nameSpace: , table:

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 13:40:30

*Thread Reply:* I can also see multiple history events (presumably for each transform, each as above) emitted for the same Glue Job, with different RunId, with the same inputs and the same (null) output.

John Thomas - (john@datakin.com)
2021-11-22 14:31:06

*Thread Reply:* Are you using the existing spark integration for the spark lineage?

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 14:46:47

*Thread Reply:* I followed: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark
In the Glue context I was not clear on the correct settings for "spark.openlineage.parentJobName" and "spark.openlineage.parentRunId", so I put in static values (which may be incorrect)?
I injected these via: "--conf": "spark.openlineage.parentJobName=nyc-taxi-raw-stage",
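For illustration, a sketch of injecting those settings when defining a Glue job with boto3. The job name, role, script location, jar path, and Marquez host are all placeholders; Glue's DefaultArguments accept a single --conf key, so additional Spark settings are commonly chained inside it:

```
import boto3  # assumes AWS credentials are configured

glue = boto3.client("glue")

# Chain all Spark settings into the single --conf value Glue accepts
spark_conf = (
    "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener"
    " --conf spark.openlineage.host=http://my-marquez-host:5000"
    " --conf spark.openlineage.version=v1"
    " --conf spark.openlineage.namespace=spark_integration"
    " --conf spark.openlineage.parentJobName=nyc-taxi-raw-stage"
)

glue.create_job(
    Name="nyc-taxi-raw-stage",
    Role="arn:aws:iam::123456789012:role/my-glue-role",  # placeholder role
    Command={"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/job.py"},
    DefaultArguments={
        "--conf": spark_conf,
        "--extra-jars": "s3://my-bucket/jars/openlineage-spark-0.3.1.jar",
    },
)
```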

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 14:47:54

*Thread Reply:* Happy to share what is working when I am done; I can't seem to find an AWS Glue-specific example to walk me through.

John Thomas - (john@datakin.com)
2021-11-22 15:03:31

*Thread Reply:* yeah, we haven't spent any significant time with AWS Glue, but we just released the Databricks integration, which might help guide the way you're working a little bit more

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 15:12:15

*Thread Reply:* from what I can see in the DBX integration (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/databricks), all of what is being done here I am doing in Glue (upload the jar, embed the settings into the Glue spark job).
It is emitting the above for each transform in the Glue job, but does not seem to capture the output …

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 15:13:54

*Thread Reply:* Is there a standard Spark test script in use with OpenLineage I could put into Glue to test without using any Glue-specific functionality (without, for example, the GlueContext or Glue dynamic frames)?

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 15:25:30

*Thread Reply:* The initialisation does appear to be working if I compare it to the DBX README.
Mine from AWS Glue…
```
21/11/22 18:48:48 INFO SparkContext: Registered listener io.openlineage.spark.agent.OpenLineageSparkListener
21/11/22 18:48:49 INFO OpenLineageContext: Init OpenLineageContext: Args: ArgumentParser(host=http://ec2-….compute-1.amazonaws.com:5000, version=v1, namespace=spark_integration, jobName=default, parentRunId=null, apiKey=Optional.empty) URI: http://ec2-….compute-1.amazonaws.com:5000/api/v1/lineage
21/11/22 18:48:49 INFO AsyncEventQueue: Process of event SparkListenerApplicationStart(nyc-taxi-raw-stage,Some(spark-application-1637606927106),1637606926281,spark,None,None,None) by listener OpenLineageSparkListener took 1.092252643s.
```

John Thomas - (john@datakin.com)
2021-11-22 16:12:40

*Thread Reply:* We don’t have a test run, unfortunately, but you could follow this blog post’s processes in each and see what the differences are? https://openlineage.io/blog/openlineage-spark/

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-22 16:43:23

*Thread Reply:* Thanks, I have been looking at that. I will create a Glue job aligned with that. What is the best way to pass feedback? Keep it here?

John Thomas - (john@datakin.com)
2021-11-22 16:49:50

*Thread Reply:* yeah, this thread will work great 🙂

Ilya Davidov - (idavidov@marpaihealth.com)
2022-07-18 11:37:02

*Thread Reply:* @Francis McGregor-Macdonald did you manage to enable it?

Francis McGregor-Macdonald - (francis@mc-mac.com)
2022-07-18 15:14:47

*Thread Reply:* Just DMed you the code I used a while back (app.py + CDK code). I haven't used it in a while, and there is some duplication in it. I had OpenLineage enabled, but dynamic frames were not working yet with lineage. Let me know how you go.
I haven't had the space to look at it in a while, but happy to support if you are looking at it.

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-11-23 08:48:51

How do you use OpenLineage with Amundsen?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-23 09:01:11

*Thread Reply:* You can use this: https://github.com/amundsen-io/amundsen/pull/1444

John Thomas - (john@datakin.com)
2021-11-23 09:38:44

*Thread Reply:* you can also check out this section from the Amundsen Community Meeting in October: https://www.youtube.com/watch?v=7WgECcmLSRk

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-11-23 08:49:16

Do we need to use Marquez?

Willy Lulciuc - (willy@datakin.com)
2021-11-23 12:45:34

*Thread Reply:* No, I believe the databuilder OpenLineage extractor for Amundsen will continue to store lineage metadata in Atlas

Willy Lulciuc - (willy@datakin.com)
2021-11-23 12:47:01

*Thread Reply:* We've spoken to the Amundsen team, and though using Marquez to store lineage metadata isn't an option, it's an integration that makes sense but hasn't yet been prioritized

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-11-23 13:51:00

*Thread Reply:* Thanks. Right now Amundsen has no support for lineage extraction from Spark or Airflow. In this case, do we need to use Marquez for the OpenLineage implementation to capture the lineage from Airflow & Spark?

Willy Lulciuc - (willy@datakin.com)
2021-11-23 13:57:13

*Thread Reply:* Maybe. That would mean running the full Amundsen stack as well as the Marquez stack alongside each other (not ideal). The OpenLineage integration for Amundsen is very recent, so I haven't had a chance to look deeply into the implementation. But, briefly looking over the config for Openlineagetablelineageextractor, you can only send metadata to Atlas

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-11-24 00:36:56

*Thread Reply:* @Willy Lulciuc that's our real concern: running the two stacks will make a messy environment. Let me explain our Amundsen setup: we have neo4j as the backend (front end, search service, metadata service, Elasticsearch & neo4j). Our requirement is to capture lineage from Spark and Airflow, imported into Amundsen

Vinith Krishnan US - (vinithk@nvidia.com)
2022-03-11 22:33:39

*Thread Reply:* We are running into a similar issue. @Dinakar Sundar were you able to get the Amundsen OpenLineage integration to work with a neo4j backend?

bitsofinfo - (bitsofinfo.g@gmail.com)
2021-11-24 11:41:31

Hi all - I just watched the presentation on this and Marquez from the Airflow 21 summit. I was pretty impressed with this. My question is: what other open-source players are in this space, or are people pretty much consolidating around this? (which would be great). I was looking at the available datasource extractors for the Airflow side and would hope to see more here; looking at the code, it doesn't seem like too huge of a deal. Is there a roadmap available?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-24 11:49:14

*Thread Reply:* You can take a look at https://github.com/OpenLineage/OpenLineage/projects

Martin Fiser - (fisa@keboola.com)
2021-11-24 19:24:48

Hi all, I was wondering what the status is of native support of OpenLineage for DataHub or Amundsen. Re: https://openlineage.slack.com/archives/C01CK9T7HKR/p1633633476151000?thread_ts=1633008095.115900&cid=C01CK9T7HKR
Many thanks!

Martin Fiser - (fisa@keboola.com)
2021-12-01 16:35:17

*Thread Reply:* Anyone? Thanks!

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-11-25 01:42:26

Our Amundsen setup: we have neo4j as the backend (front end, search service, metadata service, Elasticsearch & neo4j). Our requirement is to capture lineage from Spark and Airflow, imported into Amundsen?

Will Johnson - (will@willj.co)
2021-11-29 23:30:12

Hello, OpenLineage folks - I'm curious if anyone here has run into an issue like the one we're running into as we look to extend OpenLineage's Spark integration into Databricks.

Has anyone run into an issue where a Scala class should exist (based on a decompiled jar, I see that it's a public class) but you keep getting an error like object SqlDWRelation in package sqldw cannot be accessed in package com.databricks.spark.sqldw?


Databricks has a Synapse SQL DW connector: https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html


I want to extract the database URL, table, and schema from the logical plan but


I execute something like the below command that runs a SELECT * on the given tableName ("borrower" in this case) in the Azure Synapse database.

```
val df = spark.read.format("com.databricks.spark.sqldw")
  .option("url", sqlDwUrl)
  .option("tempDir", tempDir)
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", tableName)
  .load()
val logicalPlan = df.queryExecution.logical
val logicalRelation = logicalPlan.asInstanceOf[LogicalRelation]
val sqlBaseRelation = logicalRelation.relation
```

I end up with something like this, all good so far:

```
logicalPlan: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Relation[memberId#97,residentialState#98,yearsEmployment#99,homeOwnership#100,annualIncome#101,incomeVerified#102,dtiRatio#103,lengthCreditHistory#104,numTotalCreditLines#105,numOpenCreditLines#106,numOpenCreditLines1Year#107,revolvingBalance#108,revolvingUtilizationRate#109,numDerogatoryRec#110,numDelinquency2Years#111,numChargeoff1year#112,numInquiries6Mon#113] SqlDWRelation("borrower")

logicalRelation: org.apache.spark.sql.execution.datasources.LogicalRelation =
Relation[memberId#97,residentialState#98,yearsEmployment#99,homeOwnership#100,annualIncome#101,incomeVerified#102,dtiRatio#103,lengthCreditHistory#104,numTotalCreditLines#105,numOpenCreditLines#106,numOpenCreditLines1Year#107,revolvingBalance#108,revolvingUtilizationRate#109,numDerogatoryRec#110,numDelinquency2Years#111,numChargeoff1year#112,numInquiries6Mon#113] SqlDWRelation("borrower")

sqlBaseRelation: org.apache.spark.sql.sources.BaseRelation = SqlDWRelation("borrower")
```

The schema I can easily get with sqlBaseRelation.schema, but I cannot figure out:

  1. How I can get the database name from the logical relation
  2. How I can get the table name from the logical relation ("borrower" is the table name, so I can always parse the string if necessary)
I know that Databricks has the SqlDWRelation class, which I think I need to cast the BaseRelation to, BUT it appears to be in a jar/package that is inaccessible during the execution of a notebook. Specifically, import com.databricks.spark.sqldw.SqlDWRelation is the relation, and it appears to have a few accessors that would help me answer some of these questions: params and JDBCWrapper

Of course this is undocumented on the Databricks side 😰


If I could cast the BaseRelation into this SqlDWRelation, I'd be able to get this info. However, whenever I attempt to use the imported SqlDWRelation, I get the error object SqlDWRelation in package sqldw cannot be accessed in package com.databricks.spark.sqldw. I'm hoping someone has run into something similar in the past on the Spark / Databricks / Scala side and might share some advice. Thank you for any guidance!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-30 07:03:30

Will Johnson - (will@willj.co)
2021-11-30 11:21:34

*Thread Reply:* I have not! Will give it a try, Maciej! Thank you for the reply!

🙌 Maciej Obuchowski

Will Johnson - (will@willj.co)
2021-11-30 15:20:18

*Thread Reply:* 🙏 @Maciej Obuchowski we're not worthy! That was the magic we needed. Seems like a hack since we're snooping in on private classes but if it works...


Thank you so much for pointing to those utilities!

❤️ Julien Le Dem

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-11-30 15:48:25

*Thread Reply:* Glad I could help!

Francis McGregor-Macdonald - (francis@mc-mac.com)
2021-11-30 19:43:03

A colleague pointed me at https://open-metadata.org/; is there anywhere a view or comparison of this and OpenLineage?

Mario Measic - (mario.measic.gavran@gmail.com)
2021-12-01 08:51:28

*Thread Reply:* Different concepts. OL is focused on describing the lineage and metadata of the running jobs. So it keeps track of all the metadata (schema, ...) of inputs and outputs at the time transformation occurs + transformation metadata (code version, cost, etc.)


OM I am not an expert on, but it's a metadata model with clients and an API around it.

RamanD - (romantanzar@gmail.com)
2021-12-01 12:33:51

Hey! OpenLineage is a beautiful initiative, to be honest! We also try to accommodate it. One question (maybe it's already described somewhere, then many apologies :)): if we need to propagate the run id from Airflow to a child task (an AWS Batch job, for instance), what would be the best way to do it in the current implementation (as we get the run id only at the post-execute phase)? We use the Airflow 2+ integration.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-01 12:40:53

*Thread Reply:* Hey. For technical reasons, we can't automatically register a macro that does this job, as we could in the Airflow 1 integration. You could add it yourself:

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-01 12:41:02

*Thread Reply:*
```
def lineage_parent_id(run_id, task):
    """
    Macro function which returns the generated job and run id for a given task. This
    can be used to forward the ids from a task to a child run so the job
    hierarchy is preserved. Child run can create ParentRunFacet from those ids.
    Invoke as a jinja template, e.g.

    PythonOperator(
        task_id='render_template',
        python_callable=my_task_function,
        op_args=['{{ lineage_parent_id(run_id, task) }}'],  # lineage_run_id macro invoked
        provide_context=False,
        dag=dag
    )

    :param run_id:
    :param task:
    :return:
    """
    with create_session() as session:
        job_name = openlineage_job_name(task.dag_id, task.task_id)
        ids = JobIdMapping.get(job_name, run_id, session)
        if ids is None:
            return ""
        elif isinstance(ids, list):
            run_id = "" if len(ids) == 0 else ids[0]
        else:
            run_id = str(ids)
        return f"{_DAG_NAMESPACE}/{job_name}/{run_id}"


def openlineage_job_name(dag_id: str, task_id: str) -> str:
    return f'{dag_id}.{task_id}'
```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-01 12:41:13

*Thread Reply:* from here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/dag.py#L77
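A minimal sketch of wiring that macro into a DAG via user_defined_macros, assuming you copied the functions above into your own module (the module name here is made up):

```
from datetime import datetime
from airflow import DAG

from my_lineage_macros import lineage_parent_id  # hypothetical module holding the snippet above

dag = DAG(
    dag_id="example_dag",
    start_date=datetime(2021, 12, 1),
    # Makes '{{ lineage_parent_id(run_id, task) }}' resolvable in templated fields
    user_defined_macros={"lineage_parent_id": lineage_parent_id},
)
```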

RamanD - (romantanzar@gmail.com)
2021-12-01 12:53:27

*Thread Reply:* the quickest response ever! And that works like a charm 🙌

👍 Michael Collado

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-01 13:21:16

*Thread Reply:* Glad I could help!

Will Johnson - (will@willj.co)
2021-12-01 14:14:23

@Maciej Obuchowski and @Michael Collado given your work on the Spark Integration, what's the right way to explore the Write operations' logical plans? When doing a read, it's easy! In Scala, df.queryExecution.logical gives you exactly what you need, but how do you guys interactively explore what sort of commands are being used during a write? We are exploring some of the DataSourceV2 data sources and are hoping to learn from you guys a bit more, please 😃

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-01 14:18:00

*Thread Reply:* For SQL, EXPLAIN EXTENDED and show() in the scala-shell are helpful:


spark.sql("EXPLAIN EXTENDED CREATE TABLE tbl USING delta LOCATION '/tmp/delta' AS SELECT ** FROM tmp").show(false) -```|== Parsed Logical Plan == -'CreateTableAsSelectStatement [tbl], delta, /tmp/delta, false, false -+- 'Project [**] - +- 'UnresolvedRelation [tmp], [], false

- -

== Analyzed Logical Plan ==

- -

CreateTableAsSelect org.apache.spark.sql.delta.catalog.DeltaCatalog@63c5b63a, default.tbl, [provider=delta, location=/tmp/delta], false -+- Project [x#12, y#13] - +- SubqueryAlias tmp - +- LocalRelation [x#12, y#13]

- -

== Optimized Logical Plan == -CreateTableAsSelect org.apache.spark.sql.delta.catalog.DeltaCatalog@63c5b63a, default.tbl, [provider=delta, location=/tmp/delta], false -+- LocalRelation [x#12, y#13]

- -

== Physical Plan == -AtomicCreateTableAsSelect org.apache.spark.sql.delta.catalog.DeltaCatalog@63c5b63a, default.tbl, LocalRelation [x#12, y#13], [provider=delta, location=/tmp/delta, owner=mobuchowski], [], false -+- LocalTableScan [x#12, y#13] -|```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-01 14:27:25

*Thread Reply:* For the dataframe api, I'm usually just either logging the plan to console from the OpenLineage listener, or looking at the spark_logicalPlan or spark_unknown facets sent by the listener - even when the particular write operation isn't supported by the integration, those facets should have some relevant info.

🙌 Will Johnson

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-01 14:27:40

*Thread Reply:* For example, for the query I sent in the comment above, the spark_logicalPlan facet looks like this:


"spark.logicalPlan": { - "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/0.4.0-SNAPSHOT/integration/spark>", - "_schemaURL": "<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>", - "plan": [ - { - "allowExisting": false, - "child": [ - { - "class": "org.apache.spark.sql.catalyst.plans.logical.LocalRelation", - "data": null, - "isStreaming": false, - "num-children": 0, - "output": [ - [ - { - "class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", - "dataType": "integer", - "exprId": { - "id": 2, - "jvmId": "e03e2860-a24b-41f5-addb-c35226173f7c", - "product-class": "org.apache.spark.sql.catalyst.expressions.ExprId" - }, - "metadata": {}, - "name": "x", - "nullable": false, - "num-children": 0, - "qualifier": [] - } - ], - [ - { - "class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", - "dataType": "integer", - "exprId": { - "id": 3, - "jvmId": "e03e2860-a24b-41f5-addb-c35226173f7c", - "product-class": "org.apache.spark.sql.catalyst.expressions.ExprId" - }, - "metadata": {}, - "name": "y", - "nullable": false, - "num-children": 0, - "qualifier": [] - } - ] - ] - } - ], - "class": "org.apache.spark.sql.execution.command.CreateViewCommand", - "name": { - "product-class": "org.apache.spark.sql.catalyst.TableIdentifier", - "table": "tmp" - }, - "num-children": 0, - "properties": null, - "replace": true, - "userSpecifiedColumns": [], - "viewType": { - "object": "org.apache.spark.sql.catalyst.analysis.LocalTempView$" - } - } - ] - },

Will Johnson - (will@willj.co)
2021-12-01 14:38:55

*Thread Reply:* Okay! That is very helpful! I wasn't sure if there was a fancier trick but I can definitely do logging 🙂 Our challenge was that our proprietary packages were resulting in Null Pointer Exceptions when it tried to push to OpenLineage 😞

Will Johnson - (will@willj.co)
2021-12-01 14:39:02

*Thread Reply:* Thank you as usual!!

Michael Collado - (collado.mike@gmail.com)
2021-12-01 14:40:25

*Thread Reply:* You can always add test cases and add breakpoints to debug in your IDE. That doesn't work for the container tests, but it does work for the other ones

Will Johnson - (will@willj.co)
2021-12-01 14:47:20

*Thread Reply:* Ah! That's a great point! I definitely would appreciate being able to poke at the objects interactively in a debug mode. Thank you for the guidance as well!

Ricardo Gaspar - (ricardogaspar2@gmail.com)
2021-12-03 11:49:10

hi everyone! 👋
Very noob question here: I've been wanting to play with Marquez and OpenLineage for my company's projects. I use mostly Scala & Spark, but also Airflow.
I've been reading and watching talks about OpenLineage and Marquez.
So far I didn't quite discover whether Marquez or OpenLineage does field-level lineage (with Spark), like Spline tries to.


Any idea?


Other sources about this topic:
• https://medium.com/cdapio/data-integration-with-field-level-lineage-5d9986524316
• https://medium.com/cdapio/field-level-lineage-part-1-3cc5c9e1d8c6
• https://medium.com/cdapio/designing-field-level-lineage-part-2-b6c7e6af5bf4
• https://www.youtube.com/playlist?list=PL897MHVe_nHeEQC8UnCfXecmZdF0vka_T
• https://www.youtube.com/watch?v=gKYGKXIBcZ0
• https://www.youtube.com/watch?v=eBep6rRh7ic

🙌 Francis McGregor-Macdonald

John Thomas - (john@datakin.com)
2021-12-03 11:55:17

*Thread Reply:* Hi Ricardo - OpenLineage doesn’t currently have support for field-level lineage, but it’s definitely something we’ve been looking into. This is a great collection of resources 🙂


We’ve to-date been working on our integrations library, making it as easy to set up as possible.

Ricardo Gaspar - (ricardogaspar2@gmail.com)
2021-12-03 12:01:25

*Thread Reply:* Thanks John! I was checking the issues on GitHub and other posts here. Just wanted to clarify that.
I'll keep an eye on it.

Julien Le Dem - (julien@apache.org)
2021-12-06 20:25:19

The next OpenLineage monthly meeting is this Wednesday at 9am PT. (Everybody is welcome to join.)
The slides are here: https://docs.google.com/presentation/d/1q2Be7WTKlIhjLPgvH-eXAnf5p4w7To9v/edit#slide=id.ge4b57c6942_0_75
Tentative agenda:
• SPDX headers [Mandy Chessel]
• Azure Purview + OpenLineage [Will Johnson, Mark Taylor]
• Logging backend (OpenTelemetry, ...) [Julien Le Dem]
• Open discussion
Please chime in in this thread if you'd like to add something.

Julien Le Dem - (julien@apache.org)
2021-12-06 20:28:09

*Thread Reply:* The link to join the meeting is on the wiki: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

Julien Le Dem - (julien@apache.org)
2021-12-06 20:28:25

*Thread Reply:* Please reach out to me if you’d like to be added to a gcal invite

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-12-06 22:37:29

@John Thomas we at Condé Nast are currently exploring the features of OpenLineage to integrate with Databricks (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/databricks), but the Spark configuration is not working.

Michael Collado - (collado.mike@gmail.com)
2021-12-08 02:03:37

*Thread Reply:* Hi Dinakar. Can you give some specifics regarding what kind of problem you're running into?

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-12-09 10:15:50

*Thread Reply:* Hi @Michael Collado, we were able to set the Spark configuration for the Spark extra listener & placed the jars as well. When I ran the Spark job, lineage did not get tracked into Marquez.

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-12-09 10:34:39

*Thread Reply:* {"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark","schemaURL":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark/facets/spark/v1/output-statistics-facet.json","rowCount":0,"size":-1,"status":"DEPRECATED"}},"outputFacets":{"outputStatistics":{"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark","schemaURL":"https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet","rowCount":0,"size":-1}}}],"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark","schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"} -OpenLineageHttpException(code=0, message=java.lang.IllegalArgumentException: Cannot construct instance of io.openlineage.spark.agent.client.HttpError (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('{"code":404,"message":"HTTP 404 Not Found"}') - at [Source: UNKNOWN; line: -1, column: -1], details=java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: Cannot construct instance of io.openlineage.spark.agent.client.HttpError (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('{"code":404,"message":"HTTP 404 Not Found"}') - at [Source: UNKNOWN; line: -1, column: -1]) - at io.openlineage.spark.agent.OpenLineageContext.emit(OpenLineageContext.java:48) - at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:122) - at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$3(OpenLineageSparkListener.java:159) - at java.util.Optional.ifPresent(Optional.java:159) - at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:148) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119) - at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103) - at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) - at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) - at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) - at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) - at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) - at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1585) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Dinakar Sundar - (dinakar_sundar@condenast.com)
2021-12-09 13:29:42

*Thread Reply:* Issue solved; I had mentioned the version wrongly as 1 instead of v1.

🙌 Michael Collado

Jitendra Sharma - (jitendra_sharma@condenast.com)
2021-12-07 02:07:06

👋 Hi everyone!

👋 Willy Lulciuc, Maciej Obuchowski

kavuri raghavendra - (kavuri.raghavendra@gmail.com)
2021-12-08 05:37:44

Hello everyone.. we are exploring OpenLineage for capturing Spark lineage.. but from the GitHub (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark) I see that the output is sent to an API (Marquez).. how can I send it to a Kafka topic? Can somebody please guide me on this?

Minkyu Park - (minkyu@datakin.com)
2021-12-08 12:15:38

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/400/files


There's an ongoing PR for a proxy backend, which opens an HTTP API and redirects events to Kafka.

John Thomas - (john@datakin.com)
2021-12-08 12:17:38

*Thread Reply:* Hi Kavuri, as Minkyu said, there's currently work going on to simplify this process.


For now, you'll need to make something to capture the HTTP API events and send them to the Kafka topic. Changing the spark.openlineage.url parameter will send the runEvents wherever you like, but obviously you can't directly produce HTTP events to a topic.

kavuri raghavendra - (kavuri.raghavendra@gmail.com)
2021-12-08 22:13:09

*Thread Reply:* Many thanks for the reply.. As I understand, pushing lineage to a Kafka topic is not there yet; it is under implementation. If you can help me understand which version it is going to be present in, that will help me a lot. Thanks in advance.

Minkyu Park - (minkyu@datakin.com)
2021-12-09 12:57:10

*Thread Reply:* Not sure about the release plan, but the HTTP endpoint is just a regular RESTful API, and you will be able to write a super simple proxy for your own use case if you want.
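A minimal sketch of such a proxy, assuming Flask and kafka-python are installed; the broker address and topic name are made up, and the endpoint path matches what the Spark integration posts to:

```
from flask import Flask, request
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(bootstrap_servers="localhost:9092")  # placeholder broker

@app.route("/api/v1/lineage", methods=["POST"])
def lineage():
    # Forward the raw OpenLineage event bytes to Kafka unchanged
    producer.send("openlineage-events", request.get_data())
    return "", 201

if __name__ == "__main__":
    app.run(port=5000)
```

Pointing spark.openlineage.url at this proxy would then route the integration's events into Kafka.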

🙌 Will Johnson

Will Johnson - (will@willj.co)
2021-12-12 00:13:54

Hi, OpenLineage team - for the Spark integration, I'm looking to extract information from a DataSourceV2 data source.


I'm working on the WRITE side of the data source and right now I'm touching the AppendData logical plan (I can't find the Java Doc): https://github.com/rdblue/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L446


I was able to extract the table name (from the named relation) but I'm struggling to get the schema out next.


I noticed that the AppendData plan offers inputSet, schema, and outputSet.
• inputSet gives me an AttributeSet which does contain the names of my columns (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala#L69)
• schema returns an empty StructType
• outputSet is an empty AttributeSet
I thought I read in the Spark Internals book that outputSet would only be populated if there was some sort of change to the DataFrame columns, but I cannot find that page, and searching for spark outputSet turns up few relevant results.


Has anyone else worked with the AppendData plan and gotten the schema out of it? Am I going down the wrong path with this snippet of code below? Thank you for any guidance!


```
if (logical instanceof AppendData) {
    AppendData appendOp = (AppendData) logical;
    NamedRelation namedRel = appendOp.table();
    log.info(namedRel.name()); // Works great!
    log.info(appendOp.inputSet().toString()); // This will get you a rough schema
    StructType schema = appendOp.schema(); // This is an empty StructType
    log.info(schema.json()); // Nothing useful here
}
```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-12 07:34:13

*Thread Reply:* One thing: you're looking at Ryan's fork of Spark, which is a few thousand commits behind head 🙂

This one should be good:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala#L72

About the schema: looking at AppendData's query schema should work, if there's no change to columns, because to pass analysis the data being inserted has to match the table's schema. I would test that though 🙂

On the other hand, the current AppendDataVisitor just looks at AppendData's table and tries to extract the dataset from it using a list of common output visitors:


https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/co[…]o/openlineage/spark/agent/lifecycle/plan/AppendDataVisitor.java


In this case, the DataSourceV2RelationVisitor would look at it, provided we're using Spark 3:


https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/sp[…]ge/spark3/agent/lifecycle/plan/DataSourceV2RelationVisitor.java

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-12 07:37:04

*Thread Reply:* In this case, we basically need more info about the nature of this DataSourceV2Relation, because this is provider-dependent. We have Iceberg in the main branch and Delta here: https://github.com/OpenLineage/OpenLineage/pull/393/files#diff-7b66a9bd5905f4ba42914b73a87d834c1321ebcf75137c1e2a2413c0d85d9db6

Will Johnson - (will@willj.co)
2021-12-13 14:54:13

*Thread Reply:* Ah! Maciej! As always, thank you! Looking through the DataSourceV2RelationVisitor you provided, it looks like the connector (Azure Cosmos Db) doesn't provide that Provider property 😞 😞 😞


Is there any other method for determining the type of DataSourceV2Relation?

Will Johnson - (will@willj.co)
2021-12-13 14:57:06

*Thread Reply:* And, to make sure I close out on my original question, it was as simple as the code that Maciej was using:


I merely needed to use DataSourceV2Relation rather than NamedRelation!


```
DataSourceV2Relation relation = (DataSourceV2Relation) appendOp.table();
log.info(relation.schema().toString());
log.info(relation.name());
```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-15 06:20:31

*Thread Reply:* Are we talking about this connector? https://github.com/Azure/azure-sdk-for-java/blob/934200f63dc5bc7d5502a95f8daeb8142[…]/src/main/scala/com/azure/cosmos/spark/ItemsReadOnlyTable.scala

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-15 06:22:05

*Thread Reply:* I guess you can use object.getClass.getCanonicalName() to find out if the passed class matches the one that the Cosmos provider uses.

Will Johnson - (will@willj.co)
2021-12-15 09:53:24

*Thread Reply:* Yes! That's the one, Maciej! I will give getCanonicalName a try but also make a PR into that repo to get the provider property set up correctly 🙂

Will Johnson - (will@willj.co)
2021-12-15 09:53:28

*Thread Reply:* Thank you so much!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-15 10:09:39

*Thread Reply:* Glad to help 😄

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-15 10:22:58

*Thread Reply:* @Will Johnson could you tell us which commands from https://github.com/OpenLineage/OpenLineage/issues/368#issue-1038510649 you'll be working on?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-15 10:24:14

*Thread Reply:* If any, of course 🙂

Will Johnson - (will@willj.co)
2021-12-15 10:49:31

*Thread Reply:* From all of our tests on that Cosmos connector, it looks like it strictly uses the AppendData operation. However, @Harish Sune is looking at more of these commands from a Delta data source.

👍 Maciej Obuchowski

Will Johnson - (will@willj.co)
2021-12-22 22:43:34

*Thread Reply:* Just to close the loop on this one - I submitted a PR for the work we've been doing. Looking forward to any feedback! https://github.com/OpenLineage/OpenLineage/pull/450

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-23 05:04:36

*Thread Reply:* Thanks @Will Johnson! I added one question about dataset naming.

Michael Collado - (collado.mike@gmail.com)
2021-12-14 19:45:59

Finally got this doc posted - https://github.com/OpenLineage/OpenLineage/pull/437 (see the readable version here)
Looking for feedback, @Willy Lulciuc @Maciej Obuchowski @Will Johnson

Will Johnson - (will@willj.co)
2021-12-15 10:54:41

*Thread Reply:* Yes! This is awesome!! How might this work for an existing command like the DataSourceV2Visitor?


Right now, OpenLineage checks based on the provider property if it's an Iceberg or Delta provider.


Ideally, we'd be able to extend the list of providers or have a custom "CosmosDbDataSourceV2Visitor" that knew how to work with a custom DataSourceV2.


Would that cause any conflicts if the base class is already accounted for in OpenLineage?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-12-15 11:13:20

*Thread Reply:* Resolving this would be a nice addition to the doc (and to the implementation) - currently, we're just returning the result of the first function for which isDefinedAt is satisfied.

This means that we depend on the order of the visitors...

Michael Collado - (collado.mike@gmail.com)
2021-12-15 13:59:12

*Thread Reply:* great question. For posterity, I'd like to move this to the PR discussion. I'll address the question there.

Michael Collado - (collado.mike@gmail.com)
2021-12-14 19:50:57

Oh, and I forgot to post yesterday:
OpenLineage 0.4.0 was released 🥳

This was a big one.
• Split tests for Spark 2 and Spark 3
• Spark output metrics
• Databricks support with init scripts
• Initial Iceberg support for Spark
• Initial Kafka support for Spark
• dbt build support
• forward compatibility for dbt versions
• lots of bug fixes 🙂
Check the full changelog for details

- - - -
- 🙌 Maciej Obuchowski, Will Johnson, Peter Hicks, Manuel, Peter Hanssens -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Dinakar Sundar - (dinakar_sundar@condenast.com) -
-
2021-12-14 21:42:40
-
-

Hi @Michael Collado, is there any documentation on using Great Expectations with OpenLineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-12-15 11:50:47
-
-

*Thread Reply:* hmm, actually the only documentation we have right now is on the demo.datakin.com site https://demo.datakin.com/onboarding . The great expectations tab should be enough to get you started

-
-
demo.datakin.com
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-12-15 11:51:04
-
-

*Thread Reply:* I'll open a ticket to copy that documentation to the OpenLineage site repo

- - - -
- 👍 Madhu Maddikera, Dinakar Sundar -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Carlos Meza - (omar.m.8x@gmail.com) -
-
2021-12-15 09:52:51
-
-

Hello! I am new to OpenLineage, awesome project!! Does anybody know about integration with Deequ? Or a way to capture dataset stats with OpenLineage? Thanks! Appreciate the help!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-12-15 19:01:50
-
-

*Thread Reply:* Hi! We don't have any integration with deequ yet. We have a structure for recording data quality assertions and statistics, though - see https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DataQualityAssertionsDatasetFacet.json and https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DataQualityMetricsInputDatasetFacet.json for the specs.

- -

Check the great expectations integration to see how those facets are being used

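For reference, a hedged sketch of how the metrics facet can appear on an input dataset inside a runEvent. The field names (rowCount, columnMetrics, and so on) follow the DataQualityMetricsInputDatasetFacet spec linked above; the facet key, dataset name, and values are illustrative assumptions:

```python
# Illustrative only: an input dataset carrying a data-quality metrics facet.
# Field names follow the facet spec; the namespace, name, key, and values
# here are hypothetical.
input_dataset = {
    "namespace": "warehouse",
    "name": "public.orders",
    "inputFacets": {
        "dataQualityMetrics": {
            "_producer": "https://example.com/my-producer",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DataQualityMetricsInputDatasetFacet.json",
            "rowCount": 1500,
            "columnMetrics": {"order_id": {"nullCount": 0, "distinctCount": 1500}},
        }
    },
}
```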
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno González - (brugms2@gmail.com) -
-
2022-05-24 06:20:50
-
-

*Thread Reply:* This is great. Thanks @Michael Collado!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anatoliy Zhyzhkevych - (Anatoliy.Zhyzhkevych@franklintempleton.com) -
-
2021-12-19 22:40:33
-
-

Hi,

- -

I am testing Open Lineage/Marquez 0.4.0 with dbt 1.0.0 using dbt-ol build -It seems 12 events were generated but UI shows only history of runs with "Nothing to show here" in detail section about datasets/tests failures in dbt namespace. -The warehouse namespace shows lineage but no details about dataset/test failures .

- -

Please advise.

- -

02:57:54 Done. PASS=4 WARN=0 ERROR=3 SKIP=2 TOTAL=9 -02:57:54 Error sending message, disabling tracking -Emitting OpenLineage events: 100%|██████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12.50it/s]

- -
- - - - - - - -
-
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-12-20 04:15:51
-
-

*Thread Reply:* The "nothing to show here" is when you click on the test node, right? What about the run node?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anatoliy Zhyzhkevych - (Anatoliy.Zhyzhkevych@franklintempleton.com) -
-
2021-12-20 12:28:21
-
-

*Thread Reply:* There are no details about the failure.

- -

```dbt-ol build -t DEV --profile cdp --profiles-dir /c/Work/dbt/cdp100/profiles --project-dir /c/Work/dbt/cdp100 --select +riskrawmastersharedshareclass -Running OpenLineage dbt wrapper version 0.4.0 -This wrapper will send OpenLineage events at the end of dbt execution. -02:57:21 Running with dbt=1.0.0 -02:57:23 [WARNING]: Configuration paths exist in your dbtproject.yml file which do not apply to any resources. -There are 1 unused configuration paths:

  • models.cdp.risk.raw.liquidity.shared
  • -
- -

02:57:23 Found 158 models, 181 tests, 0 snapshots, 0 analyses, 574 macros, 0 operations, 2 seed files, 56 sources, 1 exposure, 0 metrics -02:57:23 -02:57:35 Concurrency: 10 threads (target='DEV') -02:57:35 -02:57:35 1 of 9 START test dbtexpectationssourceexpectcompoundcolumnstobeuniquebsesharedpbshareclassEDMPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [RUN] -02:57:37 1 of 9 PASS dbtexpectationssourceexpectcompoundcolumnstobeuniquebsesharedpbshareclassEDMPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [PASS in 2.67s] -02:57:37 2 of 9 START view model REPL.SHARECLASSDIM.................................... [RUN] -02:57:39 2 of 9 OK created view model REPL.SHARECLASSDIM............................... [SUCCESS 1 in 2.12s] -02:57:39 3 of 9 START test dbtexpectationsexpectcompoundcolumnstobeuniquerawreplpbsharedshareclassRISKPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [RUN] -02:57:43 3 of 9 PASS dbtexpectationsexpectcompoundcolumnstobeuniquerawreplpbsharedshareclassRISKPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [PASS in 3.42s] -02:57:43 4 of 9 START view model RAWRISKDEV.STG.SHARECLASSDIM........................ [RUN] -02:57:46 4 of 9 OK created view model RAWRISKDEV.STG.SHARECLASSDIM................... [SUCCESS 1 in 3.44s] -02:57:46 5 of 9 START view model RAWRISKDEV.MASTER.SHARECLASSDIM..................... [RUN] -02:57:46 6 of 9 START test relationshipsriskrawstgsharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTIDrefriskrawstgsharedsecurity_ [RUN] -02:57:46 7 of 9 START test relationshipsriskrawstgsharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawstgsharedportfolio_ [RUN] -02:57:51 5 of 9 ERROR creating view model RAWRISKDEV.MASTER.SHARECLASSDIM............ [ERROR in 4.31s] -02:57:51 8 of 9 SKIP test relationshipsriskrawmastersharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTIDrefriskrawmastersharedsecurity_ [SKIP] -02:57:51 9 of 9 SKIP test relationshipsriskrawmastersharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawmastersharedportfolio_ [SKIP] -02:57:52 7 of 9 FAIL 7282 relationshipsriskrawstgsharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawstgsharedportfolio_ [FAIL 7282 in 5.41s] -02:57:54 6 of 9 FAIL 6520 relationshipsriskrawstgsharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTIDrefriskrawstgsharedsecurity_ [FAIL 6520 in 7.23s] -02:57:54 -02:57:54 Finished running 6 tests, 3 view models in 30.71s. -02:57:54 -02:57:54 Completed with 3 errors and 0 warnings: -02:57:54 -02:57:54 Database Error in model riskrawmastersharedshareclass (models/risk/raw/master/shared/riskrawmastersharedshareclass.sql) -02:57:54 002003 (42S02): SQL compilation error: -02:57:54 Object 'RAWRISKDEV.AUDIT.STGSHARECLASSDIMRELATIONSHIPRISKINSTRUMENTID' does not exist or not authorized. 
-02:57:54 compiled SQL at target/run/cdp/models/risk/raw/master/shared/riskrawmastersharedshareclass.sql -02:57:54 -02:57:54 Failure in test relationshipsriskrawstgsharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawstgsharedportfolio (models/risk/raw/stg/shared/riskrawstgsharedschema.yml) -02:57:54 Got 7282 results, configured to fail if != 0 -02:57:54 -02:57:54 compiled SQL at target/compiled/cdp/models/risk/raw/stg/shared/riskrawstgsharedschema.yml/relationshipsriskrawstgsha19e10fb324f7d0cccf2aab512683f693.sql -02:57:54 -02:57:54 Failure in test relationshipsriskrawstgsharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTID_refriskrawstgsharedsecurity_ (models/risk/raw/stg/shared/riskrawstgsharedschema.yml) -02:57:54 Got 6520 results, configured to fail if != 0 -02:57:54 -02:57:54 compiled SQL at target/compiled/cdp/models/risk/raw/stg/shared/riskrawstgsharedschema.yml/relationshipsriskrawstgsha_e3148a1627817f17f7f5a9eb841ef16f.sql -02:57:54 -02:57:54 See test failures:

- -
- -

select * from RAWRISKDEV.AUDIT.STGSHARECLASSDIMrelationship_RISKINSTRUMENT_ID

- -
- -

02:57:54 -02:57:54 Done. PASS=4 WARN=0 ERROR=3 SKIP=2 TOTAL=9 -02:57:54 Error sending message, disabling tracking -Emitting OpenLineage events: 100%|██████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12.50it/s]Emitted 14 openlineage events -(dbt) linux@dblnbk152371:/c/Work/dbt/cdp$```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-12-20 12:30:20
-
-

*Thread Reply:* I'm talking about clicking on a non-test node in the Marquez UI - the screenshots shared show you clicked on the one ending in test.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anatoliy Zhyzhkevych - (Anatoliy.Zhyzhkevych@franklintempleton.com) -
-
2021-12-20 16:46:11
-
-

*Thread Reply:* There are two types of failures: tests failed on the stage model (relationships), and a physical error in the master model (no table with such a name). The stage test node in Marquez does not show any indication of failures, and the dataset node indicates failure but without the number of failed records or the table name for persistent test storage. The failed master model shows in red but with no details of the failure. Master model tests were skipped because of the model failure, but the UI reports "Complete".

- -
- - - - - - - -
-
- - - - - - - -
-
- - - - - - - -
-
- - - - - - - -
-
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-12-20 18:11:50
-
-

*Thread Reply:* If I understood correctly, for the model you would like OpenLineage to capture the error message, like this one: -22:52:07 Database Error in model customers (models/customers.sql) -22:52:07 Syntax error: Expected "(" or keyword SELECT or keyword WITH but got identifier "PLEASE_REMOVE" at [56:12] -22:52:07 compiled SQL at target/run/jaffle_shop/models/customers.sql -And for dbt test failures, to visualize better that an error is happening, for example like this:

- -
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-12-20 18:23:12
-
-

*Thread Reply:* We actually do the first one for Airflow and Spark, I've missed it for dbt 😞

- -

Created issue to add it to spec in a generic way: -https://github.com/OpenLineage/OpenLineage/issues/446

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anatoliy Zhyzhkevych - (Anatoliy.Zhyzhkevych@franklintempleton.com) -
-
2021-12-20 22:49:54
-
-

*Thread Reply:* Sounds great. Failed/Skipped Tests/Models could be color-coded as well. Thanks.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jorge Reyes (Zenta Group) - (jorge.reyes@zentagroup.com) -
-
2021-12-22 12:37:00
-
-

Hello everyone, I'm learning OpenLineage. I am trying to connect it with Airflow 2. Is that possible, or is that version not yet released? This is what Airflow is currently throwing at me:

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-12-22 12:38:26
-
-

*Thread Reply:* Hey. If you're using Airflow 2, you should use LineageBackend method described here: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#airflow-21-experimental

- - - -
- 🙌 Jorge Reyes (Zenta Group) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-12-22 12:39:06
-
-

*Thread Reply:* You don't need to do anything with DAG import then.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jorge Reyes (Zenta Group) - (jorge.reyes@zentagroup.com) -
-
2021-12-22 12:40:30
-
-

*Thread Reply:* Thanks!!!!! i'll try

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-12-27 16:49:20
-
-

The PR at https://github.com/OpenLineage/OpenLineage/pull/451 should be everything needed to complete the implementation for https://github.com/OpenLineage/OpenLineage/pull/437 . The PR is in draft mode, as I still need ~1 day to update the integration test expectations to match the refactoring (there are some new events, but from my cursory look, the old events still match expected contents). But I think it's in a state that can be reviewed before the tests are updated.

- -

There are two other PRs that this one is based on - broken up for easier reviewing -• https://github.com/OpenLineage/OpenLineage/pull/447 -• https://github.com/OpenLineage/OpenLineage/pull/448

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-12-27 16:49:56
-
-

*Thread Reply:* @Will Johnson @Maciej Obuchowski FYI 👆

- - - -
- 🙌 Will Johnson, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-01-07 15:25:11
-
-

The next OpenLineage Technical Steering Committee meeting is Wednesday, January 12! Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT.  -Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09 -All are welcome. -Agenda: -• OpenLineage 0.4 and 0.5 releases -• Egeria version 3.4 support for OpenLineage -• Airflow TaskListener to simplify OpenLineage integration [Maciej] -• Open Discussion -Notes: https://tinyurl.com/openlineagetsc

- - - -
- 🙌 Maciej Obuchowski, Ross Turk, John Thomas, Minkyu Park, Joshua Wankowski, Dalin Kim -
- -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2022-01-11 12:16:09
-
-

Hello community,

- -

We are able to post this datasource in marquez. But then the information about the facet with the datasource is not displayed in the UI.

- -

We want to display the S3 location (URI) that this datasource is pointing to:
```
{
  id: {
    namespace: "s3://hbi-dns-staging",
    name: "PCHG"
  },
  type: "DB_TABLE",
  name: "PCHG",
  physicalName: "PCHG",
  createdAt: "2022-01-11T16:15:54.887Z",
  updatedAt: "2022-01-11T16:56:04.093153Z",
  namespace: "s3://hbi-dns-staging",
  sourceName: "s3://hbi-dns-staging",
  fields: [],
  tags: [],
  lastModifiedAt: null,
  description: null,
  currentVersion: "c565864d-1a66-4cff-a5d9-2e43175cbf88",
  facets: {
    dataSource: {
      uri: "s3://hbi-dns-staging/sql-runner/2022-01-11/PCHG.avro",
      name: "s3://hbi-dns-staging",
      _producer: "ip-172-25-23-163.dir.prod.aws.hollandandbarrett.comeu-west-1.com/172.25.23.163",
      _schemaURL: "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet"
    }
  }
}
```

- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2022-01-11 12:24:00
-
-

As you can see, there is not much info in the OpenLineage UI.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-01-11 13:02:16
-
-

The OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1641587111000700

-
- - -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-01-12 11:59:44
-
-

*Thread Reply:* ^ It’s happening now!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Virgil - (david.virgil.naranjo@googlemail.com) -
-
2022-01-14 06:46:44
-
-

any idea guys about the previous question?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Minkyu Park - (minkyu@datakin.com) -
-
2022-01-18 14:19:39
-
-

*Thread Reply:* Just to be clear, were you able to get the datasource information from the API and it's just not showing up in the UI? Or were you not able to get it from the API either?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2022-01-17 03:41:56
-
-

Hi everyone!! I am doing a POC of OpenLineage with Airflow version 2.1. Before that, I would like to know: is this version supported by OpenLineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-01-18 11:40:00
-
-

*Thread Reply:* It does generally work, but, there's a known limitation in that only successful task runs are reported to the lineage backend. This is planned to be fixed in Airflow 2.3.

- - - -
- ✅ SAM -
- -
- ❤️ Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2022-01-18 20:35:52
-
-

*Thread Reply:* thank you. 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2022-01-17 06:47:54
-
-

Hello there, I’m using Docker Airflow version 2.1.0. Below are the steps I performed, but I encountered an error. Please help:

- -
  1. Inside requirements.txt file i added openlineage-airflow . Then ran pip install -r requirements.txt .
  2. Added environmental variable using this command -export AIRFLOW__LINEAGE__BACKEND = openlineage.lineage_backend.OpenLineageBackend
  3. Then configured HTTP Backend environment variables inside “airflow” folder: -export OPENLINEAGE_URL=<http://marquez:5000>
  4. Ran Marquez using ./docker/up.sh, opened the web frontend UI, and saw the error message below:
- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-01-18 11:30:38
-
-

*Thread Reply:* hey, I'm aware of one small bug ( which will be fixed in the upcoming OpenLineage 0.5.0 ) which means you would also have to include google-cloud-bigquery in your requirements.txt. This is the bug: https://github.com/OpenLineage/OpenLineage/issues/438

- - - -
- ✅ SAM -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-01-18 11:31:51
-
-

*Thread Reply:* The other thing I think you should check is, did you def define the AIRFLOW__LINEAGE__BACKEND variable correctly? What you pasted above looks a little odd with the 2 = signs

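For reference, shell assignments must not have spaces around the =. A corrected form of the two settings pasted earlier in this thread would be:

```
export AIRFLOW__LINEAGE__BACKEND=openlineage.lineage_backend.OpenLineageBackend
export OPENLINEAGE_URL=http://marquez:5000
```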
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-01-18 11:34:25
-
-

*Thread Reply:* I'm looking a task log inside my own Airflow and I see msgs like: -INFO - Constructing openlineage client to send events to

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-01-18 11:34:47
-
-

*Thread Reply:* ^ i.e. I think checking the task logs you can see if it's at least attempting to send data

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-01-18 11:34:52
-
-

*Thread Reply:* hope this helps!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2022-01-18 20:40:37
-
-

*Thread Reply:* Thank you, will try again.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-01-18 20:10:25
-
-

Just published OpenLineage 0.5.0. Big items here are: -• dbt-spark support -• New proxy message broker for forwarding OpenLineage messages to Kafka -• New extensibility API for Spark integration -Accompanying tweet thread on the latter two items here: https://twitter.com/PeladoCollado/status/1483607050953232385

- - - -
- 🙌 Maciej Obuchowski, Kevin Mellott -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-01-19 12:39:30
-
-

*Thread Reply:* BTW, this was actually the 0.5.1 release. Because, pypi... 🤷‍♂️

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mario Measic - (mario.measic.gavran@gmail.com) -
-
2022-01-27 06:45:08
-
-

*Thread Reply:* nice on the dbt-spark support 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:12:14
-
-

Hello everyone! I’ve been reading and watching talks about OpenLineage and Marquez. This solution is exactly what we’ve been looking for to trace the lineage of our ETLs. Great work! Our ETLs are based on Postgres, Redshift, and Airflow. So:

- -

I tried to implement the example, respecting all the required steps. Everything runs successfully (the two DAGs on Airflow) on host http://localhost:3000/, but nothing appeared in the Marquez UI. Am I missing something?

- -

I’m thinking about creating a simple pandas-to-pandas ETL with some transformations, to have a POC to show to my team. I really need some help.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-01-19 11:13:35
-
-

*Thread Reply:* Are you using docker on mac with "Use Docker Compose V2" enabled?

- -

We've just found yesterday that it somehow breaks our example...

- - - -
- ✅ Mohamed El IBRAHIMI -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:14:51
-
-

*Thread Reply:* yes i just installed docker on mac

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:15:02
-
-

*Thread Reply:* and docker compose version 1.29.2

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-01-19 11:20:24
-
-

*Thread Reply:* What you can do is to uncheck this, do docker system prune -a and try again.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:21:56
-
-

*Thread Reply:* Done, but I get this: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-01-19 11:22:15
-
-

*Thread Reply:* Try to restart docker for mac

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-01-19 11:23:00
-
-

*Thread Reply:* It needs to show Docker Desktop is running :

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:24:01
-
-

*Thread Reply:* Yeah, done. I will try to implement the example again and see. Thank you very much!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:32:55
-
-

*Thread Reply:* I don't know why I'm getting this when I run docker-compose up:

- -

WARNING: The TAG variable is not set. Defaulting to a blank string. -WARNING: The API_PORT variable is not set. Defaulting to a blank string. -WARNING: The API_ADMIN_PORT variable is not set. Defaulting to a blank string. -WARNING: The WEB_PORT variable is not set. Defaulting to a blank string. -ERROR: The Compose file ‘./../docker-compose.yml’ is invalid because: -services.api.ports contains an invalid type, it should be a number, or an object -services.api.ports contains an invalid type, it should be a number, or an object -services.web.ports contains an invalid type, it should be a number, or an object -services.api.ports value [‘:’, ‘:’] has non-unique elements

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-01-19 11:46:12
-
-

*Thread Reply:* are you running it exactly like here, with respect to directories, etc?

- -

https://github.com/MarquezProject/marquez/tree/main/examples/airflow

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:59:36
-
-

*Thread Reply:* Yeah, my bad. Everything works fine now. I see the graph in the UI.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 12:04:01
-
-

*Thread Reply:* One more question, please. As I said, our ETLs are based on Postgres, Redshift, and Airflow. Any advice on integrating OL into our pipeline?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mohamed El IBRAHIMI - (mohamedelibrahimi700@gmail.com) -
-
2022-01-19 11:12:17
-
-

thank you very much

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kevin Mellott - (kevin.r.mellott@gmail.com) -
-
2022-01-19 17:29:51
-
-

I’m upgrading our OL Java client from an older version (0.2.3) and noticed that the ol.newCustomFacetBuilder() method to create custom facets no longer exists. I can see in this code diff that it might be replaced by simply adding to the additional properties of the standard element you are extending.

- -

Can you please let me know if I’m understanding this change correctly? In other words, is the code in the diff functionally equivalent or is there a large change I should be understanding better?

- -

https://github.com/OpenLineage/OpenLineage/compare/0.2.3...0.4.0#diff-f0381d7e68797d9ec60551c96897809072582350e1657d23425747358ec6e471L196

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-01-19 17:50:39
-
-

*Thread Reply:* Hi Kevin - to my understanding that's correct. Do you guys have a custom extractor using this?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kevin Mellott - (kevin.r.mellott@gmail.com) -
-
2022-01-19 20:49:49
-
-

*Thread Reply:* Thanks John! We have custom code emitting OL events within our ingestion pipeline and it includes a custom facet. I’ll refactor the code to the new format and should be good to go.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kevin Mellott - (kevin.r.mellott@gmail.com) -
-
2022-01-21 00:34:37
-
-

*Thread Reply:* Just to follow up, this code update worked as expected and we are all good on the upgrade.

- - - -
- 👍 Minkyu Park, John Thomas, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
SAM - (skhettri@gmail.com) -
-
2022-01-21 02:13:51
-
-

I’m not sure what went wrong with Airflow Docker version 2.1.0. Below are the steps I performed, but the Marquez UI is showing no jobs. Please help:

- -
  1. requirements.txt i added openlineage-airflow==0.5.1 . Then ran pip install -r requirements.txt .
  2. Added environmental variable inside my airflow docker folder using this command: -export AIRFLOW__LINEAGE__BACKEND = openlineage.lineage_backend.OpenLineageBackend
  3. Then configured HTTP Backend environment variables inside same airflow docker folder: -export OPENLINEAGE_URL=<http://localhost:5000>
  4. Ran Marquez using ./docker/up.sh, which is in another folder. The front-end UI is not showing any jobs; it's empty:
  5. Attached is the Airflow DAG log.
- -
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-01-25 14:46:58
-
-

*Thread Reply:* Hm, that is odd. Usually there are a few lines in the DAG log from the OpenLineage bits. I’d expect to see something about not having an extractor for the operator you are using.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-01-25 14:47:53
-
-

*Thread Reply:* If you open a shell in your Airflow Scheduler container and check for the presence of AIRFLOW__LINEAGE__BACKEND is it properly set? Possible the env isn’t making it all the way there.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lena Kullab - (Lena.Kullab@storable.com) -
-
2022-01-21 13:38:37
-
-

Hi All,

- -

I am working on a POC of OpenLineage-Airflow integration and was attempting to get it configured with Amundsen (also working on a POC). Reading through the tutorial here https://openlineage.io/integration/apache-airflow/, under the Prerequisites section it says: -To use the OpenLineage Airflow integration, you'll need a running Airflow instance. You'll also need an OpenLineage compatible HTTP backend. -The example uses Marquez, but I was trying to figure out how to get it to send metadata to the Amundsen graph db backend. Does the Airflow integration only support configuration with an HTTP compatible backend?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-01-21 14:03:29
-
-

*Thread Reply:* Hi Lena! That’s correct, Openlineage is designed to send events to an HTTP backend. There’s a ticket on the future section of the roadmap to support pushing to Amundsen, but it’s not yet been worked on (Ref: Roadmap Issue #86)

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lena Kullab - (Lena.Kullab@storable.com) -
-
2022-01-21 14:08:35
-
-

*Thread Reply:* Thank you for the info!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
naman shaundik - (namanshaundik@gmail.com) -
-
2022-01-30 11:01:42
-
-

Hi, I am completely new to OpenLineage and Marquez. I have to integrate OpenLineage into my existing Java project, but I am completely confused about where to start. I have gone through the documentation and all, but I am not able to understand how to integrate OpenLineage using the Marquez HTTP backend in my existing project. Please, someone help me. I may sound naive here, but I am in dire need of help.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-01-30 12:37:39
-
-

*Thread Reply:* what do you mean by “Integrate Openlineage”?

- -

Can you give a little more information on what you’re trying to accomplish and what the existing project is?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
naman shaundik - (namanshaundik@gmail.com) -
-
2022-01-31 03:49:22
-
-

*Thread Reply:* I work in a datalake team, and we are trying to implement data lineage in our project using OpenLineage. Our project basically keeps track of datasets coming from different sources (Hive, Redshift, Elasticsearch, etc.) and jobs.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-01-31 15:01:31
-
-

*Thread Reply:* Gotcha!

- -

Broadly speaking, all an integration needs to do is to send runEvents to Marquez.

- -

I'd start by understanding the OpenLineage data model, and then looking at your system to identify when / where runEvents should be sent from, and what information needs to be included.

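To make that concrete, here is a minimal hedged sketch of emitting a START runEvent to Marquez, written in Python using the openlineage-python client (the Java client is analogous). The URL, namespace, job name, and producer are placeholders:

```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

# Placeholder URL: point this at your Marquez instance.
client = OpenLineageClient(url="http://localhost:5000")

event = RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="my-namespace", name="my-job"),  # placeholder names
    producer="https://example.com/my-producer",
)
client.emit(event)  # POSTs the runEvent to Marquez's /api/v1/lineage endpoint
```

A matching COMPLETE event with the same runId would then close out the run.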
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
TJ Tang - (tj@tapdata.io) -
-
2022-02-15 15:28:03
-
-

*Thread Reply:* I suppose OpenLineage itself only defines the standard/protocol to design your data model. To be able to visualize/trace the lineage, you either have to implement it yourself with the standard data models or include Marquez in your project. You would need to use the HTTP API to send lineage events from your Java project to Marquez in this case.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-16 11:17:13
-
-

*Thread Reply:* Exactly! This project also includes connectors for more common data tools (Airflow, dbt, Spark, etc.), but at its core OpenLineage is a standard and protocol

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-02-02 19:55:13
-
-

The next OpenLineage Technical Steering Committee meeting is Wednesday, February 9. Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT. -Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09 -All are welcome. Agenda items are always welcome, as well. Reply in thread with yours. -Current agenda: -• OpenLineage 0.5.1 release -• Apache Flink effort -• Dagster integration -• Open Discussion -Notes: https://tinyurl.com/openlineagetsc

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jensen Yap - (jensen@contxts.io) -
-
2022-02-03 00:33:45
-
-

Hi everybody!

- - - -
- 👋 Maciej Obuchowski, John Thomas -
- -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-03 12:39:57
-
-

*Thread Reply:* Hello!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Albert Bikeev - (albert.bikeev@gmail.com) -
-
2022-02-04 09:36:46
-
-

Hi everybody! -Very cool initiative, thank you! Is there any traction on Apache Atlas integration? Is there some way to help you there?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-04 15:07:07
-
-

*Thread Reply:* Hey Albert! There aren't any issues or proposals around Apache Atlas yet, but that's definitely something you can help with!

- -

I'm not super familiar with Atlas, were you thinking in terms of enabling Atlas to receive runEvents from OpenLineage connectors?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Albert Bikeev - (albert.bikeev@gmail.com) -
-
2022-02-07 05:49:16
-
-

*Thread Reply:* Hi John! -Yes, exactly, it’d be nice to see Atlas on the receiving side of OpenLineage events. Are there any guidelines on how to implement it? I guess we need an OpenLineage-compatible server implementation so we could receive events and send them to Atlas, right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-07 11:30:14
-
-

*Thread Reply:* exactly - This would be a change on the Atlas side. I’d start by opening an issue in the atlas repo about making an API endpoint that can receive OpenLineage events. -Marquez is our reference implementation of OpenLineage, so I’d look around in that repo to see how it’s been implemented :)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Albert Bikeev - (albert.bikeev@gmail.com) -
-
2022-02-07 11:50:27
-
-

*Thread Reply:* Got it, thanks! Did that: https://issues.apache.org/jira/browse/ATLAS-4550 -If it’d not get any traction we at New Work might contribute as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-07 11:56:09
-
-

*Thread Reply:* awesome! if you guys have any questions, reach out and I can get you in touch with some of the engineers on our end

- - - -
- 👍 Albert Bikeev -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-08 11:20:47
-
-

*Thread Reply:* @Albert Bikeev one minor thing that could be helpful: java OpenLineage library contains server model classes: https://github.com/OpenLineage/OpenLineage/pull/300#issuecomment-923489097

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Albert Bikeev - (albert.bikeev@gmail.com) -
-
2022-02-08 11:32:12
-
-

*Thread Reply:* Got it, thank you!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Carlos Fernández Rodríguez - (jcfernandez@keedio.com) -
-
2022-05-04 11:12:23
-
-

*Thread Reply:* This is a quite old discussion, but isn't it possible to use the OpenLineage proxy to send the JSON to a Kafka topic and let Atlas read that JSON without any modification? -It would be necessary to create a new model for Spark, other than https://github.com/apache/atlas/blob/release-2.1.0-rc3/addons/models/1000-Hadoop/1100-spark_model.json, and upload it to Atlas (which could be done with a call to the Atlas API). -Does it make sense?

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Albert Bikeev -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-04 11:24:02
-
-

*Thread Reply:* @Juan Carlos Fernández Rodríguez - You still need to build a bridge between the OpenLineage Spec and the Apache Atlas entity JSON. So far, no one has contributed something like that to the open source community... yet!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Carlos Fernández Rodríguez - (jcfernandez@keedio.com) -
-
2022-05-04 14:24:28
-
-

*Thread Reply:* Sorry for the ignorance, -but what is the purpose of the bridge? The communication with Atlas should be done through Kafka, and those messages can be sent by the proxy. What am I missing?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-04 16:37:33
-
-

*Thread Reply:* "bridge" in this case refers to a service of some sort that converts from OpenLineage run event to Atlas entity JSON, since there's currently nothing that will do that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
xiang chen - (cdmikechen@hotmail.com) -
-
2022-05-19 09:08:23
-
-

*Thread Reply:* If OpenLineage sends an event to Kafka, I think we can use Kafka Streams or Kafka Connect to rebuild the message into an Atlas event.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
xiang chen - (cdmikechen@hotmail.com) -
-
2022-05-19 09:11:37
-
-

*Thread Reply:* @John Thomas Our company has been using Atlas as a metadata service. I just came to know this project. Once I have learned how OpenLineage works, I think I can create an issue to describe my design first.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
xiang chen - (cdmikechen@hotmail.com) -
-
2022-05-19 09:13:36
-
-

*Thread Reply:* @Juan Carlos Fernández Rodríguez If you already have some experience and a design, can you create an issue directly so that we can discuss it in more detail?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Carlos Fernández Rodríguez - (jcfernandez@keedio.com) -
-
2022-05-19 12:42:31
-
-

*Thread Reply:* Hi @xiang chen, we are discussing internally in my company whether to write to Atlas or to another alternative. If we do this, we will share it and could involve you in some way.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-02-04 15:02:29
-
-

Who here is working with OpenLineage at Dagster or Flink? We would love to hear about your work at the next TSC meeting on February 9 at 9 a.m. PT. Please reply here or message me to coordinate. @Ziyoiddin Yusupov

- - - -
- 👍 Ziyoiddin Yusupov -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Luca Soato - (lucasoato@gmail.com) -
-
2022-02-04 19:18:24
-
-

Hi everyone, -OpenLineage is wonderful, we really needed something like this! -Has anyone else used it with Databricks, Delta tables, or Spark? If someone is interested in these technologies, we can work together to get a POC and share some thoughts. -Thanks and have a nice weekend! :)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julius Rentergent - (julius.rentergent@thetradedesk.com) -
-
2022-02-25 13:06:16
-
-

*Thread Reply:* Hi Luca, I agree this looks really promising. I’m working on getting it to run on Databricks, but I’m only just starting out 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-02-08 12:00:02
-
-

Friendly reminder: this month’s OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1643849713216459

-
- - -
- - - - - - - - - - - - - - - - - -
- - - -
- ❤️ Kevin Mellott, John Thomas -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Albert Bikeev - (albert.bikeev@gmail.com) -
-
2022-02-10 08:22:28
-
-

Hi people, -One question regarding error reporting - what is the mechanism for that? E.g., if I send a duplicated job to OpenLineage, is there a way to notify me about that?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-10 09:05:39
-
-

*Thread Reply:* By duplicated, you mean with the same runId?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Albert Bikeev - (albert.bikeev@gmail.com) -
-
2022-02-10 11:40:55
-
-

*Thread Reply:* It’s only one example; it could also be a duplicated job name or anything else. The question is whether there is a mechanism to report that.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-14 17:21:20
-
-

Reducing the Logging of Spark Integration

- -

Hey, OpenLineage community! I'm curious if there are any quick tricks / fixes to reduce the amount of logging happening in the OpenLineage Spark Integration. Each job seems to print out the Logical Plan with INFO level logging. The default behavior of Databricks is to print out INFO level logs and so it gets pretty cluttered and noisy.

- -

I'm hoping there's a feature flag that would help me shut off those kind of logs in OpenLineage's Spark integration 🤞

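One possible workaround, rather than an OpenLineage feature flag (an assumption, relying on the cluster's standard log4j configuration applying to the integration's io.openlineage package): raise the log threshold for that namespace, e.g. in log4j.properties:

```
# Hypothetical log4j.properties tweak: raise the OpenLineage Spark
# integration's threshold so its INFO-level logical-plan logs are suppressed.
log4j.logger.io.openlineage.spark=WARN
```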
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-15 05:15:12
-
-

*Thread Reply:* I think this log should be dropped to debug: https://github.com/OpenLineage/OpenLineage/blob/d66c41872f3cc7f7cd5c99664d401e070e[…]c/main/common/java/io/openlineage/spark/agent/EventEmitter.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-15 23:27:07
-
-

*Thread Reply:* @Maciej Obuchowski that is a good one! It would be nice to still have SOME logging at info level to know that the event completed successfully, but that response and event are very verbose.

- -

I was also thinking about here: -https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/lifecycle/OpenLineageRunEventBuilder.java#L337-L340

- -

and here: -https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/lifecycle/OpenLineageRunEventBuilder.java#L405-L408

- -

These spots are where it's printing out the full logical plan for some reason.

- -

Can I just open up a PR and switch these to log.debug instead?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-16 04:59:17
-
-

*Thread Reply:* Yes, that would be a good solution for now. Later it would be nice to have some option to raise the log level - OL logs are absolutely drowning in logs from the rest of the Spark cluster when set to debug.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-16 13:35:15
-
-

[SPARK][INTEGRATION] Need Brainstorming Ideas - How to Persist / Access Spark Configs in JobEnd

- -

Hey, OL community! I'm working on PR#490 and I finally have all tests passing but now my desired behavior - display environment properties during COMPLETE / JobEnd events - is not happening 😭

- -

The previous approach stored the spark properties in the OpenLineageContext with a properties attribute but that was part of all of the test failures I believe.

- -

What are some other ways to store the jobStart's properties and make them accessible to the corresponding jobEnd? Hopefully it's okay to tag @Maciej Obuchowski, @Michael Collado, and @Paweł Leszczyński who have been extremely helpful in the past and brought great ideas to the table.

-
- - - - - - - -
-
Comments
- 6 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-02-16 13:44:30
-
-

*Thread Reply:* Hey, I responded on the issue, but just to make it clear for everyone, the OL events for a run are not expected to be an accumulation of all past events. Events should be treated as additive by the backend - each event can post what information it has about the run and the backend is responsible for constructing a holistic picture of the run

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-02-16 13:47:18
-
-

*Thread Reply:* e.g., here is the marquez code that fetches the facets for a run. Note that all of the facets are included from all events with the requested run_uuid. If the env facet is present on any event, it will be returned by the API

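In other words, the backend can merge facets across all events that share a runId. A minimal sketch of that idea (not Marquez's actual implementation, which is linked above):

```python
from collections import defaultdict

# Sketch: treat events as additive and merge run facets per runId.
events_by_run = defaultdict(list)

def on_event(event: dict) -> None:
    # Store every incoming runEvent under its runId.
    events_by_run[event["run"]["runId"]].append(event)

def merged_run_facets(run_id: str) -> dict:
    # A facet present on any event for this run ends up in the merged view.
    merged: dict = {}
    for event in events_by_run[run_id]:
        merged.update(event["run"].get("facets", {}))
    return merged
```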
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-16 13:51:30
-
-

*Thread Reply:* Ah! Thanks for that @Michael Collado it's good to understand the OpenLineage perspective.

- -

So, we do need to maintain some state. That makes total sense, Mike.

- -

How does Marquez handle failed jobs currently? Based on this issue (https://github.com/OpenLineage/OpenLineage/issues/436) I think Marquez would show a START but no COMPLETE event, right?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-02-16 14:00:03
-
-

*Thread Reply:* If I were building the backend, I would store events, then calculate the end state later, rather than trying to "maintain some state" (maybe we mean the same thing, but using different words here 😀). -Re: the failure events, I think job failures will currently result in one FAIL event and one COMPLETE event. The SparkListenerJobEnd event will trigger a FAIL event but the SparkListenerSQLExecutionEnd event will trigger the COMPLETE event.

- - - - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-16 15:16:27
-
-

*Thread Reply:* Oooh! I did not know we already could get a FAIL event! That is super helpful to know, Mike! Thank you so much!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-21 10:04:18
-
-

[SPARK] Connecting SparkListenerSQLExecutionStart to the various SparkListenerJobStarts

- -

TL;DR: How can I connect the SparkListenerSQLExecutionStart to the SparkListenerJobStart events coming out of OpenLineage? The events appear to have two separate run ids and no link to indicate that the ExecutionStart event owns the subsequent JobStart events.

- -

More Context:

- -

Recently, I implemented a connector for Azure Synapse (data warehouse on the Microsoft cloud) for the Spark integration and now with https://github.com/OpenLineage/OpenLineage/pull/490, I realize now that the SparkListenerSQLExecutionStart events carries with it the necessary inputs and outputs to tell the "real" lineage. The way the Synapse in Databricks works is:

- -

• SparkListenerSQLExecutionStart fires off an event with the end-to-end input and output (e.g. S3 as input and SQL table as output) -• SparkListenerJobStart events fire off for the work that moves content from one S3 location to a "staging" location controlled by Azure Synapse. OpenLineage records this event with the S3 input, and the output is a WASB "tempfolder" (which is a temporary location and not really useful for lineage since it will be destroyed at the end of the job) -• The final operation actually happens ALL in Synapse, and OpenLineage does not fire off an event it seems. The Synapse database has a "COPY" command which moves the data from "tempfolder" into the database. -• Finally a SparkListenerSQLExecutionEnd event happens and the query is complete. -Ideally, I could connect the SQLExecutionStart or SQLExecutionEnd with the SparkListenerJobStart so that I can get the JobStart properties. I see that ExecutionStart has an execution id and JobStart should have the same Execution Id, BUT I think by the time I reach the ExecutionEnd, all the JobStart events would have been removed from the HashMap that contains all of the events in OpenLineage.

- -

Any guidance on how to reach a JobStart properties from an ExecutionStart or ExecutionEnd would be greatly appreciated!

-
- - - - - - - -
-
Comments
- 7 -
- - - - - - - - - - -
- - - -
- 🤔 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-22 09:02:48
-
-

*Thread Reply:* I think this scenario only happens when spark job spawns another "sub-job", right?

- -

I think that maybe you can check sparkContext.getLocalProperty("spark.sql.execution.id")

- -

> I see that ExecutionStart has an execution id and JobStart should have the same Execution Id BUT I think by the time I reach the ExecutionEND, all the JobStart events would have been removed from the HashMap that contains all of the events in OpenLineage. -But pairwise, those starts and ends should at least have the same runId as they were created with same OpenLineageContext, right?

- -

Anyway, what @Michael Collado wrote on the issue is true: https://github.com/OpenLineage/OpenLineage/pull/490#issuecomment-1042011803 - you should not assume that we hold all the metadata somewhere in memory during whole execution of the run. The backend should be able to take care of it.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-22 10:53:09
-
-

*Thread Reply:* @Maciej Obuchowski - I was hoping they'd have the same run id as well but they do not 😞

- -

But that is the expectation? A SparkSQLExecutionStart and JobStart SHOULD have the same execution ID, right?

- -

I will take a look at sparkContext.getLocalProperty. Thank you so much for the reply Maciej!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-22 10:57:24
-
-

*Thread Reply:* SparkSQLExecutionStart and SparkSQLExecutionEnd should have the same runId, as well as JobStart and JobEnd events. Beyond those it can get wild. For example, some jobs don't emit JobStart/JobEnd events. Some jobs, like Delta emit multiple, that aren't easily tied to SQL event.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-23 03:48:38
-
-

*Thread Reply:* Okay, I dug into the Databricks Synapse Connector and it does the following:

- -
  1. SparkSQLExecutionStart with execution id of 8 happens (so gets runid of abc123). It contains the real inputs and outputs that we want.
  2. The Synapse connector starts executing JDBC commands. These commands prepare the Synapse database to connect with data that Spark will land in a staging area in the cloud. (I don't know how it's executing arbitrary commands before the official job start begins 😞 )
  3. SparkJobStart beings with execution id of 9 happens (so it gets runid of jkl456). This contains the inputs and an output to a temp folder (NOT the real output we want but a staging location) -a. There are four JobIds 0 - 3, all of which point back to execution id 9 with the same physical plan. -b. After job1, it runs more JDBC commands. -c. I think at Job2, it runs the actual Spark code to query and join my raw input data and land it in a cloud storage account "tempfolder"/ -d. After job3, it runs the final JDBC commands to actually move the data from "tempfolder/" to Synapse Db.
  4. Finally, the SparkSQLListenerEnd event occurs. -I can see this in the Spark UI as well.
- -

Because the Databricks Synapse connector somehow adds these additional JobStarts WITHOUT referencing the original SparkSQLExecutionStart execution ID, we have to rely on heuristics to connect the /tempfolder to the real downstream table that was already provided in the ExecutionStart event 😞

- -

I've attached the logs and a screenshot of what I'm seeing the Spark UI. If you had a chance to take a look, it's a bit verbose but I'd appreciate a second pair of eyes on my analysis. Hopefully I got something wrong 😅

- -
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-02-23 07:19:01
-
-

*Thread Reply:* I think we've encountered the same stuff in Delta before 🙂

- -

https://github.com/OpenLineage/OpenLineage/issues/388#issuecomment-964401860

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-02-23 14:13:18
-
-

*Thread Reply:* @Will Johnson , am I reading your report correctly that the SparkListenerJobStart event is reported with a spark.sql.execution.id that differs from the execution id of the SparkSQLExecutionStart?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-02-23 14:18:04
-
-

*Thread Reply:* WILLJ: We're deep inside this thing and have an executionid |9| -😂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-02-23 21:56:48
-
-

*Thread Reply:* Hah @Michael Collado I see you found my method of debugging in Databricks 😅

- -

But you're exactly right, there's a SparkSQLExecutionStart event with execution id 8 and then a set of JobStart events all with execution id 9!

- -

I don't know enough about Spark internals on how you can just run arbitrary Scala code while making it look like a Spark Job but that's what it looks like. As if the SqlDwWriter somehow submits a new job without a ExecutionStart... maybe it's an RDD operation instead? This has given me another idea to add some more log.info statements to my jar 😅😬

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-02-28 14:00:23
-
-

One of our own will be talking about OpenLineage, Airflow, and Spark at the Subsurface Conference this week. Register to attend @Michael Collado’s session on March 3rd at 11:45. You can register and learn more here: https://www.dremio.com/subsurface/live/winter2022/

-
-
Dremio
- - - - - - - - - - - - - - - - - -
- -
- - - - - - - -
- - -
- 🎉 Willy Lulciuc, Maciej Obuchowski -
- -
- 🙌 Will Johnson, Ziyoiddin Yusupov, Julien Le Dem -
- -
- 👍 Ziyoiddin Yusupov -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-02-28 14:00:56
-
-

*Thread Reply:* You won’t want to miss this talk!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Martin Fiser - (fisa@keboola.com) -
-
2022-02-28 15:06:43
-
-

I have a question about DataHub integration through the OpenLineage standard. Is anyone working on it, or was it rather just an icon used in previous materials? We have built an OpenLineage API endpoint in our product, and we were hoping OL would gain enough traction that it would become a native way to connect to a variety of data discovery/observability tools, such as DataHub, Amundsen, etc.

- -

Many thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-28 15:29:58
-
-

*Thread Reply:* hi Martin - when you talk about a DataHub integration, did you mean a method to collect information from DataHub? I don't see a current issue open for that, but I recommend you make one and to kick off the discussion around it.

- -

If you mean sending information to DataHub, that should already be possible if users pass a datahub api endpoint to the OPENLINEAGE_ENDPOINT variable

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Martin Fiser - (fisa@keboola.com) -
-
2022-02-28 16:29:54
-
-

*Thread Reply:* Hi, thanks for the reply! I meant emitting the OpenLineage JSON structure to DataHub.

- -

Could you please be more specific, possibly linking an article on how to find the endpoint on the DataHub side? Many thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-28 17:15:31
-
-

*Thread Reply:* ooooh, sorry I misread - I thought you meant that datahub had built an endpoint. Your integration should emit openlineage events to an endpoint, but datahub would have to build that support into their product likely? I'm not sure how to go about it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-02-28 17:16:27
-
-

*Thread Reply:* I'd reach out to datahub, potentially?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Martin Fiser - (fisa@keboola.com) -
-
2022-02-28 17:21:51
-
-

*Thread Reply:* i see. ok, will do!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-03-02 18:15:21
-
-

*Thread Reply:* It has been discussed in the past but I don’t think there is something yet. The Kafka transport PR that is in flight should facilitate this

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Martin Fiser - (fisa@keboola.com) -
-
2022-03-02 18:33:45
-
-

*Thread Reply:* Thanks for the response! Though dragging Kafka in just for the data-delivery bit is too much. I think the clearest way would be to push DataHub to make an API endpoint and parser for the OL /lineage data structure.

- -

I see this as more of a political thing that would require a joint effort of the DataHub team and OpenLineage toward a common goal.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-02-28 17:22:47
-
-

Is there a topic you think the community should discuss at the next OpenLineage TSC meeting? Reply or DM with your item, and we’ll add it to the agenda. Mark your calendars: the next TSC meeting is Wednesday, March 9 at 9 am PT on zoom.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-03-02 10:24:58
-
-

The next OpenLineage Technical Steering Committee meeting is Wednesday, March 9! Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT. -Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09 -All are welcome. -Agenda: -• New committers -• Release overview (0.6.0) -• New process for blog posts -• Retrospective: Spark integration -Notes: https://tinyurl.com/openlineagetsc

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-03-02 14:29:33
-
-

FYI, there's a talk on OpenLineage at Subsurface live tomorrow - https://www.dremio.com/subsurface/live/winter2022/session/cross-platform-data-lineage-with-openlineage/

-
-
Dremio
- - - - - - -
-
Est. reading time
- 1 minute -
- - - - - - - - - - - - -
- - - -
- 🙌 Maciej Obuchowski, John Thomas, Paweł Leszczyński, Francis McGregor-Macdonald -
- -
- 👍 Ziyoiddin Yusupov, Michael Robinson, Jac. -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-03-04 15:25:20
-
-

@channel The latest release (0.6.0) of OpenLineage is now available, featuring a new Dagster integration, updates to the Airflow and Java integrations, a generic facet for env properties, bug fixes, and more. For more info, visit https://github.com/OpenLineage/OpenLineage/releases/tag/0.6.0

- - - -
- 🙌 Conor Beverland, Dalin Kim, Ziyoiddin Yusupov, Luca Soato -
- -
- 👍 Julien Le Dem -
- -
- 👀 William Angel, Francis McGregor-Macdonald -

Marco Diaz (mdiaz@roblox.com)
2022-03-07 14:06:19
Hello Guys,

Where do I find an example of building a custom extractor? We have several custom Airflow operators that I need to integrate.

John Thomas (john@datakin.com)
2022-03-07 14:56:58
*Thread Reply:* Hi Marco - we don't have documentation on that yet, but the Postgres extractor is a pretty good example of how they're implemented.

All the included extractors are here: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors

Marco Diaz (mdiaz@roblox.com)
2022-03-07 15:07:41
*Thread Reply:* Thanks. I can follow that to build my own. Also, I am installing this environment right now in Airflow 2. It seems I need Marquez and the openlineage-airflow library. From this example it seems I can put my extractors in any path as long as it is referenced in the environment variable. Is that correct?
OPENLINEAGE_EXTRACTOR_<operator>=full.path.to.ExtractorClass
Also, do I need anything else other than Marquez and openlineage-airflow?

Ross Turk (ross@datakin.com)
2022-03-07 15:30:45
*Thread Reply:* Yes, as long as the extractors are in the Python path.

Ross Turk (ross@datakin.com)
2022-03-07 15:31:59
*Thread Reply:* I built one a little while ago for a custom operator, I'd be happy to share what I did. I put it in the same file as the operator class for convenience.

Marco Diaz (mdiaz@roblox.com)
2022-03-07 15:32:51
*Thread Reply:* That will be a great help. Thanks

Ross Turk (ross@datakin.com)
2022-03-08 20:38:27
*Thread Reply:* This is the one I wrote:

Ross Turk (ross@datakin.com)
2022-03-08 20:39:30
*Thread Reply:* to make it work, I set this environment variable:
OPENLINEAGE_EXTRACTOR_HttpToBigQueryOperator=http_to_bigquery.HttpToBigQueryExtractor

Ross Turk (ross@datakin.com)
2022-03-08 20:40:57
*Thread Reply:* the extractor starts at line 183, and the really important bits start at line 218
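
(For reference, a minimal custom extractor along the lines Ross describes might look like the sketch below. The operator and module names are illustrative - this is not Ross's actual file - and it would be registered with the OPENLINEAGE_EXTRACTOR_HttpToBigQueryOperator variable shown above.)

```
# Sketch of a minimal custom extractor; operator/module names are hypothetical.
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata


class HttpToBigQueryExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # Must match the operator class name exactly (matching is case sensitive).
        return ["HttpToBigQueryOperator"]

    def extract(self) -> Optional[TaskMetadata]:
        # A bare-minimum result: just name the task; inputs/outputs and
        # facets can be filled in once the operator's metadata is parsed.
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[],
            outputs=[],
        )
```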

Michael Robinson (michael.robinson@astronomer.io)
2022-03-07 15:16:37
@channel At the next OpenLineage TSC meeting, we’ll be reminiscing about the Spark integration. If you’ve had a hand in OL support for Spark, please join and share! The meeting will start at 9 am PT on Wednesday this week. @Maciej Obuchowski @Oleksandr Dvornik @Willy Lulciuc @Michael Collado https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting
👍 Ross Turk, Maciej Obuchowski

Marco Diaz (mdiaz@roblox.com)
2022-03-07 18:44:26
Would Marquez create some lineage for operators that don't have a custom extractor built yet?
✅ Fuming Shih

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-08 12:05:25
*Thread Reply:* You would see that the job was run - but we couldn't extract dataset lineage from it.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-08 12:05:49
*Thread Reply:* The good news is that we're working to solve this problem in general.

Marco Diaz (mdiaz@roblox.com)
2022-03-08 12:15:52
*Thread Reply:* I see, so I definitely will need the custom extractor built. I just need to understand where to set the path to the extractor. I can build one by following the Postgres extractor you have built.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-08 12:50:00
*Thread Reply:* That depends how you deploy Airflow. Our tests use environment variables in docker-compose: https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/tests/integration/tests/docker-compose-2.yml#L34

Marco Diaz (mdiaz@roblox.com)
2022-03-08 13:19:37
*Thread Reply:* Thanks for the example. I can show this to my infra support person for his reference.

Michael Robinson (michael.robinson@astronomer.io)
2022-03-08 11:47:11
This month’s OpenLineage TSC community meeting is tomorrow at 9am PT! It’s not too late to add an item to the agenda. Reply here or msg me with yours. https://openlineage.slack.com/archives/C01CK9T7HKR/p1646234698326859
👍 Ross Turk

Marco Diaz (mdiaz@roblox.com)
2022-03-09 19:31:23
I am running the last command to install Marquez in AWS
helm upgrade --install marquez .
  --set marquez.db.host <AWS-RDS-HOST>
  --set marquez.db.user <AWS-RDS-USERNAME>
  --set marquez.db.password <AWS-RDS-PASSWORD>
  --namespace marquez
  --atomic
  --wait
And I am receiving this error
Error: query: failed to query with labels: secrets is forbidden: User "xxx@xxx.xx" cannot list resource "secrets" in API group "" in the namespace "default"

Julien Le Dem (julien@apache.org)
2022-03-10 12:46:18
*Thread Reply:* Do you need to specify a namespace that is not « default »?

Marco Diaz (mdiaz@roblox.com)
2022-03-09 19:31:48
Can anyone let me know what is happening? My DI guy said it is a chart issue

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-10 07:40:13
*Thread Reply:* @Kevin Mellott aren't you the chart wizard? Maybe you could help 🙂
👀 Kevin Mellott

Marco Diaz (mdiaz@roblox.com)
2022-03-10 14:09:26
*Thread Reply:* Ok so I had to update a chart dependency

Marco Diaz (mdiaz@roblox.com)
2022-03-10 14:10:39
*Thread Reply:* Now I installed the service in Amazon using this
helm install marquez . --dependency-update --set marquez.db.host=myhost --set marquez.db.user=myuser --set marquez.db.password=mypassword --namespace marquez --atomic --wait

Marco Diaz (mdiaz@roblox.com)
2022-03-10 14:11:31
*Thread Reply:* I can see marquez-web running and marquez as well as the database I set up manually

Marco Diaz (mdiaz@roblox.com)
2022-03-10 14:12:27
*Thread Reply:* however I cannot fetch initial data when logging into the endpoint

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-10 14:52:06
*Thread Reply:* 👋 @Marco Diaz happy to hear that the Helm install is completing without error! To help troubleshoot the error above, can you please let me know if this endpoint is available and working?

http://localhost:5000/api/v1/namespaces
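
(A quick way to run that check programmatically - a sketch assuming the Marquez API is port-forwarded or otherwise reachable on localhost:5000:)

```
# Sanity-check the Marquez API namespaces endpoint; assumes localhost:5000.
import requests

resp = requests.get("http://localhost:5000/api/v1/namespaces", timeout=10)
resp.raise_for_status()
print(resp.json())  # a fresh install should show at least the "default" namespace
```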

Marco Diaz (mdiaz@roblox.com)
2022-03-10 15:13:16
*Thread Reply:* I got this
{"namespaces":[{"name":"default","createdAt":"2022-03-10T18:05:55.780593Z","updatedAt":"2022-03-10T19:03:31.309713Z","ownerName":"anonymous","description":"The default global namespace for dataset, job, and run metadata not belonging to a user-specified namespace."}]}

Marco Diaz (mdiaz@roblox.com)
2022-03-10 15:13:34
*Thread Reply:* I have to use the namespace marquez to redirect there
kubectl port-forward svc/marquez 5000:80 -n marquez

Marco Diaz (mdiaz@roblox.com)
2022-03-10 15:13:48
*Thread Reply:* is there something I need to change in a config file?

Marco Diaz (mdiaz@roblox.com)
2022-03-10 15:14:39
*Thread Reply:* also how would I change the "localhost" address to something that is accessible in Amazon without the need to redirect?

Marco Diaz (mdiaz@roblox.com)
2022-03-10 15:14:59
*Thread Reply:* Sorry for all the questions. I am not an infra guy and have had to do all this by myself

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-10 15:39:23
*Thread Reply:* No problem at all, I think there are a couple of things at play here. With the local setup, it appears that the web is attempting to access the API on the wrong port number (3000 instead of 5000). I’ll create an issue for that one so that we can fix it.

As to the EKS installation (or any non-local install), this is where you would need to use what’s called an ingress controller to expose the services outside of the Kubernetes cluster. There are different flavors of these (NGINX is popular), and I believe that AWS EKS has some built-in capabilities that might help as well.

https://www.eksworkshop.com/beginner/130_exposing-service/ingress/

Marco Diaz (mdiaz@roblox.com)
2022-03-10 15:40:50
*Thread Reply:* So how do I fix this issue?

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-10 15:46:56
*Thread Reply:* If your goal is to deploy to AWS, then you would need to get the EKS ingress configured. It’s not a trivial task, but they do have a bit of a walkthrough at https://www.eksworkshop.com/beginner/130_exposing-service/.

However, if you are just seeking to explore Marquez and try things out, then I would highly recommend the “Open in Gitpod” functionality at https://github.com/MarquezProject/marquez#try-it. That will perform a full deployment for you in a temporary environment very quickly.

Marco Diaz (mdiaz@roblox.com)
2022-03-10 16:02:05
*Thread Reply:* I need to use it in AWS for a POC

Marco Diaz (mdiaz@roblox.com)
2022-03-10 19:15:08
*Thread Reply:* Is there a better guide on how to install and set up Marquez in AWS? This guide omits many steps:
https://marquezproject.github.io/marquez/running-on-aws.html

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-10 12:35:37
We're trying to find the best way to track upstream releases of projects we have integrations for, to support newer versions faster and with fewer bugs. If you have any opinions on this topic, please chime in here

https://github.com/OpenLineage/OpenLineage/issues/602

Marco Diaz (mdiaz@roblox.com)
2022-03-11 13:34:30
@Kevin Mellott Hello Kevin, I followed the tutorial you sent me and I have exposed my services. However I am still seeing the same errors (this comes from the api/v1/namespaces call)
{"namespaces":[{"name":"default","createdAt":"2022-03-10T18:05:55.780593Z","updatedAt":"2022-03-10T19:03:31.309713Z","ownerName":"anonymous","description":"The default global namespace for dataset, job, and run metadata not belonging to a user-specified namespace."}]}

Marco Diaz (mdiaz@roblox.com)
2022-03-11 13:35:08
Is there something I need to change in the chart? I do not have access to the default namespace in Kubernetes, only the marquez namespace

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-11 13:56:27
@Marco Diaz that is actually a good response! This is the JSON returned back by the API to show some of the default Marquez data created by the install. Is there another error you are experiencing?

Marco Diaz (mdiaz@roblox.com)
2022-03-11 13:59:28
*Thread Reply:* I still see this
https://files.slack.com/files-pri/T01CWUYP5AR-F036JKN77EW/image.png

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:00:09
*Thread Reply:* I created my own database and changed the values for host, user and password inside the chart.yml

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-11 14:00:23
*Thread Reply:* Does it show that within the AWS deployment? It looks to show localhost in your screenshot.

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-11 14:00:52
*Thread Reply:* Or are you working through the local deploy right now?

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:01:57
*Thread Reply:* It shows the same using the exposed service

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:02:09
*Thread Reply:* I just didn't do another screenshot

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:02:27
*Thread Reply:* Could it be communication with the DB?

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-11 14:04:37
*Thread Reply:* What do you see if you view the network traffic within your web browser (right click -> Inspect -> Network)? Specifically, wondering what the response code from the Marquez API URL looks like.

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:14:48
*Thread Reply:* I see this error
Error occured while trying to proxy to: xxxxxxxxxxxxxxxxxxxxxxxxx.us-east-1.elb.amazonaws.com/api/v1/namespaces

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:16:00
*Thread Reply:* it seems to be trying to use the same address to access the API endpoint

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:16:26
*Thread Reply:* however the API service is at a different endpoint

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:18:24
*Thread Reply:* The API resides here
Xxxxxxxxxxxxxxxxxxxxxx-2064419849.us-east-1.elb.amazonaws.com

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:19:13
*Thread Reply:* The web service resides here
xxxxxxxxxxxxxxxxxxxxxxxxxxx-335729662.us-east-1.elb.amazonaws.com

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:19:25
*Thread Reply:* do they both need to be under the same LB?

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:19:56
*Thread Reply:* How would I do that if they install as separate services?

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-11 14:27:15
*Thread Reply:* You are correct, both the website and API are expecting to be exposed on the same ALB. This will give you a single URL that can reach your Kubernetes cluster, and then the ALB will allow you to configure Ingress rules to route the traffic based on the request.

Here is an example from one of the AWS repos - in the ingress resource you can see the single rule setup to point traffic to a given service.

https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/docs/examples/2048/2048_full.yaml

Marco Diaz (mdiaz@roblox.com)
2022-03-11 14:36:40
*Thread Reply:* Thanks for the help. Now I know what the issue is

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-11 14:51:34
*Thread Reply:* Great to hear!!

Sandeep Bhat (bhatsandeep424@gmail.com)
2022-03-16 00:55:36
👋 Hi everyone! Our company is looking to adopt a data lineage tool, so I have a few queries about OpenLineage:
1. Is it completely free?
2. What databases does it support?

Ross Turk (ross@datakin.com)
2022-03-16 10:29:06
*Thread Reply:* Hi! Yes, OpenLineage is free. It is an open source standard for collection, and it provides the agents that integrate with pipeline tools to capture lineage metadata. You also need a metadata server, and there is an open source one called Marquez that you can use.

Ross Turk (ross@datakin.com)
2022-03-16 10:29:15
*Thread Reply:* It supports the databases listed here: https://openlineage.io/integration

Sandeep Bhat (bhatsandeep424@gmail.com)
2022-03-16 08:27:20
And when I run ./docker/up.sh --seed I got the result from the Java sample example. But how do I get the same thing in a Python example?

Ross Turk (ross@datakin.com)
2022-03-16 10:29:53
*Thread Reply:* Not sure I understand - are you looking for example code in Python that shows how to make OpenLineage calls?

Sandeep Bhat (bhatsandeep424@gmail.com)
2022-03-16 12:45:14
*Thread Reply:* yup

Sandeep Bhat (bhatsandeep424@gmail.com)
2022-03-16 13:10:04
*Thread Reply:* how to run

Ross Turk (ross@datakin.com)
2022-03-16 23:08:31
*Thread Reply:* this is a good post for getting started with Marquez: https://openlineage.io/blog/explore-lineage-api/

Ross Turk (ross@datakin.com)
2022-03-16 23:08:51
*Thread Reply:* once you have run ./docker/up.sh, you should be able to run through that and see how the system runs

Ross Turk (ross@datakin.com)
2022-03-16 23:09:45
*Thread Reply:* There is a Python client you can find here: https://github.com/OpenLineage/OpenLineage/tree/main/client/python
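
(To give a flavor of that client, here is a minimal sketch that emits a single START run event to a local Marquez. The namespace, job name, and producer URI are placeholders; check the client README for the current API.)

```
# Sketch: emit one START run event with the OpenLineage Python client.
# Assumes Marquez is listening on localhost:5000; names are placeholders.
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")

event = RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="my-namespace", name="my-job"),
    producer="https://example.com/my-producer",
)
client.emit(event)
```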

Sandeep Bhat (bhatsandeep424@gmail.com)
2022-03-17 00:05:58
*Thread Reply:* Thank you

Ross Turk (ross@datakin.com)
2022-03-19 00:00:32
*Thread Reply:* You are welcome 🙂

Mirko Raca (racamirko@gmail.com)
2022-04-19 09:28:50
*Thread Reply:* Hey @Ross Turk (and potentially @Maciej Obuchowski) - what are the plans for the OL Python client? I'd like to use it, but without a pip package it's not really project-friendly.

Is there any work in that direction? Is the current client code considered mature and just in need of re-packaging, or is it a thought sketch that still needs some serious work?

I'm trying to avoid re-inventing the wheel, so if there's already something in motion, I'd rather support it than start (badly) from scratch.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-19 09:32:17
*Thread Reply:* What do you mean without pip-package?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-19 09:35:08
*Thread Reply:* It's still developed; for example, the next release will have pluggable backends - like Kafka
https://github.com/OpenLineage/OpenLineage/pull/530

Mirko Raca (racamirko@gmail.com)
2022-04-19 09:40:11
*Thread Reply:* My apologies Maciej!
In my defense - looking for "open lineage" on PyPI doesn't show it in the first 20 results. Still, I should have checked setup.py. My bad, and thank you for the pointer!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-19 10:00:49
*Thread Reply:* We might need to add some keywords to setup.py - right now we have only "openlineage" there 😉

Mirko Raca (racamirko@gmail.com)
2022-04-20 08:12:29
*Thread Reply:* My mistake was that I was expecting a separate repo for the clients. But now I'm playing around with the package and trying to figure out the OL concepts. Thank you for your contribution, it's much nicer to experiment from ipynb than curl 🙂

Michael Robinson (michael.robinson@astronomer.io)
2022-03-16 12:00:01
@Julien Le Dem and @Willy Lulciuc will be at Data Council Austin next week talking OpenLineage and Airflow https://www.datacouncil.ai/talks/data-lineage-with-apache-airflow-using-openlineage?hsLang=en

Sandeep Bhat (bhatsandeep424@gmail.com)
2022-03-16 12:50:20
I couldn't figure out which file the sample lineage flow (etldelivery7_days) fetches its data from after we ran the seed command

John Thomas (john@datakin.com)
2022-03-16 14:35:14
*Thread Reply:* the seed data is being inserted by this command here: https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/cli/SeedCommand.java

Sandeep Bhat (bhatsandeep424@gmail.com)
2022-03-17 00:06:53
*Thread Reply:* Got it, but when I changed the code in this Java file - let's say I added another job there satisfying the syntax - it's not appearing in the lineage flow

Marco Diaz (mdiaz@roblox.com)
2022-03-22 18:18:22
@Kevin Mellott Hello Kevin, sorry to bother you again. I was finally able to configure Marquez in AWS using an ALB. Now I am receiving this error when calling the API

Marco Diaz (mdiaz@roblox.com)
2022-03-22 18:18:32
Is this an issue accessing the database?

Marco Diaz (mdiaz@roblox.com)
2022-03-22 18:19:15
I created the database and host manually and passed the parameters using helm --set

Marco Diaz (mdiaz@roblox.com)
2022-03-22 18:19:33
Do the database services need to be exposed too through the ALB?

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-23 10:20:47
*Thread Reply:* I’m not too familiar with the 504 error in ALB, but found a guide with troubleshooting steps. If this is an issue with connectivity to the Postgres database, then you should be able to see errors within the marquez pod in EKS (kubectl logs <marquez pod name>) to confirm.

I know that EKS needs to have connectivity established to the Postgres database, even in the case of RDS, so that could be the culprit.

Marco Diaz (mdiaz@roblox.com)
2022-03-23 16:09:09
*Thread Reply:* @Kevin Mellott This is the error I am seeing in the logs
[HPM] Proxy created: /api/v1 -> http://localhost:5000/
App listening on port 3000!
[HPM] Error occurred while trying to proxy request /api/v1/namespaces from marquez-interface-test.di.rbx.com to http://localhost:5000/ (ECONNREFUSED) (https://nodejs.org/api/errors.html#errors_common_system_errors)

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-23 16:22:13
*Thread Reply:* It looks like the website is attempting to find the API on localhost. I believe this can be resolved by setting the following Helm chart value within your deployment.

marquez.hostname=marquez-interface-test.di.rbx.com

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-03-23 16:22:54
*Thread Reply:* assuming that is the DNS used by the website

Marco Diaz (mdiaz@roblox.com)
2022-03-23 16:48:53
*Thread Reply:* thanks, that did it. I have a question regarding the database

Marco Diaz (mdiaz@roblox.com)
2022-03-23 16:50:01
*Thread Reply:* I made my own database manually. Should the marquez tables be created automatically when installing marquez?

Marco Diaz (mdiaz@roblox.com)
2022-03-23 16:56:10
*Thread Reply:* Also, could you put both the API and interface on the same port (3000)?

Marco Diaz (mdiaz@roblox.com)
2022-03-23 17:21:58
*Thread Reply:* Seems I am still having the forwarding issue
[HPM] Proxy created: /api/v1 -> http://marquez-interface-test.di.rbx.com:5000/
App listening on port 3000!
[HPM] Error occurred while trying to proxy request /api/v1/namespaces from marquez-interface-test.di.rbx.com to http://marquez-interface-test.di.rbx.com:5000/ (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)

Will Johnson (will@willj.co)
2022-03-23 09:08:14
Guidance on How / When a Spark SQL Execution Event Controls JobStart Events?

@Maciej Obuchowski and @Paweł Leszczyński and @Michael Collado I'd really appreciate your thoughts on how / when JobStart events are triggered for a given execution. I've run into two situations now where a SQLExecutionStart event fires with execution id X and then JobStart events fire with execution id Y.

• Spark 2 Delta SaveIntoDataSourceCommand on Databricks - I see it has a SparkSQLExecutionStart event, but only on Spark 3 does it have JobStart events with the SaveIntoDataSourceCommand and the same execution id.
• Databricks Synapse Connector - A SparkSQLExecutionStart event occurs, but then the job starts have different execution ids.

Is there any guidance / books / videos that dive deeper into how these events are triggered?

We need the JobStart event with the same execution id so that we can get some environment properties stored in the JobStart event.

Thank you so much for any guidance!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-23 09:25:18
*Thread Reply:* It's always Delta, isn't it?

When I originally worked on Delta support I tried to find the answer on the Delta Slack and got this:

Hi Maciej, the main reason is that Delta will run queries on metadata to figure out what files should be read for a particular version of a Delta table and that's why you might see multiple jobs. In general Delta treats metadata as data and leverages Spark to handle them to make it scalable.
🤣 Will Johnson

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-23 09:25:48
*Thread Reply:* I haven't touched how it works in Spark 2 - I wanted to make it work with Spark 3's new catalogs, so I can't help you there.

Will Johnson (will@willj.co)
2022-03-23 09:46:14
*Thread Reply:* Argh!! It's always Databricks doing something 🙄

Thanks, Maciej!

Will Johnson (will@willj.co)
2022-03-23 09:51:59
*Thread Reply:* One last question for you, @Maciej Obuchowski: any thoughts on how I could identify WHY a particular JobStart event fired? Is it just stepping through every event? Was that your approach to getting Spark 3 Delta working? Thank you so much for the insights!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-23 09:58:08
*Thread Reply:* Before that, we were using just JobStart/JobEnd events and I couldn't find events that correspond to a logical plan that has anything to do with what the job was actually doing. I just found out that SQLExecution events have what I want, so I started using them and stopped worrying about Projection or Aggregate, or other events that don't really matter here - and that's how the filtering idea was born: https://github.com/OpenLineage/OpenLineage/issues/423

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-23 09:59:37
*Thread Reply:* Are you trying to get environment info from those events, or do you actually get a Job event with proper logical plans like SaveIntoDataSourceCommand?

Might be worth just posting here all the events + logical plans that are generated for the particular job, as I've done in that issue

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-23 09:59:40
*Thread Reply:*
scala> spark.sql("CREATE TABLE tbl USING delta AS SELECT * FROM tmp")
21/11/09 19:01:46 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionStart - executionId: 3
21/11/09 19:01:46 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect
21/11/09 19:01:46 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionStart - executionId: 4
21/11/09 19:01:46 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:46 WARN SparkSQLExecutionContext: SparkListenerJobStart - executionId: 4
21/11/09 19:01:46 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:47 WARN SparkSQLExecutionContext: SparkListenerJobEnd - executionId: 4
21/11/09 19:01:47 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:47 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionEnd - executionId: 4
21/11/09 19:01:47 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:48 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionStart - executionId: 5
21/11/09 19:01:48 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:48 WARN SparkSQLExecutionContext: SparkListenerJobStart - executionId: 5
21/11/09 19:01:48 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:49 WARN SparkSQLExecutionContext: SparkListenerJobEnd - executionId: 5
21/11/09 19:01:49 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:49 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionEnd - executionId: 5
21/11/09 19:01:49 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:49 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionEnd - executionId: 3
21/11/09 19:01:49 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect

Will Johnson (will@willj.co)
2022-03-23 11:41:37
*Thread Reply:* The JobStart event contains a Properties field and that contains a bunch of fields we want to extract to get more precise lineage information within Databricks.

As far as we know, the SQLExecutionStart event does not have any way to get these properties :(

https://github.com/OpenLineage/OpenLineage/blob/21b039b78bdcb5fb2e6c2489c4de840ebb[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java

As a result, I do have to care about the subsequent JobStart events coming from a given ExecutionStart 😢

Will Johnson (will@willj.co)
2022-03-23 11:42:33
*Thread Reply:* I started down this path with the Project statement but I agree with @Michael Collado that a ProjectVisitor isn't a great idea.

https://github.com/OpenLineage/OpenLineage/issues/617

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-24 09:43:38
Hey. I'm working on replacing the current SQL parser - which we rely on for Postgres, Snowflake, and Great Expectations - and I'd appreciate your opinion.

https://github.com/OpenLineage/OpenLineage/pull/627/files

Marco Diaz (mdiaz@roblox.com)
2022-03-25 19:30:29
Am I supposed to see this when I open Marquez for the first time on an empty database?

John Thomas (john@datakin.com)
2022-03-25 20:33:02
*Thread Reply:* Marquez and OpenLineage are job-focused lineage tools, so once you run a job in an OL-integrated instance of Airflow (or any other supported integration), you should see the jobs and DBs appear in the Marquez UI
👍 Marco Diaz

Ross Turk (ross@datakin.com)
2022-03-25 21:44:54
*Thread Reply:* If you want to seed it with some data, just to try it out, you can run docker/up.sh -s and it will run a seeding job as it starts.

Marco Diaz (mdiaz@roblox.com)
2022-03-25 19:31:09
Would datasets be created when I send data from Airflow?

Willy Lulciuc (willy@datakin.com)
2022-03-31 18:34:40
*Thread Reply:* Yep! Marquez will register all in/out datasets present in the OL event as well as link them to the run

Willy Lulciuc (willy@datakin.com)
2022-03-31 18:35:47
*Thread Reply:* FYI, @Peter Hicks is working on displaying the dataset version to run relationship in the web UI, see https://github.com/MarquezProject/marquez/pull/1929

Marco Diaz (mdiaz@roblox.com)
2022-03-28 14:31:32
How is Datakin used in conjunction with OpenLineage and Marquez?

John Thomas (john@datakin.com)
2022-03-28 15:43:46
*Thread Reply:* Hi Marco,

Datakin is a reporting tool built on the Marquez API, and therefore designed to take in lineage using the OpenLineage specification.

Did you have a more specific question?

Marco Diaz (mdiaz@roblox.com)
2022-03-28 15:47:53
*Thread Reply:* No, that is it. Got it. So I can install Datakin and still use OpenLineage and Marquez?

John Thomas (john@datakin.com)
2022-03-28 15:55:07
*Thread Reply:* if you set up a Datakin account, you'll have to change the environment variables used by your OpenLineage integrations, and the runEvents will be sent to Datakin rather than Marquez. You shouldn't have any loss of functionality, and you also won't have to keep manually hosting Marquez

Marco Diaz (mdiaz@roblox.com)
2022-03-28 16:10:25
*Thread Reply:* Will I still be able to use facets for backfills?

John Thomas (john@datakin.com)
2022-03-28 17:04:03
*Thread Reply:* yeah it works in the same way - Datakin actually submodules the Marquez API

Marco Diaz (mdiaz@roblox.com)
2022-03-28 16:52:41
Another question. I installed the openlineage library and now I am trying to configure Airflow 2 to use it.
Do I follow these steps?

Marco Diaz (mdiaz@roblox.com)
2022-03-28 16:53:20
If I have Marquez access via ALB ingress, which would I use: the MARQUEZ_URL variable or OPENLINEAGE_URL?

Marco Diaz (mdiaz@roblox.com)
2022-03-28 16:54:53
So, I don't need to modify my DAGs in Airflow 2 to use the library? Would this just allow me to start collecting data?
openlineage.lineage_backend.OpenLineageBackend
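
(For context, the wiring Marco is describing amounts to a handful of environment variables on the scheduler and workers - a sketch with placeholder values, not a definitive configuration:)

```
# Sketch of the environment wiring for the Airflow 2 lineage backend.
# In a real deployment these are set in the scheduler/worker environment,
# not in DAG code; the URL and namespace values are placeholders.
import os

os.environ["AIRFLOW__LINEAGE__BACKEND"] = "openlineage.lineage_backend.OpenLineageBackend"
os.environ["OPENLINEAGE_URL"] = "http://marquez.example.com:5000"  # where events are sent
os.environ["OPENLINEAGE_NAMESPACE"] = "my_namespace"               # optional namespace
```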

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-29 06:24:21
*Thread Reply:* Yes, you don't need to modify DAGs in Airflow 2.1+

Marco Diaz (mdiaz@roblox.com)
2022-03-29 17:47:39
*Thread Reply:* ok, I added that environment variable. Now my question is how do I configure my other variables.
I have Marquez running in AWS with an ingress.
Do I use OPENLINEAGE_URL or MARQUEZ_URL?

Marco Diaz (mdiaz@roblox.com)
2022-03-29 17:48:09
*Thread Reply:* Also, would a new namespace be created if I add the variable?

data_fool (data.fool.me@gmail.com)
2022-03-29 02:12:30
Hello! Are there any plans for OpenLineage to support dbt on Trino?

John Thomas (john@datakin.com)
2022-03-30 14:59:13
*Thread Reply:* Hi Datafool - I'm not familiar with how Trino works, but the dbt-OL integration works by wrapping the dbt run command with dbt-ol run, and capturing lineage data from the run_results file.

These things don't necessarily preclude you from using OpenLineage on Trino, so it may work already.
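
(To make the mechanism John describes concrete: after dbt run finishes, dbt leaves artifacts under target/, and the wrapper reads them to build lineage events. A sketch of peeking at those artifacts - the paths are dbt defaults, not OpenLineage internals:)

```
# Peek at the dbt artifacts that dbt-ol consumes after `dbt run`.
import json
from pathlib import Path

run_results = json.loads(Path("target/run_results.json").read_text())
print(run_results["metadata"]["dbt_schema_version"])  # schema version dbt wrote
print(len(run_results["results"]), "model results")
```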

data_fool (data.fool.me@gmail.com)
2022-03-30 18:34:38
*Thread Reply:* hey @John Thomas yep, I tried to use the dbt-ol run command but it seems Trino is not supported - only BigQuery, Redshift, and a few others.

John Thomas (john@datakin.com)
2022-03-30 18:36:41
*Thread Reply:* aaah I misunderstood what Trino is - yeah we don't currently support jobs that are running outside of those environments.

We don't currently have plans for this, but a great first step would be opening an issue in the OpenLineage repo.

If you're interested in implementing the support yourself I'm also happy to connect you to people that can help you get started.

data_fool (data.fool.me@gmail.com)
2022-03-30 20:23:46
*Thread Reply:* oh okay, got it. Yes, I can contribute - I'll see if I can get some time in the next few weeks. Thanks @John Thomas

Francis McGregor-Macdonald (francis@mc-mac.com)
2022-03-30 16:08:39
I can see 2 articles using Spline, with BMW and Capital One. Could OpenLineage be doing the same job as Spline here? What would the differences be?
Are there any similar references for OpenLineage? I can see Northwestern Mutual, but that article does not contain a lot of detail.

Marco Diaz (mdiaz@roblox.com)
2022-03-31 12:47:59
Could anyone help me with this custom extractor? I am not sure what I am doing wrong. I added the variable to Airflow 2, but I still see this in the logs
[2022-03-31, 16:43:39 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=QueryOperator
Here is the code

```
import logging
from typing import Optional, List
from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.facet import SqlJobFacet, ExternalQueryRunFacet
from openlineage.common.sql import SqlMeta, SqlParser

logger = logging.getLogger(__name__)


class QueryOperatorExtractor(BaseExtractor):

    def __init__(self, operator):
        super().__init__(operator)

    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ['QueryOperator']

    def extract(self) -> Optional[TaskMetadata]:
        # (1) Parse sql statement to obtain input / output tables.
        sql_meta: SqlMeta = SqlParser.parse(self.operator.hql)
        inputs = sql_meta.in_tables
        outputs = sql_meta.out_tables
        task_name = f"{self.operator.dag_id}.{self.operator.task_id}"
        run_facets = {}
        job_facets = {
            'hql': SqlJobFacet(self.operator.hql)
        }

        return TaskMetadata(
            name=task_name,
            # in_tables/out_tables are lists, so convert each entry
            inputs=[t.to_openlineage_dataset() for t in inputs],
            outputs=[t.to_openlineage_dataset() for t in outputs],
            run_facets=run_facets,
            job_facets=job_facets
        )
```

Orbit
2022-03-31 13:20:55
@Orbit has joined the channel

Marco Diaz (mdiaz@roblox.com)
2022-03-31 14:07:24
@Ross Turk Could you please take a look if you have a minute ☝️? I know you have built one extractor before

Ross Turk (ross@datakin.com)
2022-03-31 14:11:35
*Thread Reply:* Hmmmm. Are you running in Docker? Is it possible for you to shell into your scheduler container and make sure the ENV is properly set?

Ross Turk (ross@datakin.com)
2022-03-31 14:11:57
*Thread Reply:* looks to me like the value you posted is correct, and return ['QueryOperator'] seems right to me

Marco Diaz (mdiaz@roblox.com)
2022-03-31 14:33:00
*Thread Reply:* It is in an EKS cluster
I checked and the variable is there
OPENLINEAGE_EXTRACTOR_QUERYOPERATOR=shared.plugins.ol_custom_extractors.QueryOperatorExtractor

Marco Diaz (mdiaz@roblox.com)
2022-03-31 14:33:56
*Thread Reply:* I am wondering if it is an issue with my extractor code. Something not rendering well

Ross Turk (ross@datakin.com)
2022-03-31 14:40:17
*Thread Reply:* I don’t think it’s even executing your extractor code. The error message traces back to here:
https://github.com/OpenLineage/OpenLineage/blob/249868fa9b97d218ee35c4a198bcdf231a9b874b/integration/airflow/openlineage/lineage_backend/__init__.py#L77

Ross Turk (ross@datakin.com)
2022-03-31 14:40:45
*Thread Reply:* I am currently digging into _get_extractor to see where it might be missing yours 🤔

Marco Diaz (mdiaz@roblox.com)
2022-03-31 14:46:36
*Thread Reply:* Thanks

Ross Turk (ross@datakin.com)
2022-03-31 14:47:19
*Thread Reply:* silly idea, but you could add a log message to __init__ in your extractor.

Ross Turk (ross@datakin.com)
2022-03-31 14:47:25
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/249868fa9b97d218ee35c4a198bcdf231a[…]ntegration/airflow/openlineage/airflow/extractors/extractors.py

Ross Turk (ross@datakin.com)
2022-03-31 14:48:20
*Thread Reply:* the openlineage client actually tries to import the value of that env variable from pos 22. if that happens, but for some reason it fails to register the extractor, we can at least know that it’s importing

Ross Turk (ross@datakin.com)
2022-03-31 14:48:54
*Thread Reply:* if you add a log line, you can verify that your PYTHONPATH and env are correct
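
(A sketch of Ross's suggestion - one log line in __init__ is enough to tell an import problem apart from a registration problem:)

```
import logging
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata

logger = logging.getLogger(__name__)


class QueryOperatorExtractor(BaseExtractor):
    def __init__(self, operator):
        super().__init__(operator)
        # If this line never appears in the scheduler logs, the module was
        # never imported (bad env var / PYTHONPATH); if it does show up,
        # registration is the remaining suspect.
        logger.warning("QueryOperatorExtractor initialized")

    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ["QueryOperator"]

    def extract(self) -> Optional[TaskMetadata]:
        return TaskMetadata(name=f"{self.operator.dag_id}.{self.operator.task_id}")
```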

Marco Diaz (mdiaz@roblox.com)
2022-03-31 14:49:23
*Thread Reply:* will try that

Marco Diaz (mdiaz@roblox.com)
2022-03-31 14:49:29
*Thread Reply:* and let you know

Ross Turk (ross@datakin.com)
2022-03-31 14:49:39
*Thread Reply:* ok!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-31 15:04:05
*Thread Reply:* @Marco Diaz can you try the env variable OPENLINEAGE_EXTRACTOR_QueryOperator instead of full caps?
👍 Ross Turk

Marco Diaz (mdiaz@roblox.com)
2022-03-31 15:13:37
*Thread Reply:* Will try that too

Marco Diaz (mdiaz@roblox.com)
2022-03-31 15:13:44
*Thread Reply:* Thanks for helping

Marco Diaz (mdiaz@roblox.com)
2022-03-31 16:58:24
*Thread Reply:* @Maciej Obuchowski My setup does not allow me to submit environment variables with lowercase letters. Is the name of the variable used to register the extractor?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-31 17:15:57
*Thread Reply:* yes, it's case sensitive...
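
(Why case matters here - a purely hypothetical sketch of an exact-match lookup, not the integration's actual code:)

```
# Hypothetical illustration of a case-sensitive, exact-match env var lookup.
import os


def find_custom_extractor(operator_class_name: str):
    # hypothetical helper; builds the env var key from the class name verbatim
    return os.environ.get(f"OPENLINEAGE_EXTRACTOR_{operator_class_name}")


os.environ["OPENLINEAGE_EXTRACTOR_QUERYOPERATOR"] = "pkg.mod.Extractor"
print(find_custom_extractor("QueryOperator"))  # None - the case doesn't match
```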

Marco Diaz (mdiaz@roblox.com)
2022-03-31 17:18:42
*Thread Reply:* i see

Marco Diaz (mdiaz@roblox.com)
2022-03-31 17:39:16
*Thread Reply:* So it is definitely the name of the variable. I changed the name of the operator to capitals and now it is being registered

Marco Diaz (mdiaz@roblox.com)
2022-03-31 17:39:44
*Thread Reply:* Could there be a way not to make this case sensitive?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-31 18:31:27
*Thread Reply:* yes - could you create an issue on the OpenLineage repository?

Marco Diaz (mdiaz@roblox.com)
2022-04-01 10:46:59
*Thread Reply:* sure

Marco Diaz (mdiaz@roblox.com)
2022-04-01 10:48:28
I have another question. I have this query
INSERT OVERWRITE TABLE schema.daily_play_sessions_v2
PARTITION (ds = '2022-03-30')
SELECT
    platform_id,
    universe_id,
    pii_userid,
    NULL as session_id,
    NULL as session_start_ts,
    COUNT(1) AS session_cnt,
    SUM(
        UNIX_TIMESTAMP(stopped) - UNIX_TIMESTAMP(joined)
    ) AS time_spent_sec
FROM schema.fct_play_sessions_merged
WHERE ds = '2022-03-30'
  AND UNIX_TIMESTAMP(stopped) - UNIX_TIMESTAMP(joined) BETWEEN 0 AND 28800
GROUP BY
    platform_id,
    universe_id,
    pii_userid
And I am seeing the following inputs:
[DbTableName(None,'schema','fct_play_sessions_merged','schema.fct_play_sessions_merged')]
But the outputs are empty.
Shouldn't this be an output table:
schema.daily_play_sessions_v2

Ross Turk (ross@datakin.com)
2022-04-01 13:25:52
*Thread Reply:* Yes, it should. This line is the likely culprit:
https://github.com/OpenLineage/OpenLineage/blob/431251d25f03302991905df2dc24357823d9c9c3/integration/common/openlineage/common/sql/parser.py#L30

Ross Turk (ross@datakin.com)
2022-04-01 13:26:25
*Thread Reply:* I bet if that said ['INTO','OVERWRITE'] it would work

Ross Turk (ross@datakin.com)
2022-04-01 13:27:23
*Thread Reply:* @Maciej Obuchowski do you agree? should OVERWRITE be a token we look for? if so, I can submit a short PR.
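
(The report is easy to reproduce with the parser under discussion - a sketch assuming openlineage-common is installed; this is the same pre-Rust SqlParser used by the extractor code above:)

```
# Reproduce the missing-output report with the common SqlParser.
from openlineage.common.sql import SqlParser

sql = """
INSERT OVERWRITE TABLE schema.daily_play_sessions_v2
PARTITION (ds = '2022-03-30')
SELECT platform_id, universe_id
FROM schema.fct_play_sessions_merged
WHERE ds = '2022-03-30'
"""

meta = SqlParser.parse(sql)
print(meta.in_tables)   # finds schema.fct_play_sessions_merged
print(meta.out_tables)  # reported empty here: OVERWRITE wasn't a recognized token
```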

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-01 13:30:36
*Thread Reply:* we have a better solution

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-01 13:30:37
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/644

Ross Turk (ross@datakin.com)
2022-04-01 13:31:27
*Thread Reply:* ah! I heard there was a new SQL parser, but did not know it was imminent!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-01 13:31:30
*Thread Reply:* I've added this case as a test and it works: https://github.com/OpenLineage/OpenLineage/blob/764dfdb885112cd0840ebc7384ff958bf20d4a70/integration/sql/tests/tests_insert.rs
👍 Ross Turk, Paweł Leszczyński

Ross Turk (ross@datakin.com)
2022-04-01 13:31:33
*Thread Reply:* let me review this PR

Marco Diaz (mdiaz@roblox.com)
2022-04-01 13:36:32
*Thread Reply:* Do I have to download a new version of the openlineage-airflow Python library?

Marco Diaz (mdiaz@roblox.com)
2022-04-01 13:36:41
*Thread Reply:* If so, which version?

Ross Turk (ross@datakin.com)
2022-04-01 13:37:22
*Thread Reply:* this PR isn’t merged yet 😞 so if you wanted to try this you’d have to build the Python client from the sql/rust-parser-impl branch

Marco Diaz (mdiaz@roblox.com)
2022-04-01 13:38:17
*Thread Reply:* ok, np. I am not in a hurry yet. Do you have an ETA for the merge?

Ross Turk (ross@datakin.com)
2022-04-01 13:39:50
*Thread Reply:* Hard to say, it’s currently in review. Let me pull some strings, see if I can get eyes on it.

Marco Diaz (mdiaz@roblox.com)
2022-04-01 13:40:34
*Thread Reply:* I will check again next week, don't worry. I still need to make some things in my extractor work

Ross Turk (ross@datakin.com)
2022-04-01 13:40:36
*Thread Reply:* after it’s merged, we’ll have to do an OpenLineage release as well - perhaps next week?
👍 Michael Robinson

Ross Turk (ross@datakin.com)
2022-04-01 13:40:41
*Thread Reply:* 👍

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 12:25:48
Hi everyone, I just started using OpenLineage to connect with dbt for my company. I work in data engineering. After the connection and a test run of dbt-ol run, it gives me this error. I have looked online but couldn't find the answer anywhere. Can somebody please help me? The error tells me that the correct version is dbt schemajson version 2 instead of 3. I don't know where to change the schemajson version. Thank you everyone @channel

Ross Turk (ross@datakin.com)
2022-04-01 13:34:10
*Thread Reply:* Hm - what version of dbt are you using?

Ross Turk (ross@datakin.com)
2022-04-01 13:47:50
*Thread Reply:* @Tien Nguyen The dbt schema version changes with different versions of dbt. If you have recently updated, you may have to make some changes: https://docs.getdbt.com/docs/guides/migration-guide/upgrading-to-v1.0

Ross Turk (ross@datakin.com)
2022-04-01 13:48:27
*Thread Reply:* also make sure you are on the latest version of openlineage-dbt - I believe we have made it a bit more tolerant of dbt schema changes.

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 13:52:46
*Thread Reply:* @Ross Turk Thank you very much for your answer. I will update those and see if I can resolve the issue.

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 14:20:00
*Thread Reply:* @Ross Turk Thank you very much for your help. The latest version of dbt couldn't work, but version 0.20.0 works for this problem.

Ross Turk (ross@datakin.com)
2022-04-01 14:22:42
*Thread Reply:* Hmm. Interesting - I remember when dbt 1.0 came out we fixed a very similar issue: https://github.com/OpenLineage/OpenLineage/pull/397

Ross Turk (ross@datakin.com)
2022-04-01 14:25:17
*Thread Reply:* if you run pip3 list | grep openlineage-dbt, what version does it show?
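
(The same check can be done from Python if pip isn't handy - a small sketch:)

```
# Confirm which openlineage-dbt is actually installed (Python 3.8+).
from importlib.metadata import version

print(version("openlineage-dbt"))  # 0.6.2 was the latest at the time of this thread
```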

Ross Turk (ross@datakin.com)
2022-04-01 14:26:26
*Thread Reply:* I wonder if you have somehow ended up with an older version of the integration

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 14:33:43
*Thread Reply:* it is 0.1.0

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 14:34:23
*Thread Reply:* is 0.1.0 the older version of openlineage?

Ross Turk (ross@datakin.com)
2022-04-01 14:43:14
*Thread Reply:*
❯ pip3 list | grep openlineage-dbt
openlineage-dbt 0.6.2

Ross Turk (ross@datakin.com)
2022-04-01 14:43:26
*Thread Reply:* the latest is 0.6.2 - that might be your issue

Ross Turk (ross@datakin.com)
2022-04-01 14:43:59
*Thread Reply:* How are you going about installing it?

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 18:35:26
*Thread Reply:* @Ross Turk I follow the instructions from OpenLineage: "pip3 install openlineage-dbt"

Ross Turk (ross@datakin.com)
2022-04-01 18:36:00
*Thread Reply:* Hm! Interesting. I did the same thing to get 0.6.2.

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 18:51:36
*Thread Reply:* @Ross Turk Yes. I have tried to reinstall and clear the cache but it still installs 0.1.0

Tien Nguyen (tiennguyenhotel97@gmail.com)
2022-04-01 18:53:07
*Thread Reply:* But thanks for the version. I reinstalled 0.6.2 by specifying the version
👍 Ross Turk

Marco Diaz (mdiaz@roblox.com)
2022-04-02 17:37:59
@Ross Turk @Maciej Obuchowski FYI the SQL parser also seems not to return any inputs or outputs for queries that have subqueries.
Example:
INSERT OVERWRITE TABLE mytable
PARTITION (ds = '2022-03-31')
SELECT *
FROM
    (SELECT * FROM table2) a

INSERT OVERWRITE TABLE mytable
PARTITION (ds = '2022-03-31')
SELECT *
FROM
    (SELECT * FROM table2
     UNION
     SELECT * FROM table3
     UNION ALL
     SELECT * FROM table4) a

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-03 15:07:09
*Thread Reply:* they'll work with the new parser - added tests for those

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-03 15:07:39
*Thread Reply:* btw, thank you very much for notifying us about multiple bugs @Marco Diaz!

Marco Diaz (mdiaz@roblox.com)
2022-04-03 15:20:55
*Thread Reply:* @Maciej Obuchowski thank you for making sure these cases are taken into account. I am getting more familiar with the OpenLineage code as I build my extractors. If I see anything else I will let you know. Any ETA on the new parser release date?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-04-03 15:55:28
*Thread Reply:* it should be a week or two, unless anything comes up

Marco Diaz (mdiaz@roblox.com)
2022-04-03 17:10:02
*Thread Reply:* I see. Keeping my fingers crossed - this is the only thing delaying me right now.

Marco Diaz (mdiaz@roblox.com)
2022-04-02 20:27:37
Also, what would happen if someone uses a CTE in the SQL? Is the parser taking those cases into consideration?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-03 15:02:13
-
-

*Thread Reply:* current one handles cases where you have one CTE (like this test) but not multiple - next one will handle arbitrary number of CTEs (like this test)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
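A quick way to sanity-check how the parser handles cases like the subqueries and CTEs above is the openlineage-sql Python binding that accompanied the new parser. A minimal sketch, assuming the openlineage_sql package and its parse() entry point; the in_tables/out_tables attribute names are taken from later releases and may differ in your version:

```python
# Sketch: checking input/output table extraction with the openlineage-sql
# Python binding. Package and attribute names are assumptions based on
# later releases and may differ in your version.
from openlineage_sql import parse

meta = parse([
    "INSERT INTO mytable "
    "SELECT * FROM (SELECT * FROM table2 UNION SELECT * FROM table3) a"
])
print(meta.in_tables)   # expected to include table2 and table3
print(meta.out_tables)  # expected to include mytable
```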
Michael Robinson - (michael.robinson@astronomer.io)
2022-04-04 10:54:47
Agenda items are requested for the next OpenLineage Technical Steering Committee meeting on Wednesday, April 13. Please reply here or ping me with your items!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-04 11:11:53
*Thread Reply:* I've mentioned it before but I want to talk a bit about the new SQL parser
🙌 Will Johnson, Ross Turk

Marco Diaz - (mdiaz@roblox.com)
2022-04-04 13:25:17
*Thread Reply:* Will the parser be released after the 13th?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-08 11:47:05
*Thread Reply:* @Michael Robinson added an additional item to the agenda - the client transports feature that we'll have in the next release
🙌 Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-08 12:56:44
*Thread Reply:* Thanks, Maciej
Sukanya Patra - (Sukanya_Patra@mckinsey.com)
2022-04-05 02:39:59
Hi Everyone,

I came across OpenLineage at Data Council Austin 2022 and am curious to try it out. I have reviewed the Getting Started section (https://openlineage.io/getting-started/) of the OpenLineage docs but couldn't find clear reference documentation for using the API.
• Are there any Swagger API docs or equivalent dedicated to the OpenLineage API? There are some reference docs for the Marquez API: https://marquezproject.github.io/marquez/openapi.html#tag/Lineage
Secondly, are there any means to use OpenLineage independent of Marquez? Any pointers would be appreciated.

Patrick Mol - (patrick.mol@prolin.com)
2022-04-05 10:28:08
*Thread Reply:* I had kind of the same question.
I found https://marquezproject.github.io/marquez/openapi.html#tag/Lineage
With some of the entries marked Deprecated, I am not sure how to proceed.

John Thomas - (john@datakin.com)
2022-04-05 11:55:35
*Thread Reply:* Hey folks, are you looking for the OpenAPI specification found here?

John Thomas - (john@datakin.com)
2022-04-05 15:33:23
*Thread Reply:* @Patrick Mol, Marquez's deprecated endpoints were the old methods for creating lineage (making jobs, datasets, and runs independently). They were deprecated because we moved over to using the OpenLineage spec for all lineage collection purposes.

The GET methods for jobs/datasets/etc. are still functional.

Sarat Chandra - (saratchandra9494@gmail.com)
2022-04-05 21:10:39
*Thread Reply:* Hey John,

Thanks for sharing the OpenAPI docs. Was wondering if there are any means to set up an OpenLineage API that will receive events without a consumer like Marquez, or is it essential to always pair with a consumer to receive the events?

John Thomas - (john@datakin.com)
2022-04-05 21:47:13
*Thread Reply:* the OpenLineage integrations don't have any way to receive events, since they're designed to send events to other apps - what were you expecting OpenLineage to do?

Marquez is our reference implementation of an OpenLineage consumer, but Egeria also has a functional endpoint.
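To make the producer/consumer split concrete: an integration is anything that POSTs OpenLineage runEvents over HTTP, and a consumer is anything that accepts them. A minimal sketch of sending a START event to a consumer such as Marquez with plain requests; the endpoint URL, namespace, job name, and producer URI are placeholders:

```python
import uuid
from datetime import datetime, timezone

import requests

event = {
    "eventType": "START",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "my-namespace", "name": "my-job"},
    "producer": "https://example.com/my-producer",  # placeholder producer URI
    "schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent",
}

# Marquez, the reference consumer, accepts events at /api/v1/lineage.
resp = requests.post("http://localhost:5000/api/v1/lineage", json=event)
resp.raise_for_status()
```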
Patrick Mol - (patrick.mol@prolin.com)
2022-04-06 09:53:31
*Thread Reply:* Hi @John Thomas,
Would creation of Sources and Datasets have an equivalent in the OpenLineage specification? So far I only see the Inputs and Outputs in the Run Event spec.

John Thomas - (john@datakin.com)
2022-04-06 11:31:10
*Thread Reply:* Inputs and outputs in the OL spec are Datasets in the old MZ spec, so they're equivalent

Marco Diaz - (mdiaz@roblox.com)
2022-04-05 14:24:50
Hey Guys,

The BaseExtractor is working fine with operators that are derived from the Airflow BaseOperator. However, for operators derived from LivyOperator the BaseExtractor does not seem to work. Is there a fix for this? We use LivyOperator to run Spark jobs.

John Thomas - (john@datakin.com)
2022-04-05 15:16:34
*Thread Reply:* Hi Marco - it looks like LivyOperator itself does derive from BaseOperator, have you seen any other errors around this problem?

@Maciej Obuchowski might be more help here

Marco Diaz - (mdiaz@roblox.com)
2022-04-05 15:21:03
*Thread Reply:* It is the operators that inherit from LivyOperator. It doesn't find the parameters like sql, connection, etc.

Marco Diaz - (mdiaz@roblox.com)
2022-04-05 15:25:42
*Thread Reply:* My guess is that operators that inherit from other operators (not BaseOperator) will have the same problem

John Thomas - (john@datakin.com)
2022-04-05 15:32:13
*Thread Reply:* interesting! I'm not sure about that. I can look into it if I have time, but Maciej is definitely the person who would know the most.

Ross Turk - (ross@datakin.com)
2022-04-06 15:49:48
*Thread Reply:* @Marco Diaz I wonder - perhaps it would be better to instrument Spark with OpenLineage. It doesn't seem that Airflow will know much about what's happening underneath here. Have you looked into openlineage-spark?

Marco Diaz - (mdiaz@roblox.com)
2022-04-06 15:51:57
*Thread Reply:* I have not tried that library yet. I need to see how it's implemented, because we have several custom Spark operators that use Livy

Marco Diaz - (mdiaz@roblox.com)
2022-04-06 15:52:59
*Thread Reply:* Do you have any examples?

Ross Turk - (ross@datakin.com)
2022-04-06 15:54:01
*Thread Reply:* there is a good blog post from @Michael Collado: https://openlineage.io/blog/openlineage-spark/

Ross Turk - (ross@datakin.com)
2022-04-06 15:54:37
*Thread Reply:* and the doc page here has a good overview:
https://openlineage.io/integration/apache-spark/
Marco Diaz - (mdiaz@roblox.com)
2022-04-06 16:38:15
*Thread Reply:* is this all we need to pass?
spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
  --packages "io.openlineage:openlineage-spark:0.2.+" \
  --conf "spark.openlineage.host=http://<your_ol_endpoint>" \
  --conf "spark.openlineage.namespace=my_job_namespace" \
  --class com.mycompany.MySparkApp my_application.jar

Marco Diaz - (mdiaz@roblox.com)
2022-04-06 16:38:49
*Thread Reply:* If so, yes, our operators have a way to pass configurations to Spark and we may be able to implement it.

Michael Collado - (collado.mike@gmail.com)
2022-04-06 16:41:27
*Thread Reply:* Looks right to me

Marco Diaz - (mdiaz@roblox.com)
2022-04-06 16:42:03
*Thread Reply:* Will give it a try

Marco Diaz - (mdiaz@roblox.com)
2022-04-06 16:42:50
*Thread Reply:* Do we have to install the library on the Spark side or the Airflow side?

Marco Diaz - (mdiaz@roblox.com)
2022-04-06 16:42:58
*Thread Reply:* I assume the Spark side

Michael Collado - (collado.mike@gmail.com)
2022-04-06 16:44:25
*Thread Reply:* The --packages argument tells Spark where to get the jar (you'll want to upgrade to 0.6.1)
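For operators that build their own SparkSession (e.g. via Livy), the same settings can also be applied programmatically instead of on the spark-submit command line. A sketch in PySpark, with the host and namespace as placeholders and the version bumped to 0.6.1 as suggested above:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("my_app")
    # Fetch the OpenLineage listener jar from Maven Central
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.6.1")
    .config("spark.extraListeners",
            "io.openlineage.spark.agent.OpenLineageSparkListener")
    # Placeholder endpoint and namespace
    .config("spark.openlineage.host", "http://localhost:5000")
    .config("spark.openlineage.namespace", "my_job_namespace")
    .getOrCreate()
)
```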
Marco Diaz - (mdiaz@roblox.com)
2022-04-06 16:44:54
*Thread Reply:* sounds good

Varun Singh - (varuntestaz@outlook.com)
2022-04-06 00:04:14
Hi, I saw there was some work done for integrating OpenLineage with Azure Purview

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-06 04:54:27
*Thread Reply:* @Will Johnson

Will Johnson - (will@willj.co)
2022-04-07 12:43:27
*Thread Reply:* Hey @Varun Singh! We are building a github repository that deploys a few resources that will support a limited number of Azure data sources being pushed into Azure Purview. You can expect a public release near the end of the month! Feel free to direct message me if you'd like more details!

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-06 15:05:39
The next OpenLineage Technical Steering Committee meeting is Wednesday, April 13! Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT.
Join us on Zoom: https://astronomer.zoom.us/j/87156607114?pwd=a3B0K210dnRaQmdkaFdGMytBREZEQT09
All are welcome.
Agenda:
• OpenLineage 0.6.2 release overview
• Airflow integration update
• Dagster integration retrospective
• Open discussion
Notes: https://tinyurl.com/openlineagetsc
slackbot
2022-04-06 21:40:16
This message was deleted.

Marco Diaz - (mdiaz@roblox.com)
2022-04-07 01:00:43
*Thread Reply:* Are both Airflow 2 and Marquez installed locally on your computer?

Jorge Reyes (Zenta Group) - (jorge.reyes@zentagroup.com)
2022-04-07 09:04:19
*Thread Reply:* yes Marco

Marco Diaz - (mdiaz@roblox.com)
2022-04-07 15:00:18
*Thread Reply:* can you open Marquez on
http://localhost:3000

Marco Diaz - (mdiaz@roblox.com)
2022-04-07 15:00:40
*Thread Reply:* and get a response from
http://localhost:5000/api/v1/namespaces

Jorge Reyes (Zenta Group) - (jorge.reyes@zentagroup.com)
2022-04-07 15:26:41
*Thread Reply:* yes, I used this guide https://openlineage.io/getting-started and executed a POST to Marquez correctly

Marco Diaz - (mdiaz@roblox.com)
2022-04-07 22:17:34
*Thread Reply:* In theory you should receive events in jobs under the airflow namespace
Tien Nguyen - (tiennguyenhotel97@gmail.com)
2022-04-07 14:18:05
Hi Everyone, can someone please help me debug this error? Thank you very much all

John Thomas - (john@datakin.com)
2022-04-07 14:59:06
*Thread Reply:* It looks like you need to add a payment method to your DBT account

Tyler Farris - (tyler@kickstand.work)
2022-04-11 12:46:41
Hello. Does Airflow's TaskFlow API work with OpenLineage?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-11 12:50:48
*Thread Reply:* It does, but admittedly not very well. It can't recognize what you're doing inside your tasks. The good news is that we're working on it, and long term everything should work well.
👍 Howard Yoo

Tyler Farris - (tyler@kickstand.work)
2022-04-11 12:58:28
*Thread Reply:* Thanks for the quick reply Maciej.

sandeep - (sandeepgame07@gmail.com)
2022-04-12 09:56:44
Hi all, I watched a few of your demos with Airflow (Astronomer) recently and really liked them. Thanks for doing those.

Questions:
1. Are there plans to have a Hive listener similar to the OpenLineage Spark integration?
2. If not, will the SQL parser work with HiveQL?
3. Maybe one for Presto too?
4. Will the run version and dataset version come out of the box, or do we need to define some facets?
5. I read the blog on facets - is there a tutorial on how to create a sample facet?
Background: we have Hive, Spark jobs, and BigQuery tasks running from Airflow in GCP Dataproc.
John Thomas - (john@datakin.com)
2022-04-12 13:56:53
*Thread Reply:* Hi Sandeep,

1 & 3: We don't currently have Hive or Presto on the roadmap! The best way to start the conversation around them would be to create a proposal in the OpenLineage repo, outlining your thoughts on implementation and benefits.

2: I'm not familiar enough with HiveQL, but you can read about the new SQL parser we're implementing here.

4: you can see the Standard Facets here - Dataset Version is included out of the box, but Run Version would have to be defined.

5: the best place to start looking into making facets is the Spec doc here. We don't have a dedicated tutorial, but if you have more specific questions please feel free to reach out again on Slack.
👍 sandeep

sandeep - (sandeepgame07@gmail.com)
2022-04-12 15:39:23
*Thread Reply:* Thank you John
The standard facets link points to the GitHub issues currently

John Thomas - (john@datakin.com)
2022-04-12 15:40:33
(message content missing from the export)

sandeep - (sandeepgame07@gmail.com)
2022-04-12 15:41:01
*Thread Reply:* Will check it out, thank you

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-12 10:37:58
Reminder: this month's OpenLineage TSC meeting is tomorrow, 4/13, at 9 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1649271939878419
sandeep - (sandeepgame07@gmail.com)
2022-04-12 15:43:29
I set up the OpenLineage Spark integration for Spark (Dataproc) tasks from Airflow. It's able to post data to the Marquez endpoint and I see the job information in the Marquez UI.

I don't see any dataset information in it, I see just the jobs. Is there some setup I need to do or something else I need to configure?

John Thomas - (john@datakin.com)
2022-04-12 16:08:30
*Thread Reply:* is there anything in your marquez-api logs that might indicate issues?

What guide did you follow to set up the Spark integration?

sandeep - (sandeepgame07@gmail.com)
2022-04-12 16:10:07
*Thread Reply:* Followed this guide https://openlineage.io/integration/apache-spark/ and used the spark-defaults.conf approach

sandeep - (sandeepgame07@gmail.com)
2022-04-12 16:11:04
*Thread Reply:* The logs from the Dataproc side show no errors, let me check from the Marquez API side.
To confirm: we should be able to see the datasets from the Marquez UI with the Spark integration, right?

John Thomas - (john@datakin.com)
2022-04-12 16:11:50
*Thread Reply:* I'm not super familiar with the Spark integration, since I work more with Airflow - I'd start with looking through the readme for the Spark integration here

sandeep - (sandeepgame07@gmail.com)
2022-04-12 16:14:44
*Thread Reply:* Hmm, the readme says it aims to generate the input and output datasets

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-12 16:40:38
*Thread Reply:* Are you looking at the same namespace?

sandeep - (sandeepgame07@gmail.com)
2022-04-12 16:40:51
*Thread Reply:* Yes, the same one where I can see the job

sandeep - (sandeepgame07@gmail.com)
2022-04-12 16:54:49
*Thread Reply:* Tailing the API logs and rerunning the Spark job now to hopefully catch errors if any, will ping back here

sandeep - (sandeepgame07@gmail.com)
2022-04-12 17:01:10
*Thread Reply:* Don't see any failures in the logs, any suggestions on how to debug this?

John Thomas - (john@datakin.com)
2022-04-12 17:08:24
*Thread Reply:* I'd next set up a basic Spark notebook and see if you can't get it to send dataset information on something simple, in order to check whether it's a setup issue or a problem with your Spark job specifically

sandeep - (sandeepgame07@gmail.com)
2022-04-12 17:14:43
*Thread Reply:* ok, that sounds good, will try that
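A minimal smoke test along the lines John suggests: one read and one write, which should yield runEvents with non-empty inputs and outputs if the listener is wired up correctly. The paths are placeholders, and the session is assumed to be configured as in the spark-defaults.conf approach above:

```python
# Assumes a SparkSession with the OpenLineage listener already configured.
df = spark.read.option("header", "true").csv("/tmp/ol_smoke_test/input.csv")
df.write.mode("overwrite").parquet("/tmp/ol_smoke_test/output")
# If events arrive but datasets still don't show, inspect the raw runEvents:
# the dataset namespace is derived from the storage scheme/authority,
# which may not be the job namespace you configured.
```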
sandeep - (sandeepgame07@gmail.com)
2022-04-12 17:16:06
*Thread Reply:* before that, I see that the spark-lineage integration posts lineage to the API
https://marquezproject.github.io/marquez/openapi.html#tag/Lineage/paths/~1lineage/post
We don't seem to add a Dataset in this - does Marquez internally create this "dataset" based on Output and fields?

John Thomas - (john@datakin.com)
2022-04-12 17:16:34
*Thread Reply:* yeah, you should be seeing "input" and "output" in the runEvents - that's where datasets come from

John Thomas - (john@datakin.com)
2022-04-12 17:17:00
*Thread Reply:* I'm not sure if it's a problem with your specific spark job or with the integration itself, however

sandeep - (sandeepgame07@gmail.com)
2022-04-12 17:19:16
*Thread Reply:* By runEvents, do you mean a job object or a lineage object?
The integration seems to be only POSTing lineage objects

John Thomas - (john@datakin.com)
2022-04-12 17:20:34
*Thread Reply:* yep, a runEvent is the body that gets POSTed to the /lineage endpoint:

https://openlineage.io/docs/openapi/
👍 sandeep
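For reference, a sketch of what such a runEvent body looks like with the inputs and outputs arrays populated; each entry is what a consumer like Marquez registers as a dataset. All names, namespaces, and IDs here are placeholders:

```python
event = {
    "eventType": "COMPLETE",
    "eventTime": "2022-04-12T21:00:00Z",
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},  # placeholder
    "job": {"namespace": "my-namespace", "name": "csv_to_parquet"},
    # Each element below becomes (or updates) a dataset in the consumer:
    "inputs": [{"namespace": "gs://my-bucket", "name": "input.csv"}],
    "outputs": [{"namespace": "gs://my-bucket", "name": "output"}],
    "producer": "https://example.com/my-producer",
    "schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent",
}
```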
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-12 17:41:01
*Thread Reply:* > Yes, the same one where I can see the job
I think you should look at another namespace, whose name depends on what systems you're actually using

sandeep - (sandeepgame07@gmail.com)
2022-04-12 17:48:24
*Thread Reply:* Shouldn't the dataset be created in the same namespace we define in the Spark properties?

sandeep - (sandeepgame07@gmail.com)
2022-04-15 10:19:06
*Thread Reply:* I found a few datasets in the table-location namespace. I ran it in a setup similar to the one mentioned in this post (hive metastore, GCS, Spark SQL and Scala Spark jobs): https://openlineage.slack.com/archives/C01CK9T7HKR/p1649967405659519

sandeep - (sandeepgame07@gmail.com)
2022-04-12 15:49:46
Is this the correct place for this Q or should I reach out to the Marquez Slack?
I followed this post https://openlineage.io/integration/apache-spark/

Will Johnson - (will@willj.co)
2022-04-14 16:16:45
Before I create an issue around it, maybe I'm just not seeing it in Databricks. In the Spark integration, does OpenLineage report Hive metastore tables, or does it ONLY report the file path?

For example, say I have a Hive table called default.myTable stored at LOCATION /usr/hive/warehouse/default/mytable.

For a query that reads a CSV file and inserts into default.myTable, would I see an output of default.myTable or /usr/hive/warehouse/default/mytable?

We want to include a link between the physical path and the hive metastore table, but it seems that OpenLineage (at least on Databricks) only reports the physical path, with the table name showing up in the catalog but not as a facet.
sandeep - (sandeepgame07@gmail.com)
2022-04-15 10:17:55
*Thread Reply:* This was my experience as well, I was under the impression we would see the table as a dataset.
Looking forward to understanding the expected behavior

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-15 10:39:34
*Thread Reply:* relevant: https://github.com/OpenLineage/OpenLineage/issues/435
👍 Howard Yoo

Will Johnson - (will@willj.co)
2022-04-15 12:36:08
*Thread Reply:* Ah! Thank you both for confirming this! And it's great to see the proposal, Maciej!

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-06-10 12:37:41
*Thread Reply:* Is there a timeline around when we can expect this fix?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-10 12:46:47
*Thread Reply:* Not a simple fix, but I guess we'll start working on this relatively soon.

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-06-10 13:10:31
*Thread Reply:* I see, thanks for the update! We are very much interested in this feature.

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-15 15:42:22
@channel A significant number of us have a conflict with the current TSC meeting day/time, so, unfortunately, we need to reschedule the meeting. When you have a moment, please share your availability here: https://doodle.com/meeting/participate/id/ejRnMlPe. Thanks in advance for your input!
Arturo - (ggrmos@gmail.com)
2022-04-19 13:35:23
Hello everyone, I'm learning OpenLineage. I finally achieved the connection between Airflow 2+ and OpenLineage+Marquez. The issue is that I don't see anything on Marquez. Do I need to modify the current Airflow operators?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-19 13:40:54
*Thread Reply:* You probably need to change dataset from default

Arturo - (ggrmos@gmail.com)
2022-04-19 13:47:04
*Thread Reply:* I clicked on everything 😕 I manually (joining the pod and sending curl to the local Marquez endpoint) created a namespace to check if there was a network issue - I was ok. I created a namespace called data-dev. Airflow is mounted over k8s using the helm chart.
```
config:
  AIRFLOW__WEBSERVER__BASE_URL: "http://airflow.dev.test.io"
  PYTHONPATH: "/opt/airflow/dags/repo/config"
  AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.basic_auth"
  AIRFLOW__CORE__PLUGINS_FOLDER: "/opt/airflow/dags/repo/plugins"
  AIRFLOW__LINEAGE__BACKEND: "openlineage.lineage_backend.OpenLineageBackend"

# ...

extraEnv:
  - name: OPENLINEAGE_URL
    value: http://marquez-dev.data-dev.svc.cluster.local
  - name: OPENLINEAGE_NAMESPACE
    value: data-dev
```
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-19 15:16:47
*Thread Reply:* I think the answer is somewhere in the Airflow logs 🙂
For some reason, OpenLineage events aren't sent to Marquez.

Arturo - (ggrmos@gmail.com)
2022-04-20 11:08:09
*Thread Reply:* Thanks, it was finally my error... I created a dummy DAG to see if maybe it was an issue with the DAG, and now I can see something over Marquez

Mirko Raca - (racamirko@gmail.com)
2022-04-20 08:15:32
One really novice question - there doesn't seem to be a way of deleting lineage elements (any of them)? While I can imagine that in a production system we want to keep history, it's not practical while testing/developing. I'm using throw-away namespaces to step around the issue. Is there a better way, or alternatively - did I miss an API somewhere?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-20 08:20:35
*Thread Reply:* That's more of a Marquez question 🙂
We have a long-standing issue to add that API: https://github.com/MarquezProject/marquez/issues/1736

Mirko Raca - (racamirko@gmail.com)
2022-04-20 09:32:19
*Thread Reply:* I see it already got skipped for 2 releases, and my only conclusion is that people using Marquez don't make mistakes - ergo, API not needed 🙂 Let's see if I can stick around the project long enough to offer a bit of help; for now I just need to showcase it and get interest in my org.

Dan Mahoney - (dan.mahoney@sphericalanalytics.io)
2022-04-20 10:08:33
Good day all. I'm trying out the openlineage-dagster plugin.
• I've got dagit, dagster-daemon and Marquez running locally
• The openlineage_sensor is recognized in dagit and the daemon.
But when I run a job, I see the following message in the daemon's shell:
Sensor openlineage_sensor skipped: Last cursor: {"last_storage_id": 9, "running_pipelines": {"97e2efdf-9499-4ffd-8528-d7fea5b9362c": {"running_steps": {}, "repository_name": "hello_cereal_repository"}}}
I've attached my repos.py and serial_job.py.
Any thoughts?

David - (drobin1437@gmail.com)
2022-04-20 10:40:03
Hi All,
I am walking through the curl examples on this page and have a question on the first curl example:
https://openlineage.io/getting-started/
The curl command completes, and I can see the input file and job in the namespace, but the lineage graph does not show the input file connected as an input to the job. This only seems to happen after the job is marked complete.

Is there a way to have a running job show connections to its input files in the lineage?
Thanks!
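One note on the question above: the OpenLineage spec allows inputs to be attached to the START event as well, so an integration can declare them up front; whether Marquez draws the edge before the run completes may depend on the Marquez version. A sketch of a START event carrying inputs, with all names as placeholders:

```python
# START event that already declares its input dataset; whether the UI
# shows the edge while the run is still RUNNING depends on the consumer.
start_event = {
    "eventType": "START",
    "eventTime": "2022-04-20T14:00:00Z",
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},  # placeholder
    "job": {"namespace": "my-namespace", "name": "my-job"},
    "inputs": [{"namespace": "my-source", "name": "my-input-file"}],
    "producer": "https://example.com/my-producer",
    "schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent",
}
```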
raghanag - (raghanag@gmail.com)
2022-04-20 18:06:29
Hi Team, we are using Spark as a service, and we are planning to integrate the OpenLineage Spark listener. Looking at the below params that we need to pass: we don't know the name of the Spark cluster - is the spark.openlineage.namespace conf param mandatory?
spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
  --packages "io.openlineage:openlineage-spark:0.2.+" \
  --conf "spark.openlineage.host=http://<your_ol_endpoint>" \
  --conf "spark.openlineage.namespace=my_job_namespace" \
  --class com.mycompany.MySparkApp my_application.jar

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-20 18:11:19
*Thread Reply:* Namespace is defined by you, it does not have to be the name of the Spark cluster.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-20 18:11:42
*Thread Reply:* And I definitely recommend using a newer version than 0.2.+ 🙂

raghanag - (raghanag@gmail.com)
2022-04-20 18:13:32
*Thread Reply:* oh I see that someone mentioned that it has to be replaced with the name of the Spark cluster

raghanag - (raghanag@gmail.com)
2022-04-20 18:13:57
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1634089656188400?thread_ts=1634085740.187700&cid=C01CK9T7HKR

raghanag - (raghanag@gmail.com)
2022-04-20 18:19:19
*Thread Reply:* @Maciej Obuchowski may I know if I can add the --packages "io.openlineage:openlineage-spark:0.2.+" as part of the Spark jar file, that is, as part of the pom.xml?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-21 03:54:25
*Thread Reply:* I think it needs to run on the driver

Mirko Raca - (racamirko@gmail.com)
2022-04-21 05:53:34
Hello,
when looking through the Marquez API it seems that most individual-element creation APIs are marked as deprecated and are going to be removed by 0.25, with a pointer to switch to OpenLineage. That makes POST /api/v1/lineage the only creation point of elements, but the OpenLineage API is very limited in the attributes that can be passed.

Is that intended to stay that way? One practical question/example: how do we create a job of type STREAMING, when the OL API only allows passing name, namespace and facets? Do we now move all properties into facets?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-21 07:16:44
*Thread Reply:* > OpenLineage API is very limited in attributes that can be passed.
Can you specify where you think it's limited? The way to solve those problems would be to evolve OpenLineage.

> One practical question/example: how do we create a job of type STREAMING,
So, here I think the question is more how streaming jobs differ from batch jobs. One obvious difference is that the output of the job is continuous (in practice, probably "microbatched" or committed on checkpoint). However, the deprecated Marquez API didn't give us tools to properly indicate that. On the contrary, OpenLineage with different event types allows us to properly do that.
> Do we now move all properties into facets?
Basically, yes. Marquez should handle specific facets. For example, https://github.com/MarquezProject/marquez/pull/1847
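To make "move all properties into facets" concrete: a sketch of a job object carrying a custom facet with the property the core model lacks. The facet name myCompany_jobType and its field are hypothetical, not part of the spec; only the _producer/_schemaURL envelope is prescribed:

```python
# Hypothetical custom job facet; the facet name and its "processingType"
# field are illustrative, not standard.
job = {
    "namespace": "my-namespace",
    "name": "my-streaming-job",
    "facets": {
        "myCompany_jobType": {
            "_producer": "https://example.com/my-producer",
            "_schemaURL": "https://example.com/schemas/MyJobTypeFacet.json",
            "processingType": "STREAMING",
        }
    },
}
```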
Mirko Raca - (racamirko@gmail.com)
2022-04-21 07:23:11
*Thread Reply:* Hey Maciej,

first off - thanks for being active on the channel!

> So, here I think the question is more how streaming jobs differ from batch jobs
Not really. I just gave an example of how you would express a specific job-type creation, which can be done with PUT /api/v1/namespaces/{namespace}/jobs/{job} (https://marquezproject.github.io/marquez/openapi.html#tag/Jobs) by passing the type field, which is required. In the call to /api/v1/lineage, the job field offers just to specify (namespace, name), but no other attributes.

> However, deprecated Marquez API didn't give us tools to properly indicate that. On the contrary, OpenLineage with different event types allows us to properly do that.
I have the feeling I'm still missing some key concepts of how OpenLineage is designed. I think I went over the API and documentation, but trying to use just OpenLineage failed to reproduce mildly complex chain-of-job scenarios, and when I took a look at how the Marquez seed demo does it - it was heavily based on the deprecated API. So, I'm kinda lost on how to use OpenLineage.

I'm looking forward to some open public meeting, as I don't think asking these long questions on chat really works. 😞
Any pointers are welcome!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-21 07:53:59
*Thread Reply:* > I just gave an example of how would you express a specific job type creation
Yes, but you're trying to achieve something by passing this parameter or creating a job in a certain way. We're trying to cover everything in the OpenLineage API. Even if we don't have everything, the spec from the beginning is focused on allowing emission of custom data by the custom facet mechanism.

> I have the feeling I'm still missing some key concepts on how OpenLineage is designed.
This talk by @Julien Le Dem is a great place to start: https://www.youtube.com/watch?v=HEJFCQLwdtk

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-21 11:29:20
*Thread Reply:* > Any pointers are welcome!
BTW: OpenLineage is an open standard. Everyone is welcome to contribute and discuss. Every feedback ultimately helps us build better systems.

Mirko Raca - (racamirko@gmail.com)
2022-04-22 03:32:48
*Thread Reply:* I agree, but for now I'm more likely to be in the "I didn't get it" category, and not in the "brilliant new idea" category 🙂

My temporary goal is to go over the documentation and write up the gaps that confused me (and the solutions), and maybe publish that as an article for a wider audience. So far I realized that:
• I didn't get the naming convention - it became clearer that it's important with the Naming examples, but more info is needed
• I misinterpreted the namespaces. I was placing datasources and jobs in the same namespace, which caused a lot of issues until I started using different ones. Not sure why... So now I'm interpreting namespace=source as suggested by the naming convention
• The JSON schema actually clarified things a lot, but that's not the most reader-friendly of resources, so surely there should be a better one
• I was questioning whether to move away from Marquez completely and go with DataHub, but for my scenario Marquez (with limitations outstanding) is still most suitable
• Marquez for some reason does not tolerate datetimes missing the 'T' delimiter in the ISO format, which caused a lot of trial and error because the message is just "JSON parsing failed"
• Marquez doesn't give you (at least by default) meaningful OpenLineage parsing errors, so running examples against it is a very slow learning process
Karatuğ Ozan BİRCAN - (karatugo@gmail.com)
2022-04-21 10:20:55
Hi everyone,

I'm running the Spark listener on Databricks. It works fine for the event-emit part for a basic Databricks SQL CREATE TABLE query. Nevertheless, it throws a NullPointerException after sending lineage successfully.

I tried to debug a bit. Looks like it's thrown at the line:
QueryExecution queryExecution = SQLExecution.getQueryExecution(executionId);
So, does this mean that the listener can't get the query execution from the Spark SQL execution?

Please see the logs in the thread. Thanks.

Karatuğ Ozan BİRCAN - (karatugo@gmail.com)
2022-04-21 10:21:33
*Thread Reply:* Driver logs from Databricks:

```
22/04/21 14:05:07 INFO EventEmitter: Lineage completed successfully: ResponseMessage(responseCode=200, body={}, error=null) {"eventType":"COMPLETE",[...], "schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"}

22/04/21 14:05:07 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
	at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:43)
	at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$8(OpenLineageSparkListener.java:221)
	at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
	at java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2674)
	at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:220)
	at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:143)
	at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:135)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:102)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1588)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-21 11:32:37
*Thread Reply:* @Karatuğ Ozan BİRCAN are you running on Spark 3.2? If yes, then the new release should have fixed your problem: https://github.com/OpenLineage/OpenLineage/issues/609

Karatuğ Ozan BİRCAN - (karatugo@gmail.com)
2022-04-21 11:33:15
*Thread Reply:* Spark 3.1.2 with Scala 2.12

Karatuğ Ozan BİRCAN - (karatugo@gmail.com)
2022-04-21 11:33:50
*Thread Reply:* In fact, I couldn't make it work in Spark 3.2. But I'll test it again. Thanks for the info.

Vinith Krishnan US - (vinithk@nvidia.com)
2022-05-20 16:15:47
*Thread Reply:* Has this been resolved? I am facing the same issue with Spark 3.2.
Ben - (ben@meridian.sh)
2022-04-21 11:51:33
Does anyone have thoughts on the difference between the sourceCode and sql job facets - and whether we'd expect to ever see both on a particular job?

John Thomas - (john@datakin.com)
2022-04-21 15:34:24
*Thread Reply:* I don't think the facets are particularly strongly defined, but I would expect that it could be possible to see both on a PythonOperator that's executing SQL queries, depending on how the extractor was written

Ben - (ben@meridian.sh)
2022-04-21 15:34:45
*Thread Reply:* ah sure, that makes sense

Xiaoyong Zhu - (xiaoyzhu@outlook.com)
2022-04-21 15:14:03
Just got to know OpenLineage and it's really a great project! One question on the granularity of Spark + OpenLineage - is it possible to track column-level lineage (rather than the table lineage that's currently there)? Thanks!
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-21 16:17:59
*Thread Reply:* We're actively working on it - expect it in the next OpenLineage release. https://github.com/OpenLineage/OpenLineage/pull/645
(Labels: enhancement, integration/spark - Milestone: 0.8.0)

Xiaoyong Zhu - (xiaoyzhu@outlook.com)
2022-04-21 16:24:16
*Thread Reply:* nice - thanks!

Xiaoyong Zhu - (xiaoyzhu@outlook.com)
2022-04-21 16:25:19
*Thread Reply:* Assuming we don't need to do anything except use the next update? Or do you expect that we need to change quite a lot of configs?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-21 17:44:46
*Thread Reply:* No, it should be automatic.
Will Johnson - (will@willj.co)
2022-04-24 14:37:33
Hey, Team - We are starting to get requests for other, non-Microsoft data sources (e.g. Teradata) for the Spark integration. We (I) don't have a lot of bandwidth to fill every request, but I DO want to help these people new to OpenLineage get started.

Has anyone on the team written up a blog post about extending OpenLineage, or is this an area that we could collaborate on for the OpenLineage blog? Alternatively, is it a bad idea to write this down since the internals have changed a few times over the past six months?

Mirko Raca - (racamirko@gmail.com)
2022-04-25 03:52:20
*Thread Reply:* Hey Will,

while I would not consider myself in the team, I'm dabbling in OL, hitting walls and learning as I go. If I don't have enough experience to contribute, I'd be happy to at least proof-read and point out things which are not clear from a novice perspective. Let me know!
👍 Will Johnson

Will Johnson - (will@willj.co)
2022-04-25 13:49:48
*Thread Reply:* I'll hold you to that @Mirko Raca 😉

Ross Turk - (ross@datakin.com)
2022-04-25 17:18:02
*Thread Reply:* I will support! I've done a few recent presentations on the internals of OpenLineage that might also be useful - maybe some diagrams can be reused.

Will Johnson - (will@willj.co)
2022-04-25 17:56:44
*Thread Reply:* Any chance you have links to those old presentations? Would be great to build off of an existing one and then update for some of the new naming conventions.

Ross Turk - (ross@datakin.com)
2022-04-25 18:00:26
*Thread Reply:* the most recent one was an Astronomer webinar

happy to share the slides with you if you want 👍 here's a PDF:
🙌 Will Johnson

Ross Turk - (ross@datakin.com)
2022-04-25 18:00:44
*Thread Reply:* the other ones have not been public, unfortunately 😕

Ross Turk - (ross@datakin.com)
2022-04-25 18:02:24
*Thread Reply:* architecture, object model, run lifecycle, naming conventions == the basics IMO

Will Johnson - (will@willj.co)
2022-04-26 09:14:42
*Thread Reply:* Thank you so much, Ross! This is a great base to work from.

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-26 14:49:04
Your periodical reminder that GitHub stars are one of those trivial things that make a significant difference for an OSS project like ours. Have you starred us yet?
👍 Ross Turk
raghanag - (raghanag@gmail.com)
2022-04-26 15:02:10
Hi All, I have a simple Spark job converting CSV to Parquet and I am using https://openlineage.io/integration/apache-spark/ to generate lineage events and post them to Marquez, but I see that both events (START & COMPLETE) are the same except for eventType. I thought we should see the outputs array in the COMPLETE event, right?

Will Johnson - (will@willj.co)
2022-04-27 00:36:05
*Thread Reply:* For a Spark job like that, you'd have at least four events:
1. START event - This represents the SparkSQLExecutionStart
2. START event #2 - This represents a JobStart event
3. COMPLETE event - This represents a JobEnd event
4. COMPLETE event #2 - This represents a SparkSQLExecutionEnd event
For CSV to Parquet, you should be seeing inputs and outputs that match across each event. OpenLineage scans the logical plan and reports back the inputs / outputs / metadata across the different facets for each event BECAUSE each event might give you some different information.

For example, the JobStart event might give you access to properties that weren't there before. The JobEnd event might give you information about how many rows were written.

Marquez / OpenLineage expects that you collect all of the resulting events and then aggregate the results.
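A sketch of the consumer-side aggregation Will describes: group events by runId and union their inputs and outputs, since any single event may carry only part of the picture. This is generic logic, not a specific library API:

```python
from collections import defaultdict

def aggregate_run_events(events):
    """Union inputs/outputs across all events sharing a runId."""
    runs = defaultdict(lambda: {"inputs": set(), "outputs": set()})
    for ev in events:
        run = runs[ev["run"]["runId"]]
        for side in ("inputs", "outputs"):
            for ds in ev.get(side, []):
                run[side].add((ds["namespace"], ds["name"]))
    return runs
```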
raghanag - (raghanag@gmail.com)
2022-04-27 21:51:07
*Thread Reply:* Hi @Will Johnson, good evening. We are seeing an issue while using the Spark integration: when we provide the openlineage.host property a value like http://lineage.com/common/marquez (where my Marquez API is running), the below line modifies the host to become http://lineage.com/api/v1/lineage instead of http://lineage.com/common/marquez/api/v1/lineage, which is causing the problem:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/EventEmitter.java#L49
I see that it was added 5 months ago and released as part of 0.4.0. Is there any way we can fix the line to be like below?
this.lineageURI =
    new URI(
        hostURI.getScheme(),
        hostURI.getAuthority(),
        hostURI.getPath() + uriPath,
        queryParams,
        null);

Will Johnson - (will@willj.co)
2022-04-28 14:31:42
*Thread Reply:* Can you open up a GitHub issue for this? I had this same issue, and so our implementation always has to feature the /api/v1/lineage. The host config is literally the host. You're specifying a host and path. I'd be happy to see greater flexibility with the API endpoint, but the /v1/ is important to know which version of the OpenLineage specification you're communicating with.
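The effect raghanag describes is easy to reproduce with the standard library: rebuilding the URI from scheme and authority alone drops any path component of the configured host, while including the path preserves it:

```python
from urllib.parse import urlparse

host = urlparse("http://lineage.com/common/marquez")

# Rebuilding from scheme + authority only (what the reported code does):
print(f"{host.scheme}://{host.netloc}/api/v1/lineage")
# -> http://lineage.com/api/v1/lineage

# Including the path as well (the proposed fix):
print(f"{host.scheme}://{host.netloc}{host.path}/api/v1/lineage")
# -> http://lineage.com/common/marquez/api/v1/lineage
```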
Arturo - (ggrmos@gmail.com)
2022-04-27 14:12:38
Hi all, guys... does anyone have an example of a custom extractor with different source-destination? I'm trying to build an extractor for a custom operator like mysql_to_s3.

Ross Turk - (ross@datakin.com)
2022-04-27 15:10:24
*Thread Reply:* @Michael Collado made one for a recent webinar:

https://gist.github.com/collado-mike/d1854958b7b1672f5a494933f80b8b58

Ross Turk - (ross@datakin.com)
2022-04-27 15:11:38
*Thread Reply:* it's not exactly for an operator that has source-destination, but it shows how to format lineage events for a few different kinds of datasets
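A rough sketch of a source-to-destination extractor in the shape Arturo describes, assuming the BaseExtractor/TaskMetadata interfaces that openlineage-airflow exposed around this time; the operator attribute names and the Dataset type used here are assumptions and may differ by version:

```python
# Sketch only: class paths, TaskMetadata fields and operator attributes
# are assumptions that may not match your openlineage-airflow version.
from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.run import Dataset  # assumed dataset type

class MySqlToS3Extractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls):
        return ["MySQLToS3Operator"]

    def extract(self) -> TaskMetadata:
        # Hypothetical operator attributes; adapt to your custom operator.
        source = Dataset(namespace="mysql://my-host:3306",
                         name=self.operator.query_table)
        destination = Dataset(namespace="s3://my-bucket",
                              name=self.operator.s3_key)
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[source],
            outputs=[destination],
        )
```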
Arturo - (ggrmos@gmail.com)
2022-04-27 15:51:32
*Thread Reply:* Thanks! I'm going to take a look

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-27 23:04:18
A release has been requested by @Howard Yoo and @Ross Turk pending the merging of PR 644. Are there any +1s?
👍 Julien Le Dem, Maciej Obuchowski, Ross Turk, Conor Beverland

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-28 17:44:00
*Thread Reply:* Thanks for your input. The release is authorized. Look for it tomorrow!

raghanag - (raghanag@gmail.com)
2022-04-28 14:29:13
Hi All, we are seeing the below exception when we integrate openlineage-spark into our Spark job, can anyone share pointers?
Exception uncaught: java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.SerializationConfig.hasExplicitTimeZone()Z
	at openlineage.jackson.datatype.jsr310.ser.InstantSerializerBase.formatValue(InstantSerializerBase.java:144)
	at openlineage.jackson.datatype.jsr310.ser.InstantSerializerBase.serialize(InstantSerializerBase.java:103)
	at openlineage.jackson.datatype.jsr310.ser.ZonedDateTimeSerializer.serialize(ZonedDateTimeSerializer.java:79)
	at openlineage.jackson.datatype.jsr310.ser.ZonedDateTimeSerializer.serialize(ZonedDateTimeSerializer.java:13)
	at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
	at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:719)
	at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
	at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3906)
	at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3220)
	at io.openlineage.spark.agent.client.OpenLineageClient.executeAsync(OpenLineageClient.java:123)
	at io.openlineage.spark.agent.client.OpenLineageClient.executeSync(OpenLineageClient.java:85)
	at io.openlineage.spark.agent.client.OpenLineageClient.post(OpenLineageClient.java:80)
	at io.openlineage.spark.agent.client.OpenLineageClient.post(OpenLineageClient.java:75)
	at io.openlineage.spark.agent.client.OpenLineageClient.post(OpenLineageClient.java:70)
	at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:67)
	at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:69)
	at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:90)
	at java.util.Optional.ifPresent(Optional.java:159)
	at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:90)
	at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:81)
	at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:80)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
John Thomas - (john@datakin.com)
2022-04-28 14:41:10
*Thread Reply:* What's the Spark job that's running? This looks similar to an error that can happen when jobs have a very short lifecycle.

raghanag - (raghanag@gmail.com)
2022-04-28 14:47:27
*Thread Reply:* nothing in the Spark job, it's just a simple CSV-to-Parquet conversion

John Thomas - (john@datakin.com)
2022-04-28 14:48:50
*Thread Reply:* ah yeah, that's probably it - when the job is finished before the OpenLineage integration can poll it for information, this error is thrown. Since the job is very quick it creates a race condition
:gratitude_thank_you: raghanag

raghanag - (raghanag@gmail.com)
2022-05-03 17:16:39
*Thread Reply:* @John Thomas may I know how to solve this kind of issue?

John Thomas - (john@datakin.com)
2022-05-03 17:20:11
*Thread Reply:* This is probably an issue with the integration - for now you can either open an issue, or see if you're still getting a subset of events and take it as is. I'm not sure what you could do on your end aside from adding a sleep call or similar

raghanag - (raghanag@gmail.com)
2022-05-03 17:21:17
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/OpenLineageSparkListener.java#L151 - did you mean that if we add a sleep in this method it will solve this?

John Thomas - (john@datakin.com)
2022-05-03 18:44:43
*Thread Reply:* oh no, I meant making sure your jobs don't close too quickly

raghanag - (raghanag@gmail.com)
2022-05-06 00:14:15
*Thread Reply:* Hi @John Thomas, we figured out the error: it was indeed caused by conflicting versions, and with shadowJar shading we are not seeing it anymore.
Michael Robinson - (michael.robinson@astronomer.io)
2022-04-29 18:40:41
@channel The latest release (0.8.1) of OpenLineage is now available, featuring a new TaskInstance listener API for Airflow 2.3+, an HTTP client in the openlineage-java library for emitting run events, support for HiveTableRelation as an input source in the Spark integration, a new SQL parser used by multiple integrations, and bug fixes. For more info, visit https://github.com/OpenLineage/OpenLineage/releases/tag/0.8.1
🚀 Willy Lulciuc, John Thomas, Minkyu Park, Ross Turk, Marco Diaz, Conor Beverland, Kevin Mellott, Howard Yoo, Peter Hicks, Maciej Obuchowski, Mario Measic
🙌 Francis McGregor-Macdonald, Ross Turk, Marco Diaz, Peter Hicks

Willy Lulciuc - (willy@datakin.com)
2022-04-29 18:41:37
*Thread Reply:* Amazing work on the new SQL parser @Maciej Obuchowski 💯 🥇
👍 Ross Turk, Howard Yoo, Peter Hicks
🙌 Ross Turk, Howard Yoo, Peter Hicks, Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-30 07:54:48
The May meeting of the TSC will be postponed because most of the TSC will be attending the Astronomer Spring Summit the week of May 9th. Details to follow, along with a new meeting day/time for the meeting going forward (thanks to all who responded to the poll!).

Hubert Dulay - (hubert.dulay@gmail.com)
2022-05-01 09:25:23
Are there examples of using OpenLineage with streaming data pipelines? Thanks

Mirko Raca - (racamirko@gmail.com)
2022-05-03 04:12:09
*Thread Reply:* Hi @Hubert Dulay,

while I'm not an expert, I can offer the following:
• Marquez has had the … but what I got here - that API is not encouraged
• I personally don't find the run->job metaphor to work nicely with streaming transformations, but I'm using that in my current setup (until someone points me in a better direction 😉)
• I register each change of the stream processing as a new "run", which ends immediately - so duration information is lost, but the current set of parameters is recorded. It's not pretty, I know.
Maybe stream processing is a scenario to be re-evaluated in OL meetings, or at least clarified?

Hubert Dulay - (hubert.dulay@gmail.com)
2022-05-03 21:19:06
*Thread Reply:* Thanks for the details
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 09:32:23
-
-

Hey OL! My company is in the process of migrating off of Palantir and into Databricks/Azure. There are a couple of business units not wanting to budge due to the built-in data lineage and code reference features Palantir has. I am tasked with researching an alternative data lineage solution, and I quickly came across OL. I love what I have read and seen in demos so far and want to do a POC for my org of its capabilities. I was able to set up the Marquez server on a VM and get it talking to Databricks. I also have the init script installed on the cluster, and I can see from the log4j logs that it's communicating fine (I think). However, I am embarrassed to admit I can't figure out how the instrumentation works for the Databricks notebooks. I ran a simple notebook that loads data, runs a simple transform, and saves the output somewhere, but I don't see any entries in the namespace I configured. I am sure I missed something very obvious somewhere, but are there examples of how to get a simple example into Marquez from Databricks? Thanks so much for any guidance you can give!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-02 13:26:52
-
-

*Thread Reply:* Hi Kostikey - this blog has an example with Spark and jupyter, which might be a good place to start!

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 14:58:29
-
-

*Thread Reply:* Hi @John Thomas, thanks for the reply. I think I am close but my cluster is unable to talk to the marquez server. After looking at log4j I see the following rows:

- -

22/05/02 18:43:39 INFO SparkContext: Registered listener io.openlineage.spark.agent.OpenLineageSparkListener
22/05/02 18:43:40 INFO EventEmitter: Init OpenLineageContext: Args: ArgumentParser(host=<http://135.170.226.91:8400>, version=v1, namespace=gus-namespace, jobName=default, parentRunId=null, apiKey=Optional.empty, urlParams=Optional[{}]) URI: <http://135.170.226.91:8400/api/v1/lineage>?
22/05/02 18:46:21 ERROR EventEmitter: Could not emit lineage [responseCode=0]: {"eventType":"START","eventTime":"2022-05-02T18:44:08.36Z","run":{"runId":"91fd4e13-52ac-4175-8956-c06d7dee97fc","facets":{"spark_version":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","spark-version":"3.2.1","openlineage_spark_version":"0.8.1"},"spark.logicalPlan":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.ShowNamespaces","num-children":1,"namespace":0,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"databaseName","dataType":"string","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":4,"jvmId":"eaa0543b_5e04_4f5b_844b_0e4598f019a7"},"qualifier":[]}]]},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedNamespace","num_children":0,"catalog":null,"namespace":[]}]},"spark_unknown":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","output":{"description":"Unable to serialize logical plan due to: Infinite recursion (StackOverflowError) ...
OpenLineageHttpException(code=0, message=java.lang.RuntimeException: java.util.concurrent.ExecutionException: openlineage.hc.client5.http.ConnectTimeoutException: Connect to <http://135.170.226.91:8400> [/135.170.226.91] failed: Connection timed out, details=java.util.concurrent.CompletionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: openlineage.hc.client5.http.ConnectTimeoutException: Connect to <http://135.170.226.91:8400> [/135.170.226.91] failed: Connection timed out)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:68)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:69)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:90)
    at java.util.Optional.ifPresent(Optional.java:159)
    at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:90)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:81)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:102)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1612)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
the connection timeout is surprising because I can connect just fine using the example curl code from the same cluster:

- -

%sh
curl -X POST <http://135.170.226.91:8400/api/v1/lineage> \
  -H 'Content-Type: application/json' \
  -d '{
        "eventType": "START",
        "eventTime": "2020-12-28T19:52:00.001+10:00",
        "run": {
          "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
        },
        "job": {
          "namespace": "gus2-namespace",
          "name": "my-job"
        },
        "inputs": [{
          "namespace": "gus2-namespace",
          "name": "gus-input"
        }],
        "producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>"
      }'
Spark config:
spark.openlineage.host <http://135.170.226.91:8400>
spark.openlineage.version v1
spark.openlineage.namespace gus-namespace
Not sure what is going on; the EventEmitter init log looks right, but clearly something is off. Thanks so much for the help

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-02 15:03:40
-
-

*Thread Reply:* hmmm, interesting - if it's easy could you spin both up locally and check that it's just a communication issue? It helps with diagnosis

- -

It might also be a firewall issue, but your cURL should preclude that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 15:05:38
-
-

*Thread Reply:* Since it's Databricks I was having a hard time figuring out how to try locally. Other than just using plain 'ol spark on my laptop and a localhost Marquez...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-02 15:07:13
-
-

*Thread Reply:* hmm, that could be an interesting test to see if it's a databricks issue - the databricks integration is pretty much the same as the spark integration, just with a little bit of a wrapper and the init script

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 15:08:44
-
-

*Thread Reply:* yeah, I was going to try that, but it just didn't seem like helpful troubleshooting for exactly that reason... but I may just do that anyway so I can see something working 🙂 (morale booster)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-02 15:09:22
-
-

*Thread Reply:* oh totally! Network issues are a huge pain in the ass, and if you're still seeing issues locally with spark/mz then we'll know a lot more than we do now 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 15:11:19
-
-

*Thread Reply:* sounds good, i will give it a go!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-02 15:16:16
-
-

*Thread Reply:* @Kostikey Mustakas - I think spark.openlineage.version should be equal to 1 not v1.

- -

In addition, is http://135.170.226.91:8400 accessible to Databricks? Could you try doing a %sh command inside of a databricks notebook and see if you can ping that IP address (https://linux.die.net/man/8/ping)?

- -

For your Databricks cluster did you VNET inject it into an existing VNET? If it's in an existing VNET, you should confirm that the VM running marquez can access it. If it's in a non-VNET injected VNET, you probably need to redeploy to a VNET that has that VM or has connectivity to that VM.
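Putting Will's correction together with the integration docs, the cluster Spark config from this thread would presumably become:

spark.openlineage.host <http://135.170.226.91:8400>
spark.openlineage.version 1
spark.openlineage.namespace gus-namespace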

-
-
linux.die.net
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 15:19:22
-
-

*Thread Reply:* Ya, I meant to ask about that. The docs say 1, like you mention: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/databricks. I second-guessed it based on this thread: https://openlineage.slack.com/archives/C01CK9T7HKR/p1638848249159700.

-
- - -
Dinakar Sundar - (https://openlineage.slack.com/team/U02MQ8E22HF) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 15:23:42
-
-

*Thread Reply:* @Will Johnson, ping fails... this is surprising as the curl command mentioned above works fine.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julius Rentergent - (julius.rentergent@thetradedesk.com) -
-
2022-05-02 15:37:00
-
-

*Thread Reply:* I’m also trying to set up Databricks according to Running Marquez on AWS. Right now I’m stuck on the database part rather than the Marquez part — I can’t connect my EKS cluster to the RDS database which I described in more detail on the Marquez slack.

- -

@Kostikey Mustakas Sorry for the distraction, but I’m curious how you have set up your networking to make the API requests work with Databricks. -Good luck with your issue!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 15:47:17
-
-

*Thread Reply:* @Julius Rentergent We are using Azure and leverage Private Endpoints to connect resources in separate subscriptions. There is a Bastion proxy in place that we can map HTTP traffic through, and I have a Load Balancer Inbound NAT rule I set up that maps one of our whitelisted port ranges (8400) to 5000.

- - - -
- :gratitude_thank_you: Julius Rentergent -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 20:15:01
-
-

*Thread Reply:* @Will Johnson a little progress maybe... I created a private endpoint and updated dns to point to it. Now I get a 404 Not Found error instead of a timeout

- - - -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Kostikey Mustakas - (kostikey.mustakas@gmail.com) -
-
2022-05-02 20:16:41
-
-

*Thread Reply:* 22/05/03 00:09:24 ERROR EventEmitter: Could not emit lineage [responseCode=404]: {"eventType":"START","eventTime":"2022-05-03T00:09:22.498Z","run":{"runId":"f41575a0-e59d-4cbc-a401-9b52d2b020e0","facets":{"spark_version":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","spark-version":"3.2.1","openlineage_spark_version":"0.8.1"},"spark.logicalPlan":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.ShowNamespaces","num-children":1,"namespace":0,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"databaseName","dataType":"string","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":4,"jvmId":"aad3656d_8903_4db3_84f0_fe6d773d71c3"},"qualifier":[]}]]},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedNamespace","num_children":0,"catalog":null,"namespace":[]}]},"spark_unknown":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark>","_schemaURL":"<https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet>","output":{"description":"Unable to serialize logical plan due to: Infinite recursion (StackOverflowError) (through reference chain: org.apache.spark.sql.catalyst.expressions.AttributeReference[\"preCanonicalized\"] ....
OpenLineageHttpException(code=null, message={"code":404,"message":"HTTP 404 Not Found"}, details=null)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:68)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julius Rentergent - (julius.rentergent@thetradedesk.com) -
-
2022-05-27 00:03:30
-
-

*Thread Reply:* Following up on this as I encounter the same issue with the Openlineage Databricks integration. This issue seems quite malicious as it crashes the Spark Context and requires a restart.

- -

I have marquez running on AWS EKS; I'm using Openlineage 0.8.2 on Databricks 10.4 (Spark 3.2.1) and my Spark config looks like this:
spark.openlineage.host <https://internal-xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxx.us-east-1.elb.amazonaws.com>
spark.openlineage.namespace default
spark.openlineage.version v1 <- also tried "1"
I can run some simple read and write commands and successfully find the log4j events highlighted in the docs: INFO SparkContext, INFO OpenLineageContext, and INFO AsyncEventQueue each time I run the cell.
After doing this a few times I get The spark context has stopped and the driver is restarting. Your notebook will be automatically reattached.
stderr shows a bunch of things. log4j shows the same as for Kostikey: ERROR EventEmitter: [...] Unable to serialize logical plan due to: Infinite recursion (StackOverflowError)

- -

I have one more piece of information which I can't make much sense of, but hopefully someone else can; if I include the port in the host, I can very reliably crash the Spark Context on the first attempt. So:
<https://internal-xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxx.us-east-1.elb.amazonaws.com> <- crashes after a couple of attempts, sometimes it takes me a while to reproduce it while repeatedly reading/writing the same datasets
<https://internal-xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxx.us-east-1.elb.amazonaws.com:80> <- crashes on first try
Any insights would be greatly appreciated! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julius Rentergent - (julius.rentergent@thetradedesk.com) -
-
2022-05-27 00:22:27
-
-

*Thread Reply:* I tried two more things:
• curl works, ping fails, just like in the previous report
• Databricks allows providing spark configs without quotes, whereas quotes are generally required for Spark. So I added the quotes to the host name, but now I'm getting: ERROR OpenLineageSparkListener: Unable to parse open lineage endpoint. Lineage events will not be collected

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Martin Fiser - (fisa@keboola.com) -
-
2022-05-27 14:00:38
-
-

*Thread Reply:* @Kostikey Mustakas May I ask what is the reason for migration from Palantir? Sorry for this off-topic question!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-30 05:46:27
-
-

*Thread Reply:* @Julius Rentergent created issue on project github: https://github.com/OpenLineage/OpenLineage/issues/795

-
- - - - - - - -
-
Labels
- bug, integration/spark -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julius Rentergent - (julius.rentergent@thetradedesk.com) -
-
2022-06-01 11:15:26
-
-

*Thread Reply:* Thank you @Maciej Obuchowski. -Just to clarify, the Spark Context crashes with and without port; it’s just that adding the port causes it to crash more quickly (on the 1st attempt).

- -

I will run some more experiments when I have time, and add the results to the ticket.

- -

Edit - added to issue:

- -

I ran some more experiments, this time with a fake host and on OpenLineage 0.9.0, and was not able to reproduce the issue with regards to the port; instead, the new experiments show that Spark 3.2 looks to be involved.

- -

On Spark 3.2.1 / Databricks 10.4 LTS: Using (fake) host http://ac7aca38330144df9.amazonaws.com:5000 crashes when the first notebook cell is evaluated with The spark context has stopped and the driver is restarting. - The same occurs when the port is removed.

- -

On Spark 3.1.2 / Databricks 9.1 LTS: Using (fake) host http://ac7aca38330144df9.amazonaws.com:5000 does not impede the cluster but, reasonably, produces for each lineage event ERROR EventEmitter: Could not emit lineage w/ exception io.openlineage.client.OpenLineageClientException: java.net.UnknownHostException - The same occurs when the port is removed.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-02 14:52:09
-
-

@channel The poll results are in, and the new day/time for the monthly TSC meeting is each second Thursday at 10 am PT. The next meeting will take place on Thursday, 5/19, at 10 am PT, due to a conflict with the Astronomer Spring Summit. Future meetings will take place on the second Thursday of each month. Calendar updates will be forthcoming. Thanks!

- - - -
- 🙌 Willy Lulciuc, Mynor Choc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-02 15:09:42
-
-

*Thread Reply:* @Michael Robinson - just to be sure, is the 5/19 meeting at 10 AM PT as well?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-02 15:14:11
-
-

*Thread Reply:* Yes, and I’ll update the msg for others. Thank you

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-02 15:16:25
-
-

*Thread Reply:* Thank you!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sandeep Bhat - (bhatsandeep424@gmail.com) -
-
2022-05-02 21:45:39
-
-

Hi Team, as I saw, Marquez builds lineage via Java code from the seed command. What should I do to connect to MySQL (our database) with credentials and build a lineage for our data?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-05-03 12:40:55
-
-

@here How do we clear old jobs, datasets and namespaces from Marquez?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-05-04 07:04:48
-
-

*Thread Reply:* It seems we can't for now. This was the same question I had last week:

- -

https://github.com/MarquezProject/marquez/issues/1736

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-04 10:56:35
-
-

*Thread Reply:* Seems that it's a really popular request 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-03 13:43:56
-
-

Hello,
I'm sending lineage events to the astrocloud.datakin DB with the Marquez API. The event is sent, but the metadata for inputs and outputs isn't coming through. Below is an example of the event I'm sending. Not sure if this is the place for this question. Cross-posting to Marquez Slack.
{
  "eventTime": "2022-05-03T17:20:04.151087+00:00",
  "run": {
    "runId": "2dfc6dcd4011d2a1c3dc1e5861127e5b"
  },
  "job": {
    "namespace": "from-airflow",
    "name": "Postgres_1_to_Snowflake_2.extract"
  },
  "producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>",
  "inputs": [
    {
      "name": "Postgres_1_to_Snowflake_2.extract",
      "namespace": "from-airflow"
    }
  ]
}
Thanks.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-04 11:28:48
-
-

*Thread Reply:* @Mirko Raca pointed out that I was missing eventType.

- -

Mirko Raca : -"From a quick glance - you're missing "eventType": "START", attribute. It's also worth noting that metadata typically shows up after the second event (type COMPLETE)"

- -

thanks again.
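For completeness, a sketch of emitting the corrected START/COMPLETE pair with the openlineage-python client; the job and dataset names mirror the event above, while the backend URL is a placeholder:
```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # placeholder backend URL

run = Run(runId=str(uuid4()))
job = Job(namespace="from-airflow", name="Postgres_1_to_Snowflake_2.extract")
inputs = [Dataset(namespace="from-airflow", name="Postgres_1_to_Snowflake_2.extract")]
producer = "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client"

# metadata typically shows up once the COMPLETE event arrives
for state in (RunState.START, RunState.COMPLETE):
    client.emit(RunEvent(
        eventType=state,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=run,
        job=job,
        producer=producer,
        inputs=inputs,
    ))
```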

- - - -
- 👍 Mirko Raca -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sandeep Bhat - (bhatsandeep424@gmail.com) -
-
2022-05-06 05:01:34
-
-

Hi Team, could anyone tell me: to view lineage in Marquez, do we have to write metadata as code, or does Marquez have a feature to scan SQL code and build lineage automatically? Please clarify my doubt regarding this.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Carlos Fernández Rodríguez - (jcfernandez@keedio.com) -
-
2022-05-06 05:26:16
-
-

*Thread Reply:* As far as I understand, OpenLineage has tools to extract metadata from sources. Depending on your source, you could find an existing integration; if it doesn't exist, you should write your own integration (and collaborate with the project).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-05-06 12:59:06
-
-

*Thread Reply:* @Sandeep Bhat take a look at https://openlineage.io/integration - there is some info there on the different integrations that can be used to automatically pull metadata.

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-05-06 13:00:39
-
-

*Thread Reply:* The Airflow integration, in particular, uses a SQL parser to determine input/output tables (in cases where the data store can't be queried for that info)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jorik - (jorik@scivis.net) -
-
2022-05-12 05:13:01
-
-

Hi all. We are looking at using OpenLineage for capturing some lineage in our custom processing system. I think we got the lineage events understood, but we have often datasets that get appended, or get overwritten by an operation. Is there anything in openlineage that would facilitate making this distinction? (ie. if a set gets overwritten we would be interested in the lineage events from the last overwrite, if it gets appended we would like to have all of these in the display)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-05-12 05:48:43
-
-

*Thread Reply:* To my understanding, datasets model the structure, not the content. So, as long as your table doesn't change its number of columns, it's the same thing.

- -

The catch-all would be to create a Dataset facet which would record the distinction between append/overwrite per run. But, while this is supported by the standard, Marquez does not handle custom facets at the moment (I'll happily be corrected).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jorik - (jorik@scivis.net) -
-
2022-05-12 06:05:36
-
-

*Thread Reply:* Thanks, that makes sense. We're looking for a way to get the lineage of table contents. We may have to opt for new names on overwrite, or indeed extend a facet to flag these.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jorik - (jorik@scivis.net) -
-
2022-05-12 06:06:44
-
-

*Thread Reply:* Use case is compliancy, where we need to show how a certain delivered data product (at a given point in time) was constructed. We have all our transforms/transfers as code, but there are a few parts where datasets get recreated in the process after fixes have been made, and I wouldn't want to bother the auditors with those stray paths

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-12 06:12:09
-
-

*Thread Reply:* We have LifecycleStateChangeDataset facet that captures this information. It's currently emitted when using Spark integration
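For reference, a sketch of how that facet appears on an output dataset in a run event; the dataset names are made up, and the schema URL version should be checked against the current spec:

"outputs": [{
  "namespace": "my-namespace",
  "name": "my-table",
  "facets": {
    "lifecycleStateChange": {
      "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/0.8.2/integration/spark>",
      "_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json>",
      "lifecycleStateChange": "OVERWRITE"
    }
  }
}]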

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-12 06:13:25
-
-

*Thread Reply:* > But, while this is supported by the standard, Marquez does not handle custom facets at the moment (I'll happily be corrected). -It displays this information when it exists

- - - -
- 🙌 Mirko Raca -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jorik - (jorik@scivis.net) -
-
2022-05-12 06:13:29
-
-

*Thread Reply:* Oh that looks perfect! I completely missed that, thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-05-12 15:46:04
-
-

Are there any examples on how to use this facet ColumnLineageDatasetFacet.json?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-13 05:19:47
-
-

*Thread Reply:* Work with Spark is not yet fully merged
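In the meantime, the facet's rough shape per its JSON schema looks like the following; the dataset, column names and transformation values here are made up:

"facets": {
  "columnLineage": {
    "fields": {
      "order_total": {
        "inputFields": [
          {"namespace": "postgres://db", "name": "public.orders", "field": "amount"}
        ],
        "transformationDescription": "sum of line amounts",
        "transformationType": "AGGREGATION"
      }
    }
  }
}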

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
raghanag - (raghanag@gmail.com) -
-
2022-05-12 17:49:23
-
-

Hi All, I am trying to see where we can provide owner details when using the openlineage-spark configuration; I see only namespace and other config parameters, but not the owner. Can we add an owner configuration to openlineage-spark as well, like spark.openlineage.owner? The owner would be used to filter namespaces when showing jobs or namespaces in the Marquez UI.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-13 19:07:04
-
-

@channel The next OpenLineage Technical Steering Committee meeting is next Thursday, 5/19, at 10 am PT! Going forward, meetings will take place on the second Thursday of each month at 10 am PT.
Join us on Zoom: https://astronomer.zoom.us/j/87156607114?pwd=a3B0K210dnRaQmdkaFdGMytBREZEQT09
All are welcome!
Agenda:
• releases 0.7.1 & 0.8.1
• column-level lineage
• open lineage
For notes and the agenda visit the wiki: https://tinyurl.com/openlineagetsc

- - - -
- 🙌 Maciej Obuchowski, Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Yannick Libert - (yan@ioly.fr) -
-
2022-05-16 11:02:23
-
-

Hi all, we are considering using OL to send lineage events from various jobs and places in our company. Since there will be multiple producers, we would like to use Kafka as our main hub for communication. One of our sources will be Airflow (more particularly MWAA, i.e. Airflow version 2.2.2). Is there a way to configure the Airflow lineage backend to send events to Kafka instead of Marquez directly? So far, from what I've seen in the docs and in here, the only way would be to create a simple proxy to stream the HTTP events to Kafka. Is that still the case?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-16 11:31:17
-
-

*Thread Reply:* I think you can either use proxy backend: https://github.com/OpenLineage/OpenLineage/tree/main/proxy

- -

or configure OL client to send data to kafka: -https://github.com/OpenLineage/OpenLineage/tree/main/client/python#kafka

- - - -
- 👍 Yannick Libert -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Yannick Libert - (yan@ioly.fr) -
-
2022-05-16 12:15:59
-
-

*Thread Reply:* Thank you very much for the useful pointers. The proxy solution could indeed work in our case, but it implies creating another service in front of Kafka, and thus another layer of complexity in the architecture. If there is a more "native" way of streaming events directly from the Airflow backend, that would be great to know

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-16 12:37:10
-
-

*Thread Reply:* The second link 😉

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yannick Libert - (yan@ioly.fr) -
-
2022-05-17 03:46:03
-
-

*Thread Reply:* Sure, we already implemented the Python client for jobs outside Airflow and it works great 🙂
You are saying that there is a way to use this Python client in conjunction with the MWAA lineage backend to relay the job events that come with the Airflow integration (without including it in the DAGs)?
Our strategy is to use both the Airflow backend to collect automatic lineage events without modifying any existing DAGs, and the in-code implementation to allow our data engineers to send their own events if they want to.
The second option works perfectly, but the first one is where we struggle a bit, especially with MWAA.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-17 05:24:30
-
-

*Thread Reply:* If you can mount file to MWAA, then yes - it should work with config file option: https://github.com/OpenLineage/OpenLineage/tree/main/client/python#config-file
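For reference, the config-file route amounts to mounting an openlineage.yml that selects the Kafka transport; a sketch with key names as described in the client README (the topic and broker addresses are made up):

transport:
  type: kafka
  topic: openlineage-events
  config:
    bootstrap.servers: "broker1:9092,broker2:9092"
  flush: true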

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yannick Libert - (yan@ioly.fr) -
-
2022-05-17 05:40:45
-
-

*Thread Reply:* Brilliant! I'm going to test that. Thank you Maciej!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-17 15:20:58
-
-

A release has been requested. Are there any +1s? Three from committers will authorize. Thanks.

- - - -
- ➕ Maciej Obuchowski, Ross Turk, Willy Lulciuc, Michael Collado -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-18 10:33:03
-
-

The OpenLineage TSC meeting is tomorrow at 10am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1652483224119229

-
- - -
Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 16:23:56
-
-

Hey all, -Do custom extractors work with the taskflow api?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-18 16:34:25
-
-

*Thread Reply:* Hey Tyler - A custom extractor just needs to be able to assemble the runEvents and send the information out to the lineage backends.

- -

If the things you're sending/receiving with TaskFlow are accessible in terms of metadata in the environment the DAG is running in, then you should be able to make one that would work!

- -

This Webinar goes over creating custom extractors for reference.

- -

Does that answer your question?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-18 16:41:16
-
-

*Thread Reply:* TaskFlow internally is just PythonOperator. If you write an extractor that assumes something more about your tasks than just being a PythonOperator, then you could probably make it work 🙂
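On that assumption, a minimal skeleton of such an extractor might look like this; the class and method names follow openlineage-airflow's BaseExtractor, while the metadata body is deliberately left minimal:
```python
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata


class TaskflowExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # TaskFlow-decorated tasks run as _PythonDecoratedOperator under the hood
        return ["PythonOperator", "_PythonDecoratedOperator"]

    def extract(self) -> Optional[TaskMetadata]:
        return None  # defer all work to extract_on_complete

    def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:
        # assemble inputs/outputs from whatever metadata your tasks expose
        return TaskMetadata(name=f"{task_instance.dag_id}.{task_instance.task_id}")
```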

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:15:52
-
-

*Thread Reply:* Thanks @John Thomas @Maciej Obuchowski, your answers both make sense. I just keep running into this error in my logs:
[2022-05-18, 20:52:34 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=_PythonDecoratedOperator airflow_dag_id=Postgres_1_to_Snowflake_1_v3 task_id=Postgres_1 airflow_run_id=scheduled__2022-05-18T20:51:34.334045+00:00
The picture is my custom extractor; it's not doing anything currently, as this is just a test.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:16:05
-
-

*Thread Reply:* thanks again for the help yall

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-18 17:16:34
-
-

*Thread Reply:* did you set the environment variable with the path to your extractor?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:16:46
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:17:13
-
-

*Thread Reply:* I believe that's correct @John Thomas

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:18:35
-
-

*Thread Reply:* and the versions I'm using:
Astronomer Runtime 5.0.0 based on Airflow 2.3.0+astro.1

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-18 17:25:58
-
-

*Thread Reply:* this might not be the problem, but you should have only one of extract and extract_on_complete - which one are you meaning to use?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:32:26
-
-

*Thread Reply:* ahh thanks John, as of right now extract_on_complete.

- -

This is a similar setup as Michael had in the video.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-05-18 17:33:31
-
-

*Thread Reply:* if it's still not working I'm not really sure at this point - that's about what I had when I spun up my own custom extractor

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-18 17:39:44
-
-

*Thread Reply:* is there anything in logs regarding extractors?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:40:36
-
-

*Thread Reply:* just this: -[2022-05-18, 21:36:59 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=_PythonDecoratedOperator airflow_dag_id=competitive_oss_projects_git_to_snowflake task_id=Transform_git_logs_to_S3 airflow_run_id=scheduled__2022-05-18T21:35:57.694690+00:00

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-18 17:41:11
-
-

*Thread Reply:* @John Thomas Thanks, I appreciate your help.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-19 06:01:52
-
-

*Thread Reply:* No Failed to import messages?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-19 11:26:34
-
-

*Thread Reply:* @Maciej Obuchowski None that I can see. Here is the full log:
```
Failed to verify remote log exists s3:///dag_id=Postgres_1_to_Snowflake_1_v3/run_id=scheduled__2022-05-19T15:23:49.248097+00:00/task_id=Postgres_1/attempt=1.log.
Please provide a bucket_name instead of "s3:///dag_id=Postgres_1_to_Snowflake_1_v3/run_id=scheduled__2022-05-19T15:23:49.248097+00:00/task_id=Postgres_1/attempt=1.log"
Falling back to local log
*** Reading local file: /usr/local/airflow/logs/dag_id=Postgres_1_to_Snowflake_1_v3/run_id=scheduled__2022-05-19T15:23:49.248097+00:00/task_id=Postgres_1/attempt=1.log
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1158} INFO - Dependencies all met for <TaskInstance: Postgres_1_to_Snowflake_1_v3.Postgres_1 scheduled__2022-05-19T15:23:49.248097+00:00 [queued]>
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1158} INFO - Dependencies all met for <TaskInstance: Postgres_1_to_Snowflake_1_v3.Postgres_1 scheduled__2022-05-19T15:23:49.248097+00:00 [queued]>
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1355} INFO -
--------------------------------------------------------------------------------
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1356} INFO - Starting attempt 1 of 1
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -
--------------------------------------------------------------------------------
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1376} INFO - Executing <Task(_PythonDecoratedOperator): Postgres_1> on 2022-05-19 15:23:49.248097+00:00
[2022-05-19, 15:24:50 UTC] {standard_task_runner.py:52} INFO - Started process 3957 to run task
[2022-05-19, 15:24:50 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'Postgres_1_to_Snowflake_1_v3', 'Postgres_1', 'scheduled__2022-05-19T15:23:49.248097+00:00', '--job-id', '96473', '--raw', '--subdir', 'DAGS_FOLDER/pg_to_snow.py', '--cfg-path', '/tmp/tmp9n7u3i4t', '--error-file', '/tmp/tmp9a55v9b']
[2022-05-19, 15:24:50 UTC] {standard_task_runner.py:80} INFO - Job 96473: Subtask Postgres_1
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/configuration.py:470 DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
[2022-05-19, 15:24:50 UTC] {task_command.py:369} INFO - Running <TaskInstance: Postgres_1_to_Snowflake_1_v3.Postgres_1 scheduled__2022-05-19T15:23:49.248097+00:00 [running]> on host 056ca0b6c7f5
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1568} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=Postgres_1_to_Snowflake_1_v3
AIRFLOW_CTX_TASK_ID=Postgres_1
AIRFLOW_CTX_EXECUTION_DATE=2022-05-19T15:23:49.248097+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-05-19T15:23:49.248097+00:00
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'execution_date' from the template is deprecated and will be removed in a future version. Please use 'data_interval_start' or 'logical_date' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'next_ds' from the template is deprecated and will be removed in a future version. Please use '{{ data_interval_end | ds }}' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'next_ds_nodash' from the template is deprecated and will be removed in a future version. Please use '{{ data_interval_end | ds_nodash }}' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'next_execution_date' from the template is deprecated and will be removed in a future version. Please use 'data_interval_end' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_ds' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_ds_nodash' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_execution_date' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_execution_date_success' from the template is deprecated and will be removed in a future version. Please use 'prev_data_interval_start_success' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'tomorrow_ds' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'tomorrow_ds_nodash' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'yesterday_ds' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'yesterday_ds_nodash' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {python.py:173} INFO - Done. Returned value was: extract
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py:1369 DeprecationWarning: Passing 'execution_date' to 'TaskInstance.xcom_push()' is deprecated.
[2022-05-19, 15:24:50 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=_PythonDecoratedOperator airflow_dag_id=Postgres_1_to_Snowflake_1_v3 task_id=Postgres_1 airflow_run_id=scheduled__2022-05-19T15:23:49.248097+00:00
[2022-05-19, 15:24:50 UTC] {client.py:74} INFO - Constructing openlineage client to send events to https://api.astro-livemaps.datakin.com/
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1394} INFO - Marking task as SUCCESS. dag_id=Postgres_1_to_Snowflake_1_v3, task_id=Postgres_1, execution_date=20220519T152349, start_date=20220519T152450, end_date=20220519T152450
[2022-05-19, 15:24:50 UTC] {local_task_job.py:156} INFO - Task exited with return code 0
[2022-05-19, 15:24:50 UTC] {local_task_job.py:273} INFO - 1 downstream tasks scheduled from follow-on schedule check```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Josh Owens - (Josh@kickstand.work) -
-
2022-05-19 16:57:38
-
-

*Thread Reply:* @Maciej Obuchowski is our ENV var wrong maybe? Do we need to mention the file to import somewhere else that we may have missed?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-20 10:26:01
-
-

*Thread Reply:* @Josh Owens one thing I can think of is that you might have older openlineage integration version, as OPENLINEAGE_EXTRACTORS variable was added very recently: https://github.com/OpenLineage/OpenLineage/pull/694
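For reference, registration via that variable is a semicolon-separated list of extractor import paths; the module path below is hypothetical:

OPENLINEAGE_EXTRACTORS=my_company.lineage.extractors.TaskflowExtractor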

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-20 11:58:28
-
-

*Thread Reply:* @Maciej Obuchowski, that was it! For some reason, my requirements.txt wasn't pulling the latest version of openlineage-airflow. Working now with 0.8.2

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-20 11:59:01
-
-

*Thread Reply:* 🙌

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Raymond - (michael.raymond@cervest.earth) -
-
2022-05-19 05:32:06
-
-

Hi 👋, I'm looking at OpenLineage as a solution for fine-grained data lineage tracking. Could I clarify a couple of points?

- -

Where does one specify the version of an input dataset in the RunEvent? In the Marquez seed data I can see that it's recorded, but I'm not sure where it goes from looking at the OpenLineage schema. Or does it just assume the last version?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-19 05:59:59
-
-

*Thread Reply:* Currently, it assumes the latest version.
There's an effort with DatasetVersionDatasetFacet to be able to specify it manually - or extract this information from cases like Iceberg or Delta Lake tables.
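A sketch of how that facet would sit on an input dataset once supported; the facet key and version string here are assumptions based on the facet's JSON schema:

"inputs": [{
  "namespace": "my-namespace",
  "name": "my-table",
  "facets": {
    "version": {
      "datasetVersion": "v42"
    }
  }
}]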

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Raymond - (michael.raymond@cervest.earth) -
-
2022-05-19 06:14:59
-
-

*Thread Reply:* Ah ok. Is it Marquez assuming the latest version when it records the OpenLineage event?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-19 06:18:20
-
-

*Thread Reply:* yes

- - - -
- ✅ Michael Raymond -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Raymond - (michael.raymond@cervest.earth) -
-
2022-05-19 06:54:40
-
-

*Thread Reply:* Thanks, that's very helpful 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howardyoo@gmail.com) -
-
2022-05-19 15:23:33
-
-

Hi all, -I was testing https://github.com/MarquezProject/marquez/tree/main/examples/airflow#step-21-create-dag-counter, and the following error was observed in my airflow env:

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howardyoo@gmail.com) -
-
2022-05-19 15:23:52
-
-

Anybody know why this is happening? Any comments would be welcomed.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-19 15:27:35
-
-

*Thread Reply:* @Howard Yoo What version of airflow?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howardyoo@gmail.com) -
-
2022-05-19 15:27:51
-
-

*Thread Reply:* it's 2.3

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howardyoo@gmail.com) -
-
2022-05-19 15:28:42
-
-

*Thread Reply:* (sorry, it's 2.4)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-19 15:29:28
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow I'd refer to the docs again.

- -

"Airflow 2.3+ -Integration automatically registers itself for Airflow 2.3 if it's installed on Airflow worker's python. This means you don't have to do anything besides configuring it, which is described in Configuration section."

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howardyoo@gmail.com) -
-
2022-05-19 15:29:53
-
-

*Thread Reply:* Right, I don't see any issues with configuring

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-19 15:30:56
-
-

*Thread Reply:* so you don't need:

- -

from openlineage.airflow import DAG

- -

in your dag files

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howardyoo@gmail.com) -
-
2022-05-19 15:31:41
-
-

*Thread Reply:* Okay... that makes sense then

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-19 15:32:47
-
-

*Thread Reply:* so if you need to import DAG it would just be: -from airflow import DAG
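So on Airflow 2.3+ a DAG file needs no OpenLineage imports at all; a minimal sketch (the DAG id, schedule and callable are placeholders):
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# openlineage-airflow's listener picks this DAG up automatically once configured
with DAG("counter", start_date=datetime(2022, 5, 1), schedule_interval="@daily") as dag:
    PythonOperator(task_id="count", python_callable=lambda: print("counting"))
```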

- - - -
- 👍 Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howardyoo@gmail.com) -
-
2022-05-19 15:56:19
-
-

*Thread Reply:* Thanks!

- - - -
- 👍 Tyler Farris -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-19 17:13:02
-
-

@channel OpenLineage 0.8.2 is now available! The project now supports credentialing from the Airflow Secrets Backend and for the Azure Databricks Credential Passthrough, detection of datasets wrapped by ExternalRDDs, bug fixes, and more. For the details, see: https://github.com/OpenLineage/OpenLineage/releases/tag/0.8.2

- - - -
- 🎉 Marco Diaz, Howard Yoo, Willy Lulciuc, Michael Collado, Ross Turk, Francis McGregor-Macdonald, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
xiang chen - (cdmikechen@hotmail.com) -
-
2022-05-19 22:18:42
-
-

Hi~ everyone. Is it possible for OpenLineage to support Camel pipelines?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-20 10:23:55
-
-

*Thread Reply:* What changes do you mean by letting OpenLineage support it?
Or do you mean writing an Apache Camel integration?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
xiang chen - (cdmikechen@hotmail.com) -
-
2022-05-22 19:54:17
-
-

*Thread Reply:* @Maciej Obuchowski Yes, let OpenLineage work the same way as it does with Airflow

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
xiang chen - (cdmikechen@hotmail.com) -
-
2022-05-22 19:56:47
-
-

*Thread Reply:* I think this is a very valuable thing. I wish OpenLineage could support some commonly used pipeline tools and try to abstract out some general interfaces so that users can extend them by themselves

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-23 05:20:30
-
-

*Thread Reply:* For Python, we have the OL client, common libraries (well, at least the beginnings of them) and the SQL parser

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-23 05:20:44
-
-

*Thread Reply:* As we support more systems, the general libraries will grow as well.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-05-20 13:50:53
-
-

I see a change in the metadata collected from Airflow jobs which I think was introduced with the combination of Airflow 2.3/OpenLineage 0.8.1. There's an airflow_version facet that contains an operator attribute.

- -

Previously that attribute had values such as: airflow.providers.postgres.operators.postgres.PostgresOperator but I now see that for the very same task the operator is now tracked as: airflow.models.taskinstance.TaskInstance

- -

( fwiw there's also a taskInfo attribute in there containing a json string which itself has a operator that is still set to PostgresOperator )

- -

Is this an already known issue?

- - - -
- 👀 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-05-20 20:23:15
-
-

*Thread Reply:* This looks like a bug. We are probably not looking at the right instance in the TaskInstanceListener

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-05-21 14:17:19
-
-

*Thread Reply:* @Howard Yoo I filed: https://github.com/OpenLineage/OpenLineage/issues/767 for this

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-20 21:42:46
-
-

Would anyone happen to have a link to the Technical Steering Committee meeting recordings?

- -

I have quite a few people interested in seeing the overview of column lineage that Pawel provided during the Technical Steering Committee meeting on Thursday May 19th.

- -

The wiki does not include a link to the recordings: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- -

Are the recordings made public? Thank you for any links and guidance!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-05-20 21:55:09
-
-

That would be @Michael Robinson. Yes, the recordings are made public.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-20 22:05:27
-
-

@Will Johnson I’ll put this on the https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting|wiki soon, but here is the link to the recording: https://astronomer.zoom.us/rec/share/xUBW-n6G4u1WS89tCSXStx8BMl99rCfCC6jGdXLnkN6gMGn5G-_BC7pxHKKeELhG.0JFl88isqb64xX-3 -PW: 1VJ=K5&X

- - - -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-21 09:42:21
-
-

*Thread Reply:* Thank you so much, Michael!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-23 15:00:10
-
-

Is there documentation/examples around creating custom facets?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-24 06:41:11
-
-

*Thread Reply:* In Python or Java?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-24 06:44:32
-
-

*Thread Reply:* In Python, just inherit BaseFacet and add a _get_schema static method that points to the place where you host the JSON schema of your facet. For example, our DbtVersionRunFacet

- -

In Java you can take a look at Spark's custom facets.
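A minimal Python sketch of that pattern; the facet name is made up and the schema URL is a placeholder you would host yourself:
```python
import attr

from openlineage.client.facet import BaseFacet


@attr.s
class OwnershipFacet(BaseFacet):
    owner: str = attr.ib(default=None)

    @staticmethod
    def _get_schema() -> str:
        # a URL pointing at the hosted JSON schema, not the schema document itself
        return "https://example.com/schemas/OwnershipFacet.json"
```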

-
- - - - - - - - - - - - - - - - -
- - - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-24 16:40:00
-
-

*Thread Reply:* Thanks, @Maciej Obuchowski, I was asking in regard to Python; sorry, I should have clarified.

- -

I'm not sure what the disconnect is, but the facets aren't showing up in the inputs and outputs. The Lineage event is sent successfully to my astrocloud.

- -

below is the facet and extractor, any help is appreciated. Thanks!

- -

```
import logging
from typing import List, Optional

import attr
from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.facet import BaseFacet
from openlineage.client.run import InputDataset, OutputDataset

log = logging.getLogger(__name__)


@attr.s
class ManualLineageFacet(BaseFacet):
    database: Optional[str] = attr.ib(default=None)
    cluster: Optional[str] = attr.ib(default=None)
    connectionUrl: Optional[str] = attr.ib(default=None)
    target: Optional[str] = attr.ib(default=None)
    source: Optional[str] = attr.ib(default=None)
    _producer: str = attr.ib(init=False)
    _schemaURL: str = attr.ib(init=False)

    @staticmethod
    def _get_schema() -> str:
        # NB: returning the schema document itself, as here, is what gets
        # flagged in the replies below - _get_schema should return a URL string
        return {
            "$schema": "http://json-schema.org/schema#",
            "$defs": {
                "ManualLineageFacet": {
                    "allOf": [
                        {
                            "type": "object",
                            "properties": {
                                "database": {"type": "string", "example": "Snowflake"},
                                "cluster": {"type": "string", "example": "us-west-2"},
                                "connectionUrl": {"type": "string", "example": "http://snowflake"},
                                "target": {"type": "string", "example": "Postgres"},
                                "source": {"type": "string", "example": "Stripe"},
                                "description": {"type": "string", "example": "Description of inlet/outlet"},
                                "_producer": {"type": "string"},
                                "_schemaURL": {"type": "string"},
                            },
                        },
                    ],
                    "type": "object",
                }
            },
        }


class ManualLineageExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ["PythonOperator", "_PythonDecoratedOperator"]

    def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:
        return TaskMetadata(
            f"{task_instance.dag_run.dag_id}.{task_instance.task_id}",
            inputs=[
                InputDataset(
                    namespace="default",
                    name=self.operator.get_inlet_defs()[0]["name"],
                    inputFacets=ManualLineageFacet(
                        database=self.operator.get_inlet_defs()[0]["database"],
                        cluster=self.operator.get_inlet_defs()[0]["cluster"],
                        connectionUrl=self.operator.get_inlet_defs()[0]["connectionUrl"],
                        target=self.operator.get_inlet_defs()[0]["target"],
                        source=self.operator.get_inlet_defs()[0]["source"],
                    ),
                )
                if self.operator.get_inlet_defs()
                else {},
            ],
            outputs=[
                OutputDataset(
                    namespace="default",
                    name=self.operator.get_outlet_defs()[0]["name"],
                    outputFacets=ManualLineageFacet(
                        database=self.operator.get_outlet_defs()[0]["database"],
                        cluster=self.operator.get_outlet_defs()[0]["cluster"],
                        connectionUrl=self.operator.get_outlet_defs()[0]["connectionUrl"],
                        target=self.operator.get_outlet_defs()[0]["target"],
                        source=self.operator.get_outlet_defs()[0]["source"],
                    ),
                )
                if self.operator.get_outlet_defs()
                else {},
            ],
            job_facets={},
            run_facets={},
        )

    def extract(self) -> Optional[TaskMetadata]:
        pass
```
-
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-25 09:21:02
-
-

*Thread Reply:* _get_schema should return the address of the schema hosted somewhere else - afaik sending an object field where the server expects a string field might cause some problems

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-25 09:21:59
-
-

*Thread Reply:* can you register ManualLineageFacet under facets, not as inputFacets or outputFacets?
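Concretely, in the extractor above that would mean something like the following; the facet key and dataset name are made up:
```python
InputDataset(
    namespace="default",
    name="my_inlet",
    facets={"manualLineage": ManualLineageFacet(database="Snowflake")},
)
```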

Tyler Farris - (tyler@kickstand.work)
2022-05-25 13:15:30

*Thread Reply:* Thanks for the advice @Maciej Obuchowski, I was able to get it working!
Also, great talk today at the Airflow Summit.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-25 13:25:17

*Thread Reply:* Thanks 🙇

Bruno González - (brugms2@gmail.com)
2022-05-24 06:26:25

Hey guys! I'm pretty new with OL but would like to start using it for a combination of data lineage in Airflow + data quality metrics collection. I was wondering if that was possible, but Ross clarified that in the deeper dive webinar from some weeks ago (great one by the way!).


I'm referencing this comment from Julien to see if you have any updates or more examples apart from the one from great expectations. We have some custom operators and would like to push lineage and data quality metrics to Marquez using custom extractors. Any reference will be highly appreciated. Thanks in advance!

[Link preview: Julien Le Dem - https://openlineage.slack.com/team/U01DCLP0GU9]
[Link preview: YouTube - Astronomer, https://www.youtube.com/c/Astronomer]

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-24 06:35:05

*Thread Reply:* We're also getting data quality from dbt if you're running dbt test or dbt build:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/dbt.py#L399

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-24 06:37:15

*Thread Reply:* Generally, you'd need to construct DataQualityAssertionsDatasetFacet and/or DataQualityMetricsInputDatasetFacet and attach it to tested dataset
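A sketch of what constructing one of those facets could look like with the openlineage-python client (dataset name and metric values here are illustrative, not from the thread):
```
from openlineage.client.facet import (
    ColumnMetric,
    DataQualityMetricsInputDatasetFacet,
)
from openlineage.client.run import InputDataset

# attach the metrics facet to the dataset under test
tested_dataset = InputDataset(
    namespace="postgres://prod-db",
    name="public.orders",
    inputFacets={
        "dataQualityMetrics": DataQualityMetricsInputDatasetFacet(
            rowCount=1500,
            columnMetrics={"amount": ColumnMetric(nullCount=0, min=0.0, max=99.99)},
        )
    },
)
```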

Bruno González - (brugms2@gmail.com)
2022-05-24 13:23:34

*Thread Reply:* Thanks @Maciej Obuchowski!!!

Howard Yoo - (howardyoo@gmail.com)
2022-05-24 16:55:08

Hi all, https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#development <-- does this still work? I did follow the instructions, but running pytest failed with error messages like:
```
________________________________________________ ERROR collecting tests/extractors/test_bigquery_extractor.py ________________________________________________
ImportError while importing test module '/Users/howardyoo/git/OpenLineage/integration/airflow/tests/extractors/test_bigquery_extractor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
openlineage/airflow/utils.py:251: in import_from_string
    module = importlib.import_module(module_path)
/opt/homebrew/Caskroom/miniconda/base/envs/airflow/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1030: in _gcd_import
    ???
<frozen importlib._bootstrap>:1007: in _find_and_load
    ???
<frozen importlib._bootstrap>:986: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:680: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:850: in exec_module
    ???
<frozen importlib._bootstrap>:228: in _call_with_frames_removed
    ???
../../../airflow.master/airflow/providers/google/cloud/operators/bigquery.py:39: in <module>
    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook, BigQueryJob
../../../airflow.master/airflow/providers/google/cloud/hooks/bigquery.py:46: in <module>
    from googleapiclient.discovery import Resource, build
E   ModuleNotFoundError: No module named 'googleapiclient'
```

Howard Yoo - (howardyoo@gmail.com)
2022-05-24 16:55:09

...

Howard Yoo - (howardyoo@gmail.com)
2022-05-24 16:55:54

Looks like just running pytest won't run all the tests - some of these DAG tests seem to require connectivity to Google BigQuery, databases, etc.

Mardaunt - (miostat@yandex.ru)
2022-05-25 16:32:08

👋 Hi everyone!
I didn't find this in the documentation.
Can OpenLineage show me which source columns a final DataFrame column came from? (Spark)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-25 16:59:47

*Thread Reply:* We're working on this feature - should be in the next release from OpenLineage side

🙌 Mardaunt

Mardaunt - (miostat@yandex.ru)
2022-05-25 17:06:12

*Thread Reply:* Thanks! I will keep an eye on updates.

Martin Fiser - (fisa@keboola.com)
2022-05-25 21:08:39

Hi all, showcase time:


We have implemented a native OpenLineage endpoint and metadata writer in our Keboola all-in-one data platform.
The reason was that for more complex data pipeline scenarios it is beneficial to display the lineage in more detail. Additionally, we hope that OpenLineage as a standard will catch on and open up the ability to push lineage data into data governance tools other than Marquez.
The implementation started as an internal POC of tweaking our metadata into the OpenLineage /lineage format and resulted in a native API endpoint and, later on, an app within the Keboola platform ecosystem - feeding platform job metadata at a regular cadence.
We furthermore use a namespace for each Keboola project so users can observe the data through their whole data mesh setup (multi-project architecture).
Please reach out to me if you have any questions!

🙌 Maciej Obuchowski, Michael Robinson

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-26 06:05:33

*Thread Reply:* Looks great! Thanks for sharing!

Gopi Krishnan Rajbahadur - (gopikrishnanrajbahadur@gmail.com)
2022-05-26 10:13:26

Hi OpenLineage team,


I am Gopi Krishnan Rajbahadur, one of the core members of OpenDatalogy project (a project that we are currently trying to sandbox as a part of LF-AI). Our OpenDatalogy project focuses on providing a process that allows users of publicly available datasets (e.g., CIFAR-10) to ensure license compliance. In addition, we also aim to provide a public repo that documents the final rights and obligations associated with common publicly available datasets, so that users of these datasets can use them compliantly in their AI models and software.


One of the key aspects of conducting dataset license compliance analysis involves tracking the lineage and provenance of the dataset (as we highlight in this paper here: https://arxiv.org/abs/2111.02374). We think that in this regard, our projects (i.e., OpenLineage and OpenDatalogy) could work together to use the existing OpenLineage standard and also collaborate to adopt/modify/enhance and use OpenLineage to track and document the lineage of a publicly available dataset. On that note, we are also working with the SPDX community to make the lineage and provenance of a dataset be tracked as a part of the SPDX BOM that is in the works for representing AI software (AI SBOM).


We think our projects could mutually benefit from collaborating with each other. Our project's Github could be found here: https://github.com/OpenDataology/OpenDataology. Any feedback that you have about our project would be greatly appreciated. Also, as we are trying to sandbox our project, if you could also show us your support we would greatly appreciate it!


Look forward to hearing back from you


Sincerely,
Gopi

[Link preview: arXiv.org]
[Link preview: GitHub - OpenDataology/OpenDataology; Stars: 3; last updated 3 days ago]

👀 Howard Yoo, Maciej Obuchowski

Ilqar Memmedov - (ccmilgar@gmail.com)
2022-05-30 04:25:10

Hi guys, sorry for the basics.
I did a PoC of using OpenLineage to gather metrics on a Spark job, especially for table creation, alter, and drop.
I noticed that Drop/Alter table statements do not trigger the listener to post lineage data. Is that normal behaviour?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-30 05:38:41

*Thread Reply:* Might be that case if you're using Spark 3.2

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-30 05:38:54

*Thread Reply:* There were some changes to those operators

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-30 05:39:09

*Thread Reply:* If you're not using 3.2, please share more details 🙂

Ilqar Memmedov - (ccmilgar@gmail.com)
2022-05-30 07:58:58

*Thread Reply:* Yeap, I'm using Spark version 3.2.1

Ilqar Memmedov - (ccmilgar@gmail.com)
2022-05-30 07:59:35

*Thread Reply:* Is it an open issue, or do I have some option to force them to be sent?

Ilqar Memmedov - (ccmilgar@gmail.com)
2022-05-30 07:59:58

*Thread Reply:* btw thank you for quick response @Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-30 08:00:34

*Thread Reply:* Yes, we have issue for AlterTable at least

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-06-01 02:52:14

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/616 -> that's the issue for altering tables in Spark 3.2.
@Ilqar Memmedov Did you mean drop table or drop columns? I am not aware of any drop table issue.

[GitHub issue preview - Assignees: @tnazarew; Labels: enhancement, integration/spark]

Ilqar Memmedov - (ccmilgar@gmail.com)
2022-06-01 06:03:38

*Thread Reply:* @Paweł Leszczyński drop table statement.

Ilqar Memmedov - (ccmilgar@gmail.com)
2022-06-01 06:05:58

*Thread Reply:* To reproduce it, I just created a simple Spark job:
create a table as select from another table,
select data from the table, and then drop the entire table.


Lineage data was posted only for "Create table as select" part

xiang chen - (cdmikechen@hotmail.com)
2022-06-01 05:16:01

Hi all, I have a question about lineage. I am now running airflow 2.3.1 and have started the latest Marquez service via docker-compose. I found that with airflow's example DAGs I can only see the job information, but not the lineage of the job. How can I configure it to see the lineage?

[2 attachments]

Ross Turk - (ross@datakin.com)
2022-06-03 14:20:16

*Thread Reply:* hi xiang 👋 lineage in airflow depends on the operator. some operators have extractors as part of the integration, but when they are missing you only see job information in Marquez.

Ross Turk - (ross@datakin.com)
2022-06-03 14:20:51
[attachment]

xiang chen - (cdmikechen@hotmail.com)
2022-06-01 05:23:54

Another problem is that if I declare a skipped task (e.g. DummyOperator) in the DAG, it will never appear in the job list. I think this is a problem, because even if it cannot run, it should still be visible as a metadata object.

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-01 10:19:33

@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, June 9 at 10 am PT. Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
All are welcome!
Agenda:
  1. a recent blog post about Snowflake
  2. the Great Expectations integration
  3. the dbt integration
  4. Open discussion
Notes: https://tinyurl.com/openlineagetsc
Is there a topic you think the community should discuss at this or a future meeting? DM me to add items to the agenda.
👀 Howard Yoo, Francis McGregor-Macdonald

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-04 09:45:41

@channel OpenLineage 0.9.0 is now available, featuring column-level lineage in the Spark integration, bug fixes and more! For the details, see: https://github.com/OpenLineage/OpenLineage/releases/tag/0.9.0 and https://github.com/OpenLineage/OpenLineage/compare/0.8.2...0.9.0. Thanks to all the contributors who made this release possible, including @Paweł Leszczyński for authoring the column-level lineage PRs and new contributor @JDarDagran!

👍 Howard Yoo, Jarek Potiuk, Maciej Obuchowski, Ross Turk, Minkyu Park, pankaj koti, Jorik, Li Ding, Faouzi, Howard Yoo, Mardaunt
🎉 pankaj koti, Faouzi, Howard Yoo, Sheeri Cabral (Collibra), Mardaunt
❤️ Faouzi, Howard Yoo, Mardaunt

Tyler Farris - (tyler@kickstand.work)
2022-06-06 16:14:52

Hey, all. Working on a PR to OpenLineage. I'm curious about file naming conventions for facets. I'm noticing that there are two conventions being used:


• In OpenLineage.spec.facets; ex. ExampleFacet.json
• In OpenLineage.integration.common.openlineage.common.schema; ex. example-facet.json
Thanks

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-08 08:02:58

*Thread Reply:* I think internal naming is more important 🙂


I guess, for now, try to match what the local directory has.

Tyler Farris - (tyler@kickstand.work)
2022-06-08 10:59:39

*Thread Reply:* Thanks @Maciej Obuchowski

raghanag - (raghanag@gmail.com)
2022-06-07 03:24:03

Hi Team, we are seeing the DatasetName set to the custom query when we run a spark job that queries Oracle DB over JDBC with a custom query, and the custom query contains newlines, which causes the NodeId ID_PATTERN match to fail. How do we give a custom dataset name when we use custom queries?


Marquez API regex ref: https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/service/models/NodeId.java#L44
```
ERROR [2022-06-07 06:11:49,592] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 3648e87216d7815b
! java.lang.IllegalArgumentException: node ID (dataset:oracle:thin:_//<host-name>:1521:(
! SELECT
!     RULE.RULE_ID,
!     ASSG.ASSIGNED_OBJECT_ID, ASSG.ORG_ID, ASSG.SPLIT_PCT,
!     PRTCP.PARTICIPANT_NAME, PRTCP.START_DATE, PRTCP.END_DATE
! FROM RULE RULE,
!     ASSG ASSG,
!     PRTCP PRTCP
! WHERE
!     RULE.RULE_ID = ASSG.RULE_ID(+)
!     --AND RULE.RULE_ID = 300100207891651
!     AND PRTCP.PARTICIPANT_ID = ASSG.ASSIGNED_OBJECT_ID
!     -- and RULE.created_by = ' 1=1 '
!     and 1=1
! )) must start with 'dataset', 'job', or 'run'
```

George Zachariah V - (manish.zack@gmail.com)
2022-06-08 07:48:16

Hi Team,
We have a spark job xyz that uses OpenLineageListener which posts lineage events to the Marquez server. But we are seeing some unknown jobs in the Marquez UI:
• xyz.collect_limit
• xyz.execute_insert_into_hadoop_fs_relation_command
What jobs are these (collect_limit, execute_insert_into_hadoop_fs_relation_command)?
How do we get the lineage listener to post only our job (xyz)?

👍 Pradeep S

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-08 11:00:41

*Thread Reply:* Those jobs are actually what Spark does underneath 🙂

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-08 11:00:57

*Thread Reply:* Are you using Delta Lake btw?

Moiz - (moiz.groups@gmail.com)
2022-06-08 12:02:39

*Thread Reply:* No, this is not Delta Lake. It is a normal Spark app.

raghanag - (raghanag@gmail.com)
2022-06-08 13:58:05

*Thread Reply:* @Maciej Obuchowski I think David posted about this before: https://openlineage.slack.com/archives/C01CK9T7HKR/p1636011698055200

[Link preview: Slack message from David Virgil - https://openlineage.slack.com/team/U02K9U58X7F]

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-08 14:27:46

*Thread Reply:* I agree that it looks bad in the UI, but I also think the integration is doing a good job here. The eventual "aggregation" should be done by the event consumer.


If anything, we should filter some 'useless' nodes like collect_limit since they add nothing.


We have an issue for doing this to specifically delta lake operations, as they are the biggest offenders: https://github.com/OpenLineage/OpenLineage/issues/628
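As a concrete sketch of that consumer-side filtering (plain Python over the event payload, not an OpenLineage API; the suffix list is illustrative):
```
NOISY_JOB_SUFFIXES = (".collect_limit",)

def keep_event(event: dict) -> bool:
    """Drop run events that only describe internal Spark actions."""
    job_name = event.get("job", {}).get("name", "")
    return not job_name.endswith(NOISY_JOB_SUFFIXES)
```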

[GitHub issue preview - Milestone: 0.10.0]

👍 George Zachariah V

raghanag - (raghanag@gmail.com)
2022-06-08 14:33:09

*Thread Reply:* @Maciej Obuchowski but we only see these 2 jobs in the namespace, no other jobs were part of the lineage metadata, are we doing something wrong?

raghanag - (raghanag@gmail.com)
2022-06-08 16:09:15

*Thread Reply:* @Michael Robinson On this note, may we know how to form a lineage if we have a different set of APIs called before the spark job (already integrated with OpenLineageSparkListener)? We want to see how the different sets of params pass through these components before landing in the spark job. If we use the openlineage client to post the lineage events into Marquez, do we need to use the same Run UUID across the lineage events for the run, or is there another way to do this? Can you please advise?

Ross Turk - (ross@datakin.com)
2022-06-08 22:51:38

*Thread Reply:* I think I understand what you are asking -


The runID is used to correlate different state updates (i.e., start, fail, complete, abort) across the lifespan of a run. So if you are trying to add additional metadata to the same job run, you’d use the same runID.


So you’d generate a runID and send a START event, then in the various components you could send OTHER events containing the same runID + params you want to study in facets, then at the end you would send a COMPLETE.


(I think there should be an UPDATE event type in the spec for this sort of thing.)
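A sketch of that runID lifecycle with the openlineage-python client (URL, namespace, job name, and producer URI are placeholders):
```
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
job = Job(namespace="my-namespace", name="my-job")
producer = "https://example.com/my-pipeline"
run = Run(runId=str(uuid4()))  # reuse this runId for every state update

def now() -> str:
    return datetime.now(timezone.utc).isoformat()

client.emit(RunEvent(RunState.START, now(), run, job, producer))
# ...each component can emit RunState.OTHER events carrying the same run...
client.emit(RunEvent(RunState.COMPLETE, now(), run, job, producer))
```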

👍 George Zachariah V, raghanag

raghanag - (raghanag@gmail.com)
2022-06-08 22:59:39

*Thread Reply:* thanks @Ross Turk, but what I am looking for is: let's say, for example, we have 4 components in the system; we then want to show the 4 components as job icons in the graph, and the datasets between them would show the input/output parameters that these components use.
A(job) --> DS1(dataset) --> B(job) --> DS2(dataset) --> C(job) --> DS3(dataset) --> D(job)

Ross Turk - (ross@datakin.com)
2022-06-08 23:04:37

*Thread Reply:* then you would need to have separate Jobs for each, with inputs and outputs defined

Ross Turk - (ross@datakin.com)
2022-06-08 23:06:03

*Thread Reply:* so there would be a Run of job B that shows DS1 as an input and DS2 as an output

raghanag - (raghanag@gmail.com)
2022-06-08 23:06:18

*Thread Reply:* got it

Ross Turk - (ross@datakin.com)
2022-06-08 23:06:34

*Thread Reply:* (fyi: I know openlineage but my understanding stops at spark 😄)

👍 raghanag

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-06-10 12:27:58

*Thread Reply:* > The eventual "aggregation" should be done by event consumer.
@Maciej Obuchowski Are there any known client-side libraries that support this aggregation already? In the case of spark applications running as part of ETL pipelines, most of the time our end user is interested in seeing only the aggregated view, where all jobs spawned as part of a single application are rolled up into one job.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-10 12:32:14

*Thread Reply:* I believe Microsoft @Will Johnson has something similar to that, but it's probably proprietary.


We'd love to have something like it, but AFAIK it affects only some percentage of Spark jobs and we can only do so much.


With exception of Delta Lake/Databricks, where it affects every job, and we know some nodes that could be safely filtered client side.

Will Johnson - (will@willj.co)
2022-06-11 23:38:27

*Thread Reply:* @Maciej Obuchowski Microsoft ❤️ OSS!


Apache Atlas doesn't have the same model as Marquez. It only knows of effectively one entity that represents the complete asset.


@Mark Taylor designed this solution available now on Github to consolidate OpenLineage messages


https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/blob/d6514f2[…]/Function.Domain/Helpers/OlProcessing/OlMessageConsolodation.cs


In addition, we do some filtering only based on inputs and outputs to limit the messages AFTER it has been emitted.

🙌 Maciej Obuchowski

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-06-19 09:37:06

*Thread Reply:* thank you !

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-08 10:54:32

@channel The next OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1654093173961669

[Link preview: Slack message from Michael Robinson - https://openlineage.slack.com/team/U02LXF3HUN7]

👍 Maciej Obuchowski, Sheeri Cabral (Collibra), Willy Lulciuc, raghanag, Mardaunt

Jakub Moravec - (jkb.moravec@gmail.com)
2022-06-09 13:04:00

*Thread Reply:* Hi, is the link correct? The meeting room is empty

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-09 16:04:23

*Thread Reply:* sorry about that, thanks for letting us know

Mark Beebe - (mark_j_beebe@progressive.com)
2022-06-13 15:13:59

Hello all, after sending dbt openlineage events to Marquez, I am now looking to use the Marquez API to extract the lineage information. I am able to use python requests to call the Marquez API to get other information such as namespaces, datasets, etc., but I am a little bit confused about what I need to enter to get the lineage. I included screenshots for what the API reference shows regarding retrieving the lineage where it shows that a nodeId is required. However, this is where I seem to be having problems. It is not exactly clear where the nodeId needs to be set or what the nodeId needs to include. I would really appreciate any insights. Thank you!

[2 screenshots: Marquez API reference]

Ross Turk - (ross@datakin.com)
2022-06-13 18:49:37

*Thread Reply:* Hey @Mark Beebe!


In this case, nodeId is going to be either a dataset or a job. You need to tell Marquez where to start since there is likely to be more than one graph. So you need to get your hands on an identifier for that starting node.

Ross Turk - (ross@datakin.com)
2022-06-13 18:50:07

*Thread Reply:* You can do this in a few ways (that I can think of). First, by looking for a namespace, then querying for the datasets in that namespace:

[screenshot]

Ross Turk - (ross@datakin.com)
2022-06-13 18:53:43

*Thread Reply:* Or you can search, if you know the name of the dataset:

[screenshot]

Ross Turk - (ross@datakin.com)
2022-06-13 18:53:54

*Thread Reply:* aaaaannnnd that’s actually all the ways I can think of.
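For reference, both lookups are plain HTTP calls - a sketch using requests with placeholder names (the nodeId format dataset:&lt;namespace&gt;:&lt;name&gt; matches the NodeId pattern quoted earlier in this channel):
```
import requests

MARQUEZ = "http://localhost:5000/api/v1"

# find a starting node by listing the datasets in a namespace...
datasets = requests.get(f"{MARQUEZ}/namespaces/food_delivery/datasets").json()

# ...then fetch the lineage graph around one of them
node_id = "dataset:food_delivery:public.categories"
graph = requests.get(f"{MARQUEZ}/lineage", params={"nodeId": node_id}).json()
```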

Mark Beebe - (mark_j_beebe@progressive.com)
2022-06-14 08:11:30

*Thread Reply:* That worked, thank you so much!

👍 Ross Turk

Varun Singh - (varuntestaz@outlook.com)
2022-06-14 05:52:39

Hi all, I need to send the lineage information from spark integration directly to a kafka topic. Java client seems to have a KafkaTransport, is it planned to have this support from inside the spark integration as well?

👀 Francis McGregor-Macdonald

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-14 10:35:48

Hi all, I’m working on a blog post about the Spark integration and would like to credit @tnazarew and @Sbargaoui for their contributions. Anyone know these contributors’ names? Are you on here? Thanks for any leads.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-14 10:37:01

*Thread Reply:* tnazarew - Tomasz Nazarewicz

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-14 10:37:14

*Thread Reply:* 🙌

Ross Turk - (ross@datakin.com)
2022-06-15 12:46:45

*Thread Reply:* 👍

Conor Beverland - (conorbev@gmail.com)
2022-06-14 13:58:07

Has anyone tried getting the OpenLineage Spark integration working with GCP Dataproc ?

Peter Hanssens - (peter@cloudshuttle.com.au)
2022-06-15 15:49:17

Hi Folks,
DataEngBytes is a community data engineering conference here in Australia and will be hosted on the 27th and 29th of September. Our CFP is open for just under a month and tickets are on sale now:
Call for papers: https://sessionize.com/dataengbytes-2022/
Tickets: https://www.tickettailor.com/events/dataengbytes/713307
Promo video: https://youtu.be/1HE_XNLvHss

[Link previews: sessionize.com; tickettailor.com; YouTube - DataEngAU, https://www.youtube.com/c/DataEngAU]

👀 Ross Turk, Michael Collado

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-17 16:23:32

A release of OpenLineage has been requested pending the merging of #856. Three +1s will authorize a release today.
@Willy Lulciuc @Michael Collado @Ross Turk @Maciej Obuchowski @Paweł Leszczyński @Mandy Chessell @Daniel Henneberger @Drew Banin @Julien Le Dem @Ryan Blue @Will Johnson @Zhamak Dehghani

➕ Willy Lulciuc, Maciej Obuchowski, Michael Collado
✅ Michael Collado

Chase Christensen - (christensenc3526@gmail.com)
2022-06-22 17:09:18

👋 Hi everyone!

👋 Conor Beverland, Ross Turk, Maciej Obuchowski, Michael Robinson, George Zachariah V, Willy Lulciuc, Dinakar Sundar

Lee - (chenzuoli709@gmail.com)
2022-06-23 21:54:05

hi

👋 Maciej Obuchowski, Sheeri Cabral (Collibra), Willy Lulciuc, Michael Robinson, Dinakar Sundar

Michael Robinson - (michael.robinson@astronomer.io)
2022-06-25 07:34:32

@channel OpenLineage 0.10.0 is now available! We added SnowflakeOperatorAsync extractor support to the Airflow integration, an InMemoryRelationInputDatasetBuilder for InMemory datasets to the Spark integration, a static code analysis tool to run in CircleCI on Python modules, a copyright to all source files, and a debugger called PMD to the build process.
Changes we made include skipping FunctionRegistry.class serialization in the Spark integration, installing the new rust-based SQL parser by default in the Airflow integration, improving the integration tests for the Airflow integration, reducing event payload size by excluding local data and including an output node in start events, and splitting the Spark integration into submodules.
Thanks to all the contributors who made this release possible!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.10.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.9.0...0.10.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Maciej Obuchowski, Filipe Comparini Vieira, Manuel, Dinakar Sundar, Ross Turk, Paweł Leszczyński, Willy Lulciuc, Adisesha Reddy G, Conor Beverland, Francis McGregor-Macdonald, Jam Car

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:29:29

Why has PUT dataset been deprecated? How do I add an initial dataset via the API?

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:39:16

*Thread Reply:* I think you're referencing the deprecation of the DatasetAPI in Marquez? A milestone for Marquez is to collect metadata only via OpenLineage events. This includes metadata for datasets, jobs, and runs. The DatasetAPI won't be removed until support for collecting dataset metadata via OpenLineage has been added, see https://github.com/OpenLineage/OpenLineage/issues/323

[GitHub issue preview - Assignees: @mobuchowski; Labels: proposal]

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:40:28

*Thread Reply:* Once the spec supports dataset metadata, we’ll outline steps in the Marquez project to switch to using the new dataset event type

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:43:20

*Thread Reply:* The DatasetAPI was also deprecated to avoid confusion around which API to use

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:41:38

🥺

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:42:21

So how would you propose I create the initial node if I am trying to do a POC?

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:44:49

*Thread Reply:* Do you want to register just datasets? Or are you extracting metadata for a job that would include input / output datasets? (outside of Airflow of course)

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:45:09

*Thread Reply:* Sorry didn't notice you over here ! lol

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:45:53

*Thread Reply:* So ideally I would like to map out our current data flow from on prem to aws

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:47:39

*Thread Reply:* What do you mean by mapping to AWS? Like send OL events to a service on AWS that would process the lineage metadata?

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:48:14

*Thread Reply:* no, just visualize the current migration flow.

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:48:53

*Thread Reply:* Ah I see, you're doing an infra migration from on-prem to AWS 👌

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:49:08

*Thread Reply:* really AWS is irrelevant. Source sink -> migration scripts -> s3 -> additional processing -> final sink

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:49:19

*Thread Reply:* correct

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:49:45

*Thread Reply:* right right. so you want to map out that flow and visualize it in Marquez? (or some other meta service)

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:50:05

*Thread Reply:* yes

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:50:26

*Thread Reply:* which I think I can do once the first nodes exist

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:51:18

*Thread Reply:* But I don't know how to get that initial node. I tried using the input facet at job start , that didn't do it. I also can't get the sql context that is in these examples.

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:51:54

*Thread Reply:* really just want to re-create food_delivery using my own biz context

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:52:14

*Thread Reply:* Have you looked over our workshops and this example? (assuming you’re using python?)

[Link preview: OpenLineage workshops]

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:53:49

*Thread Reply:* that goes over the py client with some OL examples, but really calling openlineage.emit(...) method with RunEvents and specifying Marquez as the backend will get you up and running!

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:54:32

*Thread Reply:* Don’t forget to configure the transport for the client
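For example, assuming the Python client's environment-based configuration (the URL is a placeholder for your Marquez backend):
```
import os
from openlineage.client import OpenLineageClient

os.environ["OPENLINEAGE_URL"] = "http://localhost:5000"
client = OpenLineageClient.from_environment()
```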

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:54:45

*Thread Reply:* sweet. Thank you! I'll take a look. Also.. Just came across datakin for the first time. very nice 🙂

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:55:25

*Thread Reply:* thanks! …. but we’re now part of astronomer.io 😉

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:55:48

*Thread Reply:* making airflow oh-so-easy-to-use one DAG at a time

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:55:52

*Thread Reply:* saw that too !

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:56:03

*Thread Reply:* you’re on top of it!

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:56:28

*Thread Reply:* ha. Thanks again!

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:42:40

This would be outside of Airflow

Fenil Doshi - (fdoshi@salesforce.com)
2022-06-28 18:43:22

Hello,
Is OpenLineage planning to add support for inlets and outlets in the Airflow integration? I am working on a project that relies on it and was hoping to contribute to this feature if it's something that is in the talks.
I saw an open issue here


I am willing to work on it. My plan was to just support File and Table entities (for inlets and outlets):
pass the inlets and outlets info into the extract_metadata function here and then convert Airflow entities into TaskMetadata entities here.


Does this sound reasonable?
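For illustration, a task using the proposed inlets/outlets support might look like this (the Table fields are assumptions about the airflow.lineage.entities model, and the callable is a stand-in):
```
from airflow.lineage.entities import Table
from airflow.operators.python import PythonOperator

process = PythonOperator(
    task_id="process",
    python_callable=lambda: None,  # placeholder task body
    inlets=[Table(database="db", cluster="c1", name="source_table")],
    outlets=[Table(database="db", cluster="c1", name="dest_table")],
)
```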

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:59:38

*Thread Reply:* Honestly, I’ve been a huge fan of using / falling back on inlets and outlets since day 1. AND if you’re willing to contribute this support, you get a +1 from me (I’ll add some minor comments to the issue) /cc @Julien Le Dem

🙌 Fenil Doshi

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:59:59

*Thread Reply:* would be great to get @Maciej Obuchowski thoughts on this as well

👍 Fenil Doshi

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-08 12:40:39

*Thread Reply:* I have created a draft PR for this here.
Please let me know if the changes make sense.

[GitHub PR preview - Comments: 1]

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-08 12:42:30

*Thread Reply:* I think this effort: https://github.com/OpenLineage/OpenLineage/pull/904 ultimately makes more sense, since it will allow getting lineage on Airflow 2.3+ too

[GitHub PR preview - Comments: 2]

✅ Fenil Doshi
👀 Fenil Doshi

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-08 18:12:47

*Thread Reply:* I have made the changes in line with the mentioned comments here.
Does this look good?

[GitHub PR preview - Comments: 1]

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-12 09:35:22

*Thread Reply:* I think it looks good! Would be great to have tests for this feature though.

👍 Fenil Doshi, Julien Le Dem

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-15 21:56:50

*Thread Reply:* I have added the tests! Would really appreciate it if someone can take a look and let me know if anything else needs to be done.
Thank you for the support! 😄

👀 Willy Lulciuc, Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-18 06:48:03

*Thread Reply:* One change and I think it will be good for now.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-18 06:48:07

*Thread Reply:* Have you tested it manually?

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-20 13:22:04

*Thread Reply:* Thanks a lot for the review! Appreciate it 🙌
Yes, I tested it manually (for Airflow versions 2.1.4 and 2.3.3) and it works 🎉

Conor Beverland - (conorbev@gmail.com)
2022-07-20 13:24:55

*Thread Reply:* I think this is such a useful feature to have, thank you! Would you mind adding a little example to the PR of how to use it? Like a little example DAG or something? ( either in a comment or edit the PR description )

👍 Fenil Doshi

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-20 15:20:32

*Thread Reply:* Yes, Sure! I will add it in the PR description

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-21 05:30:56

*Thread Reply:* I think it would be easy to convert to an integration test then, if you provided an example DAG

👍 Fenil Doshi

Conor Beverland - (conorbev@gmail.com)
2022-07-27 12:20:43

*Thread Reply:* ping @Fenil Doshi if possible I would really love to see the example DAG on there 🙂 🙏

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-27 12:26:22

*Thread Reply:* Yes, I was going to but the PR got merged so did not update the description. Should I just update the description of merged PR? Or should I add it somewhere in the docs?

Conor Beverland - (conorbev@gmail.com)
2022-07-27 12:42:29

*Thread Reply:* ^ @Ross Turk is it easy for @Fenil Doshi to contribute doc for manual inlet definition on the new doc site?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-27 12:48:32

*Thread Reply:* It is easy 🙂 it's just markdown: https://github.com/openlineage/docs/

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-27 12:49:23

*Thread Reply:* @Fenil Doshi feel free to create a new page here and don't sweat where to put it - we're still figuring out the structure and will move it later

👍 Ross Turk, Fenil Doshi

Ross Turk - (ross@datakin.com)
2022-07-27 13:12:31

*Thread Reply:* exactly, yes - don’t be worried about the doc quality right now, the doc site is still in a pre-release state. so whatever you write will be likely edited or moved before it becomes official 👍

👍 Fenil Doshi

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-27 20:37:34

*Thread Reply:* I added documentations here - https://github.com/OpenLineage/docs/pull/16


Also, I have added an example for it. 🙂
Let me know if something is unclear and needs to be updated.

✅ Conor Beverland

Conor Beverland - (conorbev@gmail.com)
2022-07-28 12:50:54

*Thread Reply:* Thanks! very cool.

Conor Beverland - (conorbev@gmail.com)
2022-07-28 12:52:22

*Thread Reply:* Does Airflow check the types of the inlets/outlets btw?


Like I wonder if a user could directly define an OpenLineage DataSet ( which might even have various other facets included on it ) and specify it in the inlets/outlets ?

Ross Turk - (ross@datakin.com)
2022-07-28 12:54:56

*Thread Reply:* Yeah, I was also curious about using the models from airflow.lineage.entities as opposed to openlineage.client.run.

Ross Turk - (ross@datakin.com)
2022-07-28 12:55:42

*Thread Reply:* I am accustomed to creating OpenLineage entities like this:


taxes = Dataset(namespace="postgres://foobar", name="schema.table")

Ross Turk - (ross@datakin.com)
2022-07-28 12:56:45

*Thread Reply:* I don’t dislike the airflow.lineage.entities models especially, but if we only support one of them…

Conor Beverland - (conorbev@gmail.com)
2022-07-28 12:58:18

*Thread Reply:* yeah, if Airflow allows that class within inlets/outlets it'd be nice to support both imo.


Like we would suggest users to use openlineage.client.run.Dataset but if a user already has DAGs that use Table then they'd still work in a best efforts way.
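A hypothetical best-effort converter along those lines (the namespace/name mapping is an assumption for illustration, not the merged implementation):
```
from airflow.lineage.entities import Table
from openlineage.client.run import Dataset

def to_ol_dataset(obj):
    if isinstance(obj, Dataset):
        return obj  # user already supplied a native OL dataset
    if isinstance(obj, Table):
        # best-effort mapping of an Airflow Table onto an OL dataset
        return Dataset(namespace=obj.cluster, name=f"{obj.database}.{obj.name}")
    return None  # unsupported inlet/outlet type
```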

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-28 13:03:07

*Thread Reply:* either Airflow depends on OpenLineage or we can probably change those entities as part of AIP-48 overhaul to more openlineage-like ones

Ross Turk - (ross@datakin.com)
2022-07-28 17:18:35

*Thread Reply:* hm, not sure I understand the dependency issue. isn’t this extractor living in openlineage-airflow?

Conor Beverland - (conorbev@gmail.com)
2022-08-15 09:49:02

*Thread Reply:* I gave manual lineage a try with native OL Datasets specified in the Airflow inlets/outlets and it seems to work! Had to make some small tweaks which I have attempted here: https://github.com/OpenLineage/OpenLineage/pull/1015


( I left the support for converting the Airflow Table to Dataset because I think that's nice to have also )

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:44:24

food_delivery example example.etl_categories node

Mike brenes - (brenesmi@gmail.com)
2022-06-28 18:44:40

how do I recreate that using OpenLineage?

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:45:52

*Thread Reply:* Ahh great question! I actually just updated the seeding cmd for Marquez to do just this (but in java of course)

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:46:15

*Thread Reply:* Give me a sec to send you over the diff…

❤️ Mike brenes

Willy Lulciuc - (willy@datakin.com)
2022-06-28 18:56:35

*Thread Reply:* … continued here https://openlineage.slack.com/archives/C01CK9T7HKR/p1656456734272809?thread_ts=1656456141.097229&cid=C01CK9T7HKR

[Link preview: Slack message from Willy Lulciuc - https://openlineage.slack.com/team/U01DCMDFHBK]

Conor Beverland - (conorbev@gmail.com)
2022-06-28 20:05:33

I'm very new to DBT but wanted to give it a try with OL. I had a couple of questions when going through the DBT tutorial here: https://docs.getdbt.com/guides/getting-started/learning-more/getting-started-dbt-core

  1. An earlier part of the tutorial has you build a model in a single sql file: https://docs.getdbt.com/guides/getting-started/learning-more/getting-started-dbt-core#build-your-first-model When I did this and ran dbt-ol I got a lineage graph like this:

👀 Maciej Obuchowski
[image: lineage graph]

Conor Beverland - (conorbev@gmail.com)
2022-06-28 20:07:11

then a later part of the tutorial has you split that same example into multiple models and when I run it again I get the graph like:

[image: lineage graph]

Conor Beverland - (conorbev@gmail.com)
2022-06-28 20:08:54

^ I'm just kind of curious if it's working as expected? And/or could it be possible to parse the DBT .sql so that the lineage in the first case would still show those staging tables?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-29 10:04:14

*Thread Reply:* I think you should declare those as sources? Or do you need something different?
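For reference, declaring the upstream tables as dbt sources looks roughly like this (project and table names are illustrative); a staging model would then select from {{ source('jaffle_shop', 'customers') }} so the source tables appear as nodes in the graph:
```
# models/staging/sources.yml
version: 2
sources:
  - name: jaffle_shop
    tables:
      - name: customers
      - name: orders
```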

Conor Beverland - (conorbev@gmail.com)
2022-06-29 21:15:33

*Thread Reply:* I'll try to experiment with this.

Conor Beverland - (conorbev@gmail.com)
2022-06-28 20:09:19
  1. I see that DBT has a concept of adding tests to your models. Could those add data quality facets in OL?
- - - -
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-06-29 10:02:17

*Thread Reply:* this should already be working if you run dbt-ol test or dbt-ol build
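For reference, the wrapper is invoked like the plain dbt CLI - e.g. something like `OPENLINEAGE_URL=http://localhost:5000 dbt-ol build` - and the test results are then attached to the tested datasets as data quality facets.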

Conor Beverland - (conorbev@gmail.com)
2022-06-29 21:15:25

*Thread Reply:* oh, nice!

shweta p - (shweta.pbs@gmail.com)
2022-07-04 02:48:35

Hi everyone, I am trying openlineage-dbt. It works perfectly locally when I publish the events to Marquez... but when I run the same commands from MWAA, I don't see those events triggered, and I am not able to view any logs if there is an error. How do I debug the issue?

Julien Le Dem - (julien@apache.org)
2022-07-06 14:26:59

*Thread Reply:* Maybe @Maciej Obuchowski knows? You need to check that it's using the dbt-ol command and that the configuration is available (environment variables or conf file).

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-06 15:31:20

*Thread Reply:* Maybe some aws networking stuff? I'm not really sure how mwaa works internally (or, at all - never used it)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-06 15:35:06

*Thread Reply:* anyway, any logs/errors should be in the same space where your task logs are

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-06 05:32:28

Agenda items are requested for the next OpenLineage Technical Steering Committee meeting on July 14. Reply in thread or ping me with your item(s)!

Will Johnson - (will@willj.co)
2022-07-06 10:21:50

*Thread Reply:* What is the status on the Flink / Streaming decisions being made for OpenLineage / Marquez?


A few months ago, Flink was being introduced and it was said that more thought was needed around supporting streaming services in OpenLineage.


It would be very helpful to know where the community stands on how streaming data sources should work in OpenLineage.

👍 Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-06 11:08:01

*Thread Reply:* @Will Johnson added your item

👍 Will Johnson

Will Johnson - (will@willj.co)
2022-07-06 10:19:44

Request for Creating a New OpenLineage Release


Hello #general, as per the Governance guide (https://github.com/OpenLineage/OpenLineage/blob/main/GOVERNANCE.md#openlineage-project-releases), I am asking that we generate a new release based on the latest commit by @Maciej Obuchowski (c92a93cdf3df636a02984188563d019474904b2b) which fixes a critical issue running OpenLineage on Azure Databricks.


Having this release made available to the general public on Maven would allow us to enable the hundred+ users of the solution to run OpenLineage on the latest LTS versions of Databricks. In addition, it would enable the Microsoft team to integrate the amazing column level lineage feature contributed by @Paweł Leszczyński with our solution for Microsoft Purview.

👍 Maciej Obuchowski, Jakub Dardziński, Ross Turk, Willy Lulciuc, Will Johnson, Julien Le Dem

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-07 10:33:41

@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, July 14 at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom
All are welcome!
Agenda:
  1. Announcements/recent talks
  2. Release 0.10.0 overview
  3. Flink integration retrospective
  4. Discuss: streaming services in Flink integration
  5. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
[Link preview: Zoom Video]

David Cecchi - (david_cecchi@cargill.com)
2022-07-11 10:30:34

*Thread Reply:* would appreciate a TSC discussion on OL philosophy for Streaming in general and where/if it fits in the vision and strategy for OL. fully appreciate current maturity, moreso just validating how OL is being positioned from a vision perspective. as we consider aligning enterprise lineage solution around OL want to make sure we're not making bad assumptions. neat discussion might be "imagine that Confluent decided to make Stream Lineage OL compliant/capable - are we cool with that and what are the implications?".

👍 Michael Robinson

Ross Turk - (ross@datakin.com)
2022-07-12 12:36:17

*Thread Reply:* @Michael Robinson could I also have a quick 5m to talk about plans for a documentation site?

👍 Michael Robinson, Sheeri Cabral (Collibra)

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-12 12:46:29

*Thread Reply:* @David Cecchi @Ross Turk Added your items to the agenda. Thanks and looking forward to the discussion!

David Cecchi - (david_cecchi@cargill.com)
2022-07-12 15:08:48

*Thread Reply:* this is great - will keep an eye out for recording. if it got tabled due to lack of attendance will pick it up next TSC.

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-12 16:12:43

*Thread Reply:* I think OpenLineage should have some representation at https://impactdatasummit.com/2022


I’m happy to help craft the abstract, look over slides, etc. (I could help present, but all I’ve done with OpenLineage is one tutorial, so I’m hardly an expert).


CfP closes 31 Aug so there’s plenty of time, but if you want a 2nd set of eyes on things, we can’t just wait until the last minute to submit 😄

[Link preview: impactdatasummit.com]

Will Johnson - (will@willj.co)
2022-07-07 12:04:09

How to create custom facets without recompiling OpenLineage?


I have a customer who is interested in using OpenLineage but wants to extend the facets WITHOUT recompiling OL / maintaining a clone of OL with their changes.


Do we have any examples of how someone might create their own jar but using the OpenLineage CustomFacetBuilder and then have that jar's classes be injected into OpenLineage?

Will Johnson - (will@willj.co)
2022-07-07 12:04:55

*Thread Reply:* @Michael Collado would you have any thoughts on how to extend the Facets without having to alter OpenLineage itself?

Michael Collado - (collado.mike@gmail.com)
2022-07-07 15:16:45

*Thread Reply:* This is described here. Notably:
> Custom implementations are registered by following Java's ServiceLoader conventions. A file called io.openlineage.spark.api.OpenLineageEventHandlerFactory must exist in the application or jar's META-INF/service directory. Each line of that file must be the fully qualified class name of a concrete implementation of OpenLineageEventHandlerFactory. More than one implementation can be present in a single file. This might be useful to separate extensions that are targeted toward different environments - e.g., one factory may contain Azure-specific extensions, while another factory may contain GCP extensions.
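Concretely, the registration file is just a list of class names - a sketch with a made-up implementation class:
```
# contents of META-INF/services/io.openlineage.spark.api.OpenLineageEventHandlerFactory
# one fully qualified implementation class per line; this name is illustrative
com.example.lineage.MyEventHandlerFactory
```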

Michael Collado - (collado.mike@gmail.com)
2022-07-07 15:17:55

*Thread Reply:* This example is present in the test package - https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]ervices/io.openlineage.spark.api.OpenLineageEventHandlerFactory

Will Johnson - (will@willj.co)
2022-07-07 20:19:01

*Thread Reply:* @Michael Collado you are amazing! Thank you so much for pointing me to the docs and example!

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-07 19:27:47

@channel @Will Johnson
OpenLineage 0.11.0 is now available!
We added:
• an HTTP option to override timeout and properly close connections in the openlineage-java lib,
• dynamic mapped tasks support to the Airflow integration,
• a SqlExtractor to the Airflow integration,
• PMD to Java and Spark builds in CI.
We changed:
• when testing extractors in the Airflow integration, the extractor list length assertion is now dynamic,
• templates are rendered at the start of integration tests for the TaskListener in the Airflow integration.
Thanks to all the contributors who made this release possible!
For the bug fixes and more details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.11.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.10.0...0.11.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

👍 Chandru TMBA, John Thomas, Maciej Obuchowski, Fenil Doshi
👏 John Thomas, Willy Lulciuc, Ricardo Gaspar
🙌 Will Johnson, Maciej Obuchowski, Sergio Sicre

Varun Singh - (varuntestaz@outlook.com)
2022-07-11 07:06:36

Hi all, I am using openlineage-spark in my project where I lock the dependency versions in gradle.lockfile. After release 0.10.0, this is not working. Is this a known limitation of switching to splitting the integration into submodules?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 06:18:29

*Thread Reply:* Can you expand on what's not working exactly?


This is not something we're aware of.

Varun Singh - (varuntestaz@outlook.com)
2022-07-19 04:09:39

*Thread Reply:* @Maciej Obuchowski Sure, I have my own library where I am creating a shadowJar. This includes the OpenLineage library in the new uber jar. This worked fine till 0.9.0, but now building the shadowJar gives this error:
```
Could not determine the dependencies of task ':shadowJar'.
> Could not resolve all dependencies for configuration ':runtimeClasspath'.
   > Could not find spark:app:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/app/0.10.0/app-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage-spark:0.10.0
   > Could not find spark:shared:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/shared/0.10.0/shared-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage-spark:0.10.0
   > Could not find spark:spark2:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/spark2/0.10.0/spark2-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage-spark:0.10.0
   > Could not find spark:spark3:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/spark3/0.10.0/spark3-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage-spark:0.10.0
```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 05:00:02

*Thread Reply:* Can you try 0.11? I think we might have already fixed that.

Varun Singh - (varuntestaz@outlook.com)
2022-07-19 05:50:03

*Thread Reply:* Tried with that as well. Doesn't work

Varun Singh - (varuntestaz@outlook.com)
2022-07-19 05:56:50

*Thread Reply:* Same error with 0.11.0 as well

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 08:11:13

*Thread Reply:* I think I see - we removed internal dependencies from maven's pom.xml but we also publish gradle metadata: https://repo1.maven.org/maven2/io/openlineage/openlineage-spark/0.11.0/openlineage-spark-0.11.0.module

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 08:11:34

*Thread Reply:* we should remove the dependencies or disable the gradle metadata altogether, it's not required

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 08:16:18

*Thread Reply:* @Varun Singh For now I think you can try ignoring gradle metadata: https://docs.gradle.org/current/userguide/declaring_repositories.html#sec:supported_metadata_sources

Hanbing Wang - (doris.wang200902@gmail.com)
2022-07-19 14:18:45

*Thread Reply:* @Varun Singh did you find out how to build the shadowJar successfully with release 0.10.0? I can build the shadowJar with 0.9.0, but not a higher version. If your problem is already resolved, could you share some suggestions? thanks ^^

Varun Singh - (varuntestaz@outlook.com)
2022-07-20 03:44:40

*Thread Reply:* @Hanbing Wang I followed @Maciej Obuchowski's instructions (Thank you!) and added this to my build.gradle file:
```
repositories {
    mavenCentral() {
        metadataSources {
            mavenPom()
            ignoreGradleMetadataRedirection()
        }
    }
}
```
I am able to build the jar now. I am not proficient in gradle so don't know if this is the right way to do this. Please correct me if I am wrong.

Varun Singh - (varuntestaz@outlook.com)
2022-07-20 05:26:04

*Thread Reply:* Also, I am not able to see the 3rd party dependencies in the dependency lock file, but they are present in some folder inside the jar (relocated in subproject's build file). But this is a different problem ig

Hanbing Wang - (doris.wang200902@gmail.com)
2022-07-20 18:45:50

*Thread Reply:* Thanks @Varun Singh for the very helpful info. I will also try update build.gradle and rebuild shadowJar again.

Will Johnson - (will@willj.co)
2022-07-13 01:10:01

Java Question: Why Can't I Find a Class on the Class Path? / How the heck does the ClassLoader know where to find a class?

Are there any java pros that would be willing to share alternatives to searching if a given class exists, or help explain what should change in the Kusto package to make it work for the behaviors as seen in the Kafka and SQL DW relation visitors?

--- Details ---
@Hanna Moazam and I are trying to introduce two new Azure data sources into OpenLineage's Spark integration. The https://github.com/Azure/azure-kusto-spark package is nearly done but we're getting tripped up on some Java concepts. In order to know if we should add the KustoRelationVisitor to the input dataset visitors, we need to see if the Kusto jar is installed on the spark / databricks cluster. In this case, com.microsoft.kusto.spark.datasource.DefaultSource is a public class but it cannot be found using the KustoRelationVisitor.class.getClassLoader().loadClass("class name") methods as seen in:

• https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]nlineage/spark/agent/lifecycle/plan/SqlDWDatabricksVisitor.java
• https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]penlineage/spark/agent/lifecycle/plan/KafkaRelationVisitor.java

At first I thought it was the Azure packages, but then I tried the same approach with a simple java library.

I instantiate a spark-shell like this:
```
spark-shell --master local[4] \
--conf spark.driver.extraClassPath=/mnt/repos/SparkListener-Basic/lib/build/libs/custom-listener.jar \
--conf spark.extraListeners=listener.MyListener \
--jars /mnt/repos/wjtestlib/lib/build/libs/lib.jar
```
With lib.jar containing a class that looks like this:
```java
package wjtestlib;

public class WillLibrary {
    public boolean someLibraryMethod() {
        return true;
    }
}
```
And the custom listener is very simple:
```java
public class MyListener extends org.apache.spark.scheduler.SparkListener {

    private static final Logger log = LoggerFactory.getLogger("MyLogger");

    public MyListener() {
        log.info("INITIALIZING");
    }

    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        log.info("MYLISTENER: ON JOB START");
        try {
            log.info("Trying wjtestlib.WillLibrary");
            MyListener.class.getClassLoader().loadClass("wjtestlib.WillLibrary");
            log.info("Got wjtestlib.WillLibrary");
        } catch (ClassNotFoundException e) {
            log.info("Could not get wjtestlib.WillLibrary");
        }

        try {
            log.info("Trying wjtestlib.WillLibrary using Class.forName");
            Class.forName("wjtestlib.WillLibrary", false, this.getClass().getClassLoader());
            log.info("Got wjtestlib.WillLibrary using Class.forName");
        } catch (ClassNotFoundException e) {
            log.info("Could not get wjtestlib.WillLibrary using Class.forName");
        }
    }
}
```
And I still get a result indicating it cannot find the class:
```
2022-07-12 23:58:22,048 INFO MyLogger: MYLISTENER: ON JOB START
2022-07-12 23:58:22,048 INFO MyLogger: Trying wjtestlib.WillLibrary
2022-07-12 23:58:22,057 INFO MyLogger: Could not get wjtestlib.WillLibrary
2022-07-12 23:58:22,058 INFO MyLogger: Trying wjtestlib.WillLibrary using Class.forName
2022-07-12 23:58:22,065 INFO MyLogger: Could not get wjtestlib.WillLibrary using Class.forName
```
Thank you for any guidance!

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-13 08:50:15

*Thread Reply:* Could you unzip the created jar and verify that the classes you're trying to use are present? Perhaps there's some relocate in the shadowJar plugin which renames the classes. Making sure the classes are present in the jar is a good place to start.

Then you can try doing classForName just from the spark-shell without any listeners added. The classes should be available there.

Will Johnson - (will@willj.co)
2022-07-13 11:42:25

*Thread Reply:* Thank you for the reply Pawel! Hanna and I just wrapped up some testing.

It looks like Databricks AND open source Spark do some magic when you install a library OR use --jars on the spark-shell. In both Databricks and Apache Spark, the thread running the SparkListener cannot see the additional libraries installed unless they're on the original / main class path.

• Confirmed the uploaded jars are NOT shaded / renamed.
• The databricks class path ($CLASSPATH) is focused on /databricks/jars
• The added libraries are in /local_disk0/tmp and are not found in $CLASSPATH.
• The sparklistener only recognizes $CLASSPATH.
• Using a classloader with an object like spark does not find our installed class: spark.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")
• When we use a classloader on a class we installed and imported, it DOES find the class: myImportedClass.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")

@Michael Collado and @Maciej Obuchowski have you seen any challenges with using --jars on the spark-shell and detecting if the class is installed?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-13 12:02:05

*Thread Reply:* We run tests using --packages for external stuff like Delta - which is the same as --jars, but getting them from maven central, not local disk, and it works, like in KafkaRelationVisitor.

What if you did it like that? By that I mean adding it to your code with compileOnly in gradle or provided in maven, compiling with it, then using a static method to check if it loads?
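A rough sketch of that compileOnly-plus-static-check pattern (the Kusto class name comes from this thread; the surrounding class is hypothetical, not the actual KafkaRelationVisitor code):
```java
import org.apache.spark.sql.sources.BaseRelation;

public class KustoRelationVisitorSketch {
    // The connector sits on the compileOnly/provided classpath, so this class
    // compiles against it, but the connector jar need not exist at runtime.
    public static boolean hasKustoClasses() {
        try {
            KustoRelationVisitorSketch.class.getClassLoader()
                .loadClass("com.microsoft.kusto.spark.datasource.DefaultSource");
            return true;
        } catch (Throwable e) {
            // ClassNotFoundException or NoClassDefFoundError: connector absent.
            return false;
        }
    }

    public static boolean isKustoRelation(BaseRelation relation) {
        // Only inspect the relation when the connector is actually loadable.
        return hasKustoClasses()
            && relation.getClass().getName()
                .startsWith("com.microsoft.kusto.spark.datasource");
    }
}
```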

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-13 12:02:36

*Thread Reply:* > When we use a classloader on a class we installed and imported, it DOES find the class. myImportedClass.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")
Isn't that this actual scenario?

Will Johnson - (will@willj.co)
2022-07-13 12:36:47

*Thread Reply:* Thank you for the reply, Maciej!

I will try the compileOnly route tonight!

Re: myImportedClass.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")

I failed to mention that this was only achieved in the interactive shell / Databricks notebook. It never worked inside the SparkListener UNLESS we installed the Kusto jar on the databricks class path.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-14 06:43:47

*Thread Reply:* The difference between --jars and --packages is that for packages all transitive dependencies will be handled. But this does not seem to be the case here.

More doc can be found here: https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

When starting a SparkContext, all the jars available on the classpath should be listed and put into Spark logs. So that's the place one can check if the jar is loaded or not.

If --conf spark.driver.extraClassPath is working, you can add multiple jar files there (they must be separated by commas).

Other examples of adding multiple jars to spark classpath can be found here -> https://sparkbyexamples.com/spark/add-multiple-jars-to-spark-submit-classpath/

Will Johnson - (will@willj.co)
2022-07-14 11:20:02

*Thread Reply:* @Paweł Leszczyński thank you for the reply! Hanna and I experimented with jars vs extraClassPath.

When using jars, the spark listener does NOT find the class using a classloader.

When using extraClassPath, the spark listener DOES find the class using a classloader.

When using --jars, we can see in the spark logs that after spark starts (and after the spark listener is already established?) there are Spark.AddJar commands being executed.

@Maciej Obuchowski we also experimented with doing a compileOnly on OpenLineage's spark listener, it did not change the behavior. OpenLineage still failed to identify that I had the kusto-spark-connector.

I'm going to reach out to Databricks to see if there is any guidance on letting the SparkListener be aware of classes added via their libraries / --jar method on the spark-shell.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:22:01

*Thread Reply:* So, this is only relevant to Databricks now? Because I don't understand what you do differently than us with Kafka/Iceberg/Delta

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:22:48

*Thread Reply:* I'm not the spark/classpath expert though - maybe @Michael Collado have something to add?

Will Johnson - (will@willj.co)
2022-07-14 11:24:12

*Thread Reply:* @Maciej Obuchowski that's a super good question on Iceberg. How do you instantiate a spark job with Iceberg installed?

Will Johnson - (will@willj.co)
2022-07-14 11:26:04

*Thread Reply:* It is still relevant to apache spark because I can't get OpenLineage to find the installed package UNLESS I use extraClassPath.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:29:13

*Thread Reply:* Basically, by adding --packages org.apache.iceberg:iceberg_spark_runtime_3.1_2.12:0.13.0


https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]a/io/openlineage/spark/agent/SparkContainerIntegrationTest.java

Will Johnson - (will@willj.co)
2022-07-14 11:29:51

*Thread Reply:* Trying with --packages right now.

Will Johnson - (will@willj.co)
2022-07-14 11:54:37

*Thread Reply:* Using --packages wouldn't let me find the Spark relation's default source:

Spark Shell command:
```
spark-shell --master local[4] \
--conf spark.driver.extraClassPath=/customListener-1.0-SNAPSHOT.jar \
--conf spark.extraListeners=listener.MyListener \
--jars /WillLibrary.jar \
--packages com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.0.0
```
Code inside customListener:
```java
try {
    log.info("Trying Kusto DefaultSource");
    MyListener.class.getClassLoader().loadClass("com.microsoft.kusto.spark.datasource.DefaultSource");
    log.info("Got Kusto DefaultSource!!!!");
} catch (ClassNotFoundException e) {
    log.info("Could not get Kusto DefaultSource");
}
```
Logs indicating it still can't find the class when using --packages:
```
2022-07-14 10:47:35,997 INFO MyLogger: MYLISTENER: ON JOB START
2022-07-14 10:47:35,997 INFO MyLogger: Trying wjtestlib.WillLibrary
2022-07-14 10:47:36,052 INFO MyLogger: Trying LogicalRelation
2022-07-14 10:47:36,053 INFO MyLogger: Got logical relation
2022-07-14 10:47:36,053 INFO MyLogger: Trying Kusto DefaultSource
2022-07-14 10:47:36,064 INFO MyLogger: Could not get Kusto DefaultSource
```
😢

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:59:07

*Thread Reply:* what if you load your listener using also packages?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 12:00:38

*Thread Reply:* That's how I'm doing it locally using spark.conf:
```
spark.jars.packages com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.2,io.delta:delta-core_2.12:1.0.0,org.apache.iceberg:iceberg-spark3-runtime:0.12.1,io.openlineage:openlineage-spark:0.9.0
```

👀 Will Johnson

Will Johnson - (will@willj.co)
2022-07-14 12:20:47

*Thread Reply:* @Maciej Obuchowski - You beautiful bearded man! 🙏
```
2022-07-14 11:14:21,266 INFO MyLogger: Trying LogicalRelation
2022-07-14 11:14:21,266 INFO MyLogger: Got logical relation
2022-07-14 11:14:21,266 INFO MyLogger: Trying org.apache.iceberg.catalog.Catalog
2022-07-14 11:14:21,295 INFO MyLogger: Got org.apache.iceberg.catalog.Catalog!!!!
2022-07-14 11:14:21,295 INFO MyLogger: Trying Kusto DefaultSource
2022-07-14 11:14:21,361 INFO MyLogger: Got Kusto DefaultSource!!!!
```
I ended up setting my spark-shell like this (and used --jars for my custom spark listener since it's not on Maven):
```
spark-shell --master local[4] \
--conf spark.extraListeners=listener.MyListener \
--packages org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:0.13.0,com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.0.0 \
--jars customListener-1.0-SNAPSHOT.jar
```
So, now I just need to figure out how Databricks differs from this approach 😢

😂 Maciej Obuchowski, Jakub Dardziński, Hanna Moazam

Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:21:35

*Thread Reply:* This is an annoying detail about Java ClassLoaders and the way Spark loads extra jars/packages.

Remember Java's ClassLoaders are hierarchical - there are parent ClassLoaders and child ClassLoaders. Parents can't see their children's classes, but children can see their parent's classes.

When you use spark.driver.extraClassPath, you're adding a jar to the main application ClassLoader. But when you use --jars or --packages, you're instructing the Spark application itself to load the extra jars into its own ClassLoader - a child of the main application ClassLoader that the Spark code creates and manages separately. Since your listener class is loaded by the main application ClassLoader, it can't see any classes that are loaded by the Spark child ClassLoader. Either both jars need to be on the driver classpath, or both jars need to be loaded by the --jars or --packages configuration parameter.

🙌 Will Johnson, Paweł Leszczyński

Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:26:15

*Thread Reply:* In Databricks, we were not able to simply use the --packages argument to load the listener, which is why we have that init script that copies the jar into the classpath that Databricks uses for application startup (the main ClassLoader). You need to copy your visitor jar into the same location so that both jars are loaded by the same ClassLoader and can see each other

Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:29:09

*Thread Reply:* (as an aside, this is one of the major drawbacks of the java agent approach and one reason why all the documentation recommends using the spark.jars.packages configuration parameter for loading the OL library - it guarantees that any DataSource nodes loaded by the Spark ClassLoader can be seen by the OL library and we don't have to use reflection for everything)

Will Johnson - (will@willj.co)
2022-07-14 12:30:25

*Thread Reply:* @Michael Collado Thank you so much for the reply. The challenge is that Databricks has their own mechanism for installing libraries / packages.

https://docs.microsoft.com/en-us/azure/databricks/libraries/

These packages are installed on databricks AFTER spark is started and the physical files are located in a folder that is different than the main classpath.

I'm going to reach out to Databricks and see if we can get any guidance on this 😢

Will Johnson - (will@willj.co)
2022-07-14 12:31:32

*Thread Reply:* Unfortunately, I can't ask users to install their packages on Databricks in a non-standard way (e.g. via an init script) because no one will follow that recommendation.

Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:32:46

*Thread Reply:* yeah, I'd prefer if we didn't need an init script to get OL on Databricks either 🤷‍♂️

🤣 Will Johnson

Will Johnson - (will@willj.co)
2022-07-17 01:03:02

*Thread Reply:* Quick update:
• Turns out using a class loader from a Scala spark listener does not have this problem: https://stackoverflow.com/questions/7671888/scala-classloaders-confusion
• I'm trying to use URLClassLoader as recommended by a few MSFT folks and point it at the /local_disk0/tmp folder: https://stackoverflow.com/questions/17724481/set-classloader-different-directory
• I'm not having luck so far but hoping I can reason about it tomorrow and Monday. This is blocking us from adding additional data sources that are not pre-installed on databricks 😢
A sketch of that URLClassLoader idea is below.
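Roughly what that experiment looks like (the /local_disk0/tmp path comes from this thread; everything else is a hypothetical sketch). Note that URLClassLoader does not scan a directory for jars, so each jar file has to be added as its own URL:
```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class LocalDiskClassLoader {
    public static ClassLoader forJarDirectory(File dir, ClassLoader parent) throws Exception {
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL()); // one URL per jar, not the directory itself
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = forJarDirectory(
            new File("/local_disk0/tmp"), LocalDiskClassLoader.class.getClassLoader());
        // The lookup only succeeds if a jar in the directory actually contains the class.
        Class.forName("com.microsoft.kusto.spark.datasource.DefaultSource", false, loader);
    }
}
```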

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-18 05:45:59

*Thread Reply:* Can't help you now, but I'd love it if you dumped the knowledge you've gained through this process into some doc on the new OpenLineage doc site 🙏

👍 Hanna Moazam

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-18 05:48:15

*Thread Reply:* We'll definitely put all of it together as a reference for others, and hopefully have a solution by the end of it too

🙌 Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-13 12:06:24

@channel The next OpenLineage TSC meeting is tomorrow at 10 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1657204421157959

🙌 Willy Lulciuc, Maciej Obuchowski
💯 Willy Lulciuc, Maciej Obuchowski

David Cecchi - (david_cecchi@cargill.com)
2022-07-13 16:32:12

check this out folks - marklogic datahub flow lineage into OL/marquez with jobs and runs and more. i would guess this is a pretty narrow use case but it went together really smoothly and thought i'd share. sometimes it's just cool to see what people are working on

🍺 Willy Lulciuc, Conor Beverland, Maciej Obuchowski, Paweł Leszczyński
❤️ Willy Lulciuc, Conor Beverland, Julien Le Dem, Michael Robinson, Maciej Obuchowski, Minkyu Park

Willy Lulciuc - (willy@datakin.com)
2022-07-13 16:40:48

*Thread Reply:* Soo cool, @David Cecchi 💯💯💯. I'm not familiar with marklogic, but pretty awesome ETL platform and the lineage graph looks 👌! Did you have to write any custom integration code? Or were you able to use our off-the-shelf integrations to get things working? (Also, thanks for sharing!)

David Cecchi - (david_cecchi@cargill.com)
2022-07-13 16:57:29

*Thread Reply:* team had to write some custom stuff but it's all framework so it can be repurposed not rewritten over and over. i would see this as another "Platform" in the context of the integrations semantic OL uses, so no, we didn't start w/ an existing solution. just used internal hooks and then called lineage APIs.

Willy Lulciuc - (willy@datakin.com)
2022-07-13 17:02:53

*Thread Reply:* Ah, totally makes sense. Would you be open to a brief presentation and/or demo in a future OL community meeting? The community is always looking to hear how OL is used in the wild, and this seems aligned with that (assuming you can talk about the implementation at a high level)

Willy Lulciuc - (willy@datakin.com)
2022-07-13 17:05:35

*Thread Reply:* No pressure, of course 😉

David Cecchi - (david_cecchi@cargill.com)
2022-07-13 17:08:50

*Thread Reply:* ha not feeling any pressure. familiar with the intentions and dynamic. let's keep that on radar - i don't keep tabs on community meetings but mid/late august would be workable. and to be clear, this is being used in the wild in a sandbox 🙂.

Willy Lulciuc - (willy@datakin.com)
2022-07-13 17:12:55

*Thread Reply:* Sounds great, and a reasonable timeline! (cc @Michael Robinson can follow up). Even if it’s in a sandbox, talking about the level of effort helps with improving our APIs or sharing with others how smooth it can be!

👍 David Cecchi

Ross Turk - (ross@datakin.com)
2022-07-13 17:18:27

*Thread Reply:* chiming in as well to say this is really cool 👍

Julien Le Dem - (julien@apache.org)
2022-07-13 18:26:28

*Thread Reply:* Nice! Would this become a product feature in Marklogic Data Hub?

Mark Chiarelli - (mark.chiarelli@marklogic.com)
2022-07-14 11:07:42

*Thread Reply:* MarkLogic is a multi-model database and search engine. This implementation triggers off the MarkLogic Datahub Github batch records created when running the datahub flows. Just a toe in the water so far.

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:31:18

@Ross Turk, in the OL community meeting today, you presented the new doc site (awesome!) that isn't up (yet!), but I've been talking with @Julien Le Dem about the usage of _producer and would like to add a section on the use / function of _producer in OL events. Feels like the new doc site would be a great place to add this! Let me know when's a good time to start crowdsourcing content for the site

Ross Turk - (ross@datakin.com)
2022-07-14 20:37:25

*Thread Reply:* That sounds like a good idea to me. Be good to have some guidance on that.

The repo is open for business! Feel free to add the page where you think it fits.

❤️ Willy Lulciuc

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:42:09

*Thread Reply:* OK! Let’s do this!

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:59:36

*Thread Reply:* @Ross Turk, feel free to assign to me https://github.com/OpenLineage/docs/issues/1!

Ross Turk - (ross@datakin.com)
2022-07-14 20:39:26

Hey everyone! As Willy says, there is a new documentation site for OpenLineage in the works.

It's not quite ready to be, uh, a proper reference yet. But it's not too far away. Help us get there by submitting issues, making page stubs, and adding sections via PR.

https://github.com/openlineage/docs/

🙌 Maciej Obuchowski, Michael Robinson

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:43:09

*Thread Reply:* Thanks, @Ross Turk for finding a home for more technical / how-to docs… long overdue 💯

Ross Turk - (ross@datakin.com)
2022-07-14 21:22:09

*Thread Reply:* BTW you can see the current site at http://openlineage.io/docs/ - merges to main will ship a new site.

Willy Lulciuc - (willy@datakin.com)
2022-07-14 21:23:32

*Thread Reply:* great, was using docs.openlineage.io … we'll eventually want the docs to live under the docs subdomain though?

Ross Turk - (ross@datakin.com)
2022-07-14 21:25:32

*Thread Reply:* TBH I activated GitHub Pages on the repo expecting it to live at openlineage.github.io/docs, thinking we could look at it there before it's ready to be published and linked in to the website

Ross Turk - (ross@datakin.com)
2022-07-14 21:25:39

*Thread Reply:* and it came live at openlineage.io/docs 😄

Willy Lulciuc - (willy@datakin.com)
2022-07-14 21:26:06

*Thread Reply:* nice and sounds good 👍

Ross Turk - (ross@datakin.com)
2022-07-14 21:26:31

*Thread Reply:* still do not understand why, but I'll take it as a happy accident. we can move to docs.openlineage.io easily - just need to add the A record in the LF infra + the CNAME file in the static dir of this repo

shweta p - (shweta.pbs@gmail.com)
2022-07-15 09:10:46

Hi #general, how do i link airflow tasks which may not have any input or output datasets, as they are running some conditions? the dataset is generated only on the last task

shweta p - (shweta.pbs@gmail.com)
2022-07-15 09:11:25

In the lineage, though there is an option to link the parent, it doesn't show the lineage of job -> job

shweta p - (shweta.pbs@gmail.com)
2022-07-15 09:11:43

does it need to be job -> dataset -> job only ?

Ross Turk - (ross@datakin.com)
2022-07-15 14:41:30

*Thread Reply:* yes - openlineage is job -> dataset -> job. particularly, the model is designed to observe the movement of data

Ross Turk - (ross@datakin.com)
2022-07-15 14:43:41

*Thread Reply:* the spec is based around run events, which are observed states of job runs. jobs are observed to see how they affect datasets, and that relationship is what OpenLineage traces

Ilya Davidov - (idavidov@marpaihealth.com)
2022-07-18 11:32:06

👋 Hi everyone!

Ilya Davidov - (idavidov@marpaihealth.com)
2022-07-18 11:32:51

i am looking for some information regarding openlineage integration with AWS Glue jobs/workflows

Ilya Davidov - (idavidov@marpaihealth.com)
2022-07-18 11:33:32

i am wondering if it is possible and whether someone has already given it a try and maybe documented it?

John Thomas - (john.thomas@astronomer.io)
2022-07-18 15:16:54

*Thread Reply:* This thread covers glue in some detail: https://openlineage.slack.com/archives/C01CK9T7HKR/p1637605977118000

John Thomas - (john.thomas@astronomer.io)
2022-07-18 15:17:49

*Thread Reply:* TL;DR: you can use the spark integration to capture some lineage, but it's not comprehensive

David Cecchi - (david_cecchi@cargill.com)
2022-07-18 16:29:02

*Thread Reply:* i suspect there will be opportunities to influence AWS to be a "fast follower" if OL adoption and buy-in starts to feel authentically real in non-aws portions of the stack. i discussed OL casually with AWS analytics leadership (Rahul Pathak) last winter and he seemed curious and open to this type of idea. to be clear, ~95% chance he's forgotten that conversation now but hey it's still something.

👍 Ross Turk

Francis McGregor-Macdonald - (francis@mc-mac.com)
2022-07-18 19:34:32

*Thread Reply:* There are a couple of aws people here (including me) following.

👍 David Cecchi, Ross Turk

Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 18:01:46

Hi all, I have been playing around with Marquez for a hackday. I have been able to get some lineage information loaded in (using the local docker version for now). I have been trying to set the location (for the link) and description information for a job (the text saying "Nothing to show here") but I haven't been able to figure out how to do this using the /lineage api. Any help would be appreciated.

Ross Turk - (ross@datakin.com)
2022-07-19 20:11:38

*Thread Reply:* I believe what you want is the DocumentationJobFacet. It adds a description property to a job.

Ross Turk - (ross@datakin.com)
2022-07-19 20:13:03

*Thread Reply:* You can see a Python example here, in the Airflow integration: https://github.com/OpenLineage/OpenLineage/blob/65a5f021a1ba3035d5198e759587737a05b242e1/integration/airflow/openlineage/airflow/adapter.py#L217

:gratitude_thank_you: Mikkel Kringelbach

Ross Turk - (ross@datakin.com)
2022-07-19 20:13:18

*Thread Reply:* (looking for a curl example…)

Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 20:25:49

*Thread Reply:* I see, so there are special facet keys which will get translated into something special in the ui, is that correct?

Are these documented anywhere?

Ross Turk - (ross@datakin.com)
2022-07-19 20:27:55

*Thread Reply:* Correct - info from the various OpenLineage facets are used in the Marquez UI.

Ross Turk - (ross@datakin.com)
2022-07-19 20:28:28

*Thread Reply:* I couldn't find a curl example with a description field, but I did generate this one with a sql field:
```json
{
  "job": {
    "name": "order_analysis.find_popular_products",
    "facets": {
      "sql": {
        "query": "DROP TABLE IF EXISTS top_products;\n\nCREATE TABLE top_products AS\nSELECT\n  product,\n  COUNT(order_id) AS num_orders,\n  SUM(quantity) AS total_quantity,\n  SUM(price * quantity) AS total_value\nFROM\n  orders\nGROUP BY\n  product\nORDER BY\n  total_value desc,\n  num_orders desc;",
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.11.0/integration/airflow",
        "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SqlJobFacet"
      }
    },
    "namespace": "workshop"
  },
  "run": {
    "runId": "13460e52-a829-4244-8c45-587192cfa009",
    "facets": {}
  },
  "inputs": [ ... ],
  "outputs": [ ... ],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.11.0/integration/airflow",
  "eventTime": "2022-07-20T00:23:06.986998Z",
  "eventType": "COMPLETE"
}
```

Ross Turk - (ross@datakin.com)
2022-07-19 20:28:58

*Thread Reply:* The facets (at least, those in the core spec) are here: https://github.com/OpenLineage/OpenLineage/tree/65a5f021a1ba3035d5198e759587737a05b242e1/spec/facets

Ross Turk - (ross@datakin.com)
2022-07-19 20:29:19

*Thread Reply:* it’s designed so that facets can exist outside the core, in other repos, as well

Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 22:25:39

*Thread Reply:* Thank you for sharing these, I was able to get the sql query highlighting to work. But I failed to get the location link or the documentation to work. My facet attempt looked like:
```json
{
  "facets": {
    "description": "test-description-job",
    "sql": {
      "query": "SELECT QUERY",
      "_schema": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SqlJobFacet"
    },
    "documentation": {
      "documentation": "Test docs?",
      "_schema": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/DocumentationJobFacet"
    },
    "link": {
      "type": "",
      "url": "www.google.com/test_url",
      "_schema": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SourceCodeLocationJobFacet"
    }
  }
}
```

Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 22:36:55

*Thread Reply:* I got the documentation link to work by renaming the property from documentation -> description . I still haven't been able to get the external link to work

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-20 10:33:36

Hey all. I've been doing a cleanup of issues on GitHub. If I've closed your issue that you think is still relevant, please reopen it and let us know.

🙌 Jakub Dardziński, Michael Collado, Will Johnson, Ross Turk

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-21 16:09:08

Is https://databricks.com/blog/2022/06/08/announcing-the-availability-of-data-lineage-with-unity-catalog.html - are they using OpenLineage? I know there’s been a lot of work to make sure OpenLineage integrates with Databricks, even earlier this year.

Ross Turk - (ross@datakin.com)
2022-07-21 16:25:47

*Thread Reply:* There's a good integration between OL and Databricks for pulling metadata out of running Spark clusters. But there's not currently a connection between OL and the Unity Catalog.

I think it would be cool to see some discussions start to develop around it 👍

👍 Sheeri Cabral (Collibra), Julius Rentergent

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-21 16:26:44

*Thread Reply:* Absolutely. I saw some mention of APIs and access, and was wondering if maybe they used OpenLineage as a framework, which would be awesome.

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-21 16:30:55

*Thread Reply:* (and since Azure Databricks uses it - https://openlineage.io/blog/openlineage-microsoft-purview/ I wasn’t sure about Unity Catalog)

👍 Will Johnson

Julien Le Dem - (julien@apache.org)
2022-07-21 16:56:24

*Thread Reply:* We're in the early stages of discussion regarding an OpenLineage integration for Unity. You showing interest would help increase the priority of that on the DB side.

👍 Sheeri Cabral (Collibra), Will Johnson, Thijs Koot

Thijs Koot - (thijs.koot@gmail.com)
2022-07-27 11:41:48

*Thread Reply:* I'm interested in Databricks enabling an openlineage endpoint, serving as a catalogue. Similar to how they provide hosted MLFlow. I can mention this to our Databricks reps as well

Joao Vicente - (joao.diogo.vicente@gmail.com)
2022-07-23 04:09:55

Hi all
I am trying to find the state of columnLineage in OL.
I see a proposal and some examples in https://github.com/OpenLineage/OpenLineage/search?q=columnLineage&type= but I can't find it in the spec.
Can anyone shed any light why this would be the case?

Joao Vicente - (joao.diogo.vicente@gmail.com)
2022-07-23 04:12:26

*Thread Reply:* Link to spec where I looked https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json

Joao Vicente - (joao.diogo.vicente@gmail.com)
2022-07-23 04:37:11

*Thread Reply:* My bad. I realize now that column lineage has been implemented as a facet, hence not visible in the main spec: https://github.com/OpenLineage/OpenLineage/search?q=ColumnLineageDatasetFacet&type=

👍 Maciej Obuchowski

Julien Le Dem - (julien@apache.org)
2022-07-26 19:37:54

*Thread Reply:* It is supported in the Spark integration

Julien Le Dem - (julien@apache.org)
2022-07-26 19:39:13

*Thread Reply:* @Paweł Leszczyński could you add the Column Lineage facet here in the spec? https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#standard-facets

Will Johnson - (will@willj.co)
2022-07-24 16:24:15

SundayFunday


Putting together some internal training for OpenLineage and highlighting some of the areas that have been useful to me on my journey with OpenLineage. Many thanks to @Michael Collado, @Maciej Obuchowski, and @Paweł Leszczyński for the continued technical support and guidance.

❤️ Hanna Moazam, Ross Turk, Minkyu Park, Atif Tahir, Paweł Leszczyński

Will Johnson - (will@willj.co)
2022-07-24 16:26:59

*Thread Reply:* @Ross Turk I still want to contribute something like this to the OpenLineage docs / new site but the bar for an internal doc is lower in my mind 😅

Ross Turk - (ross@datakin.com)
2022-07-25 11:49:54

*Thread Reply:* 😄

Ross Turk - (ross@datakin.com)
2022-07-25 11:50:54

*Thread Reply:* @Will Johnson happy to help you with docs, when the time comes! sketching outline --> editing, whatever you need

Julien Le Dem - (julien@apache.org)
2022-07-26 19:39:56

*Thread Reply:* This looks nice by the way.

❤️ Will Johnson

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 09:06:28

hi all, really appreciate if anyone could help. I have been trying to create a poc project with openlineage with dbt. attached will be the pip list of the openlineage packages that i have. However, when i run the "dbt-ol" command, it prompted to open as a file instead of running as a command. the regular dbt run can be executed without issue. i would like to know what i have done wrong, or if there is any configuration that i have missed. Thanks a lot

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-26 10:39:57

*Thread Reply:* do you have proper execute permissions?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-26 10:41:09

*Thread Reply:* not sure how that works on windows, but it just looks like it does not recognize dbt-ol as executable

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 10:43:00

*Thread Reply:* yes i have admin rights. how do i make this executable?

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 10:43:25

*Thread Reply:* btw do we have a sample docker image where dbt-ol can run?

Ross Turk - (ross@datakin.com)
2022-07-26 17:33:08

*Thread Reply:* I have also never tried on Windows 😕 but you might try python3 dbt-ol run?

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 21:03:43

*Thread Reply:* will try that

Will Johnson - (will@willj.co)
2022-07-26 16:41:04

Running a single unit test on the Spark Integration - How it works with the different modules?


Prior to splitting up the OpenLineage spark integration, I could run a command like the one below to test a single test or even a single test method. Now I get a failure and it's pointing to the app: module. Can anyone share the right syntax for running a unit test with the current package structure? Thank you!!

```
wj@DESKTOP-ECF9QME:~/repos/OpenLineageWill/integration/spark$ ./gradlew test --tests io.openlineage.spark.agent.OpenLineageSparkListenerTest

> Task :app:test FAILED

SUCCESS: Executed 0 tests in 872ms

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':app:test'.
> No tests found for given includes: io.openlineage.spark.agent.OpenLineageSparkListenerTest

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.4/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 2s
18 actionable tasks: 4 executed, 14 up-to-date
```

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 01:54:31

*Thread Reply:* This may be a result of splitting the Spark integration into multiple submodules: app, shared, spark2, spark3, spark32, etc. If the test case is from the shared submodule (this one looks like that), you could try running:
```
./gradlew :shared:test --tests io.openlineage.spark.agent.OpenLineageSparkListenerTest
```

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-27 03:18:42

*Thread Reply:* @Paweł Leszczyński, I tried running that command, and I get the following error:

```
> Task :shared:test FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':shared:test'.
> No tests found for given includes: io.openlineage.spark.agent.OpenLineageSparkListenerTest

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.4/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 971ms
6 actionable tasks: 2 executed, 4 up-to-date
```

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-27 03:24:41

*Thread Reply:* When running build and test for all the submodules, I can see outputs for tests in different submodules (spark3, spark2 etc), but for some reason, I cannot find any indication that the tests in OpenLineage/integration/spark/app/src/test/java/io/openlineage/spark/agent/lifecycle/plan are being run at all.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 03:42:43

*Thread Reply:* That’s interesting. Let’s ask @Tomasz Nazarewicz about that.

👍 Hanna Moazam

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-27 03:57:08

*Thread Reply:* For reference, I attached the stdout and stderr messages from running the following:
```
./gradlew :shared:spotlessApply && ./gradlew :app:spotlessApply && ./gradlew clean build test
```
Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-27 04:27:23

*Thread Reply:* I'll look into it

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-28 05:17:36

*Thread Reply:* Update: some tests appeared to not be visible after the split. That's fixed, but now I have to solve some dependency issues.

🙌 Hanna Moazam, Will Johnson

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-28 05:19:16

*Thread Reply:* That's great, thank you!

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-29 06:05:55

*Thread Reply:* Hi Tomasz, thanks so much for looking into this. Is this your PR (https://github.com/OpenLineage/OpenLineage/pull/953) that fixes the whole issue, or is there still some work to do to solve the dependency issues you mentioned?

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-29 06:07:58

*Thread Reply:* I'm still testing it, should've changed it to draft, sorry

👍 Hanna Moazam, Will Johnson

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-29 06:08:59

*Thread Reply:* No worries! If I can help with testing or anything please let me know!

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-29 06:09:29

*Thread Reply:* Will do! Thanks :)

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-02 11:06:31

*Thread Reply:* Hi @Tomasz Nazarewicz, if possible, could you please share an estimated timeline for resolving the issue? We have 3 PRs which we are either waiting to open or to update which are dependent on the tests.

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-08-02 13:45:34

*Thread Reply:* @Hanna Moazam hi, it's quite difficult to do that because the issue is that all the tests are passing when I execute ./gradlew app:test, but one is failing with ./gradlew app:build.

But if it fixes your problem I can disable this test for now and make a PR without it, then you can maybe unblock your stuff and I will have more time to investigate the issue.

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-02 14:54:45

*Thread Reply:* Oh that's a strange issue. Yes that would be really helpful if you can, because we have some tests we implemented which we need to make sure pass as expected.

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-02 14:54:52

*Thread Reply:* Thank you for your help Tomasz!

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-08-03 06:12:07

*Thread Reply:* @Hanna Moazam https://github.com/OpenLineage/OpenLineage/pull/980 here is the pull request with the changes

🙌 Hanna Moazam

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-08-03 06:12:26

*Thread Reply:* it's waiting for review currently

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-03 06:20:41

*Thread Reply:* Thank you!

Conor Beverland - (conorbev@gmail.com)
2022-07-26 18:44:47

Is there any doc yet about column level lineage? I see a spec for the facet here: https://github.com/openlineage/openlineage/issues/148

Julien Le Dem - (julien@apache.org)
2022-07-26 19:41:13

*Thread Reply:* The doc site would benefit from a page about it. Maybe @Paweł Leszczyński?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 01:59:27

*Thread Reply:* Sure, it’s already on my list, will do

:gratitude_thank_you: Julien Le Dem

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-29 07:55:40

*Thread Reply:* https://openlineage.io/docs/integrations/spark/spark_column_lineage

✅ Conor Beverland

Conor Beverland - (conorbev@gmail.com)
2022-07-26 20:03:55

maybe another question for @Paweł Leszczyński: I was watching the Airflow summit talk that you and @Maciej Obuchowski did ( very nice! ). How is this exposed? I'm wondering if it shows up as an edge on the graph in Marquez? ( I guess it may be tracked as a parent run and if so probably does not show on the graph directly at this time? )

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 04:08:18

*Thread Reply:* To be honest, I have never seen that in action and would love to have that in our documentation.

@Michael Collado or @Maciej Obuchowski: are you able to create some doc? I think one of you was working on that.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-27 04:24:19

*Thread Reply:* Yes, parent run

shweta p - (shweta.pbs@gmail.com)
2022-07-27 01:29:05

Hi #general, there has been an issue with airflow+dbt+openlineage. This was working fine with openlineage-dbt v0.11.0, but there has been some change to the type extensions due to which I had to upgrade to the latest dbt (from 1.0.0 to 1.1.0), and now dbt-ol is failing with schema version support (the version generated is v5 vs dbt-ol supports only v4). Has anyone else been able to fix this?

- - - -
- 👀 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-07-27 04:47:18
-
-

*Thread Reply:* Will take a look

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-07-27 04:47:40
-
-

*Thread Reply:* But generally this support message is just a warning

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-07-27 10:04:20
-
-

*Thread Reply:* @shweta p any actual error you've found? -I've tested it with dbt-bigquery on 1.1.0 and it works despite the warning:

- -

➜ small OPENLINEAGE_URL=<http://localhost:5050> dbt-ol build -Running OpenLineage dbt wrapper version 0.11.0 -This wrapper will send OpenLineage events at the end of dbt execution. -14:03:16 Running with dbt=1.1.0 -14:03:17 Found 2 models, 3 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics -14:03:17 -14:03:17 Concurrency: 2 threads (target='dev') -14:03:17 -14:03:17 1 of 5 START table model dbt_test1.my_first_dbt_model .......................... [RUN] -14:03:21 1 of 5 OK created table model dbt_test1.my_first_dbt_model ..................... [CREATE TABLE (2.0 rows, 0 processed) in 3.31s] -14:03:21 2 of 5 START test unique_my_first_dbt_model_id ................................. [RUN] -14:03:22 2 of 5 PASS unique_my_first_dbt_model_id ....................................... [PASS in 1.55s] -14:03:22 3 of 5 START view model dbt_test1.my_second_dbt_model .......................... [RUN] -14:03:24 3 of 5 OK created view model dbt_test1.my_second_dbt_model ..................... [OK in 1.38s] -14:03:24 4 of 5 START test not_null_my_second_dbt_model_id .............................. [RUN] -14:03:24 5 of 5 START test unique_my_second_dbt_model_id ................................ [RUN] -14:03:25 5 of 5 PASS unique_my_second_dbt_model_id ...................................... [PASS in 1.38s] -14:03:25 4 of 5 PASS not_null_my_second_dbt_model_id .................................... [PASS in 1.42s] -14:03:25 -14:03:25 Finished running 1 table model, 3 tests, 1 view model in 8.44s. -14:03:25 -14:03:25 Completed successfully -14:03:25 -14:03:25 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5 -Artifact schema version: <https://schemas.getdbt.com/dbt/manifest/v5.json> is above dbt-ol supported version 4. This might cause errors. -Emitting OpenLineage events: 100%|████████████████████████████████████████| 8/8 [00:00<00:00, 274.42it/s] -Emitted 10 openlineage events

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Fenil Doshi - (fdoshi@salesforce.com) -
-
2022-07-27 20:39:21
-
-

When will the next version of OpenLineage be available tentatively?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-07-27 20:41:44
-
-

*Thread Reply:* I think it's safe to say we'll see a release by the end of next week

- - - -
- :gratitude_thank_you: Fenil Doshi -
- -
- 👍 Fenil Doshi -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Yehuda Korotkin - (yehudak@elementor.com) -
-
2022-07-28 04:02:06
-
-

👋 Hi everyone! -Yesterday there was a great presentation by @Julien Le Dem about OpenLineage, with a great comparison between OL and OpenTelemetry (I wrote a small summary here: https://bit.ly/3z5caOI )

- -

Julien’s charm sparked my curiosity, especially regarding OL in streaming. -Having seen the design/architecture of OL, I have some questions/discussion points that I would like to understand better.

- -

In the context of streaming jobs, reporting “start job” - “end job” might be more relevant to batch mode. -Or do you mean that a start job/end job should be reported for each event processed?

  • which would be equivalent to starting a job for each row in a table via a UDF, for example.
- -

Thank you in advance

-
-
linkedin.com
- - - - - - - - - - - - - - - - - -
- - - -
- 🙌 Maciej Obuchowski, Michael Robinson, Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-07-28 08:50:44
-
-

*Thread Reply:* Welcome to the community!

- -

We talked about this exact topic in the most recent community call. -https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting#MonthlyTSCmeeting-Nextmeeting:Nov10th2021(9amPT)

- -

Discussion: streaming in Flink integration -• Has there been any evolution in the thinking on support for streaming? - ◦ Julien: start event, complete event, snapshots in between limited to certain number per time interval - ◦ Paweł: we can make the snapshot volume configurable -• Does Flink support sending data to multiple tables like Spark? - ◦ Yes, multiple outputs supported by OpenLineage model - ◦ Marquez, the reference implementation of OL, combines the outputs

- - - -
- 🙏 Yehuda Korotkin -
- -
- ❤️ Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-07-28 09:56:05
-
-

*Thread Reply:* > Or do you mean that a start job/end job should be reported for each event processed? -We definitely want to avoid tracking every single event 🙂

- -

One thing worth mentioning is that OpenLineage events are meant to be cumulative - the streaming jobs start, run, and eventually finish or restart. In the meantime, we capture additional events "in the middle" - for example, on Apache Flink checkpoint, or every few minutes - where we can emit additional information connected to the state of the job.
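A minimal sketch of that lifecycle with the openlineage-python client (the RUNNING event type mentioned later in this thread shipped in the spec and Python client in 0.13.0; the URL, names, and the checkpoint hook below are illustrative):

from datetime import datetime, timezone
from uuid import uuid4
from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
run = Run(runId=str(uuid4()))
job = Job(namespace="streaming", name="clickstream_enrichment")
producer = "https://example.com/streaming-wrapper"  # hypothetical producer URI

def emit(state: RunState) -> None:
    # cumulative events: each one adds information for the same runId
    client.emit(RunEvent(state, datetime.now(timezone.utc).isoformat(), run, job, producer))

emit(RunState.START)     # once, when the job is (re)deployed
emit(RunState.RUNNING)   # periodically, e.g. on each Flink checkpoint
emit(RunState.COMPLETE)  # only when the job stops or is upgraded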

- - - -
- 🙏 Yehuda Korotkin -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Yehuda Korotkin - (yehudak@elementor.com) -
-
2022-07-28 11:11:17
-
-

*Thread Reply:* @Will Johnson and @Maciej Obuchowski Thank you for your answer

- -

jobs start, run, and eventually finish or restart

- -

This is the perspective that I have a hard time understanding in the context of streaming.

- -

A classic streaming job should always be on; there should not be a “finish” event (except on failure). -Usually, streaming data is “dripping” in.

- -

It is possible to understand job start/end at the resolution of the running application, representing when the application began and when it failed.

- -

If you derive start/stop events from the checkpoints on Flink, it might be the wrong representation; instead, use an event-driven concept, for example reporting state.

- -

What do you think?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yehuda Korotkin - (yehudak@elementor.com) -
-
2022-07-28 11:11:36
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-07-28 12:00:34
-
-

*Thread Reply:* The idea is that jobs usually get upgraded - for example, you change the Apache Flink version, increase resources, or change the structure of a job - that's the difference for us. The stop events make sense because, if you for example changed the SQL of your Flink SQL job, you would probably want that to be captured: the job was running well with the older SQL version from X to Y, but after the change, the second run started and throughput dropped to 10% of the previous one.

- -

> If you derive start/stop events from the checkpoints on Flink, it might be the wrong representation; instead, use an event-driven concept, for example reporting state. -But this is a misunderstanding 🙂 -The information exposed from checkpoints is in addition to the start and stop events.

- -

We want to get information from running job - I just argue that sometimes end of a streaming job is also relevant.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-07-28 12:01:16
-
-

*Thread Reply:* The checkpoint would be captured as a new eventType: RUNNING - am I missing something about why you want to add a StateFacet?

- - - -
- 👍 Yehuda Korotkin -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Yehuda Korotkin - (yehudak@elementor.com) -
-
2022-07-28 14:24:03
-
-

*Thread Reply:* About the argument - it depends on the definition of a job in streaming mode; I agree that if you already have a ‘job’, you want to know more information about it.

- -

Should each event that enters the sub-process (job) trigger a REST call for “start job” and “end job”?

- -

Nope, I just presented the two possible ways I thought of: - either a StateFacet - or adding a new event type, e.g. RUNNING 😉

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-07-28 09:14:28
-
-

Hi everyone, I’d like to request a release to publish the new Flink integration (thanks, @Maciej Obuchowski) and an important fix to the Spark integration (thanks, @Paweł Leszczyński). As per our policy here, 3 +1s from committers will authorize an immediate release. Thanks!

- - - -
- ➕ Maciej Obuchowski, Paweł Leszczyński, Willy Lulciuc, Will Johnson, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-07-28 17:30:33
-
-

*Thread Reply:* Thanks for the +1s. We will initiate the release by Tuesday.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Barak F - (fargoun@gmail.com) -
-
2022-07-28 10:30:15
-
-

Static code annotations for OpenLineage: hi everyone, I heard a great lecture yesterday by @Julien Le Dem on OpenLineage, and as I'm very interested in this area, I wanted to raise a question: are there any plans to have OpenLineage-like annotations on actual code (e.g. Spark, Airflow, arbitrary code) to allow deducing some of the lineage information from static code analysis?

- -

The reason I'm asking is that while OpenLineage does a great job of integrating with multiple platforms (Airflow, dbt, Spark), some companies still have a lot of legacy data processing stack that will probably never get full OpenLineage coverage (as it's a one-off, and the companies themselves probably won't implement OpenLineage support for their custom frameworks). -Having some standard way to annotate code with information like "reads from X; writes to Y; job name regexp: Z" may allow writing a "generic" OpenLineage collector that can go over the source code, collect this configuration information and then use it when constructing the lineage graph (even though it won't be as complete as the full OpenLineage info).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-03 08:30:15
-
-

*Thread Reply:* I think this is an interesting idea, however, just the static analysis does not convey any runtime information.

- -

We're doing something similar within Airflow now, but as a fallback mechanism: https://github.com/OpenLineage/OpenLineage/pull/914

- -

You can manually annotate a DAG with information instead of writing an extractor for your operator; this still gives you runtime information - a sketch is below. Similar features might get added to other integrations, especially ones with as vast a scope as Airflow - but I think it's unlikely we'd work on a feature for just statically traversing code without runtime context.
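A hedged sketch of such an annotation, assuming the operator-level inlets/outlets fallback from the PR above (table names, cluster, and the script are illustrative):

from airflow.lineage.entities import Table
from airflow.operators.bash import BashOperator

raw = Table(database="warehouse", cluster="prod", name="raw_orders")
clean = Table(database="warehouse", cluster="prod", name="clean_orders")

legacy_step = BashOperator(
    task_id="legacy_transform",
    bash_command="run_legacy_job.sh",  # opaque to any extractor
    inlets=[raw],     # manually declared: reads from raw_orders
    outlets=[clean],  # manually declared: writes to clean_orders
)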

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Barak F - (fargoun@gmail.com) -
-
2022-08-03 14:25:31
-
-

*Thread Reply:* Thanks for the detailed response @Maciej Obuchowski! It seems like this solution is specific to Airflow, and I wonder why we wouldn't generalize it beyond Airflow. My thinking is that there are other areas with a vast scope (e.g. arbitrary code that does data manipulation), and without such an option, the only path is to provide full runtime information by building your own extractor, which might be hard/expensive to do. -If I understand your response correctly, you assume that OpenLineage can get wide enough "native" support across the stack without resorting to a fallback like static code analysis. Is that your base assumption?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Petr Hajek - (petr.hajek@profinit.eu) -
-
2022-07-29 04:36:03
-
-

Hi all, does anybody have experience extracting Airflow lineage using Marquez as documented here https://www.astronomer.io/guides/airflow-openlineage/#generating-and-viewing-lineage-data ? -We tested it on our Airflow instance with Marquez, hoping to get the standard .json files describing lineage in accordance with the OpenLineage model as described in https://json-schema.org/draft/2020-12/schema. -But there seems to be only one GET method related to lineage export in the Marquez API library, called "Get a lineage graph". This produces quite a different .json structure than what we know from OpenLineage. Could anybody tell us whether there is a way to get the OpenLineage .json structure from Marquez?

-
-
astronomer.io
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-07-29 12:58:38
-
-

*Thread Reply:* The query API has a different spec than the reporting API, so what you’d get from Marquez would look different from what Marquez receives.

- -

Few ideas:

- -
  1. you could send the lineage to a pipedream endpoint to inspect, if you’re just trying to experiment
  2. you could grab them from the lineage table in Marquez’s postgres
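For idea 1, a throwaway local endpoint works just as well as pipedream - a minimal sketch (port and handler name are arbitrary; point your OpenLineage client or OPENLINEAGE_URL at it):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Dump(BaseHTTPRequestHandler):
    def do_POST(self):
        # read and pretty-print whatever raw OpenLineage event arrives
        body = self.rfile.read(int(self.headers["Content-Length"]))
        print(json.dumps(json.loads(body), indent=2))
        self.send_response(200)
        self.end_headers()

HTTPServer(("localhost", 8000), Dump).serve_forever()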
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Petr Hajek - (petr.hajek@profinit.eu) -
-
2022-07-30 16:29:24
-
-

*Thread Reply:* ok, now I understand, thank you

- - - -
- 👍 Jan Kopic -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-03 08:25:57
-
-

*Thread Reply:* FYI we want to have something like that too: https://github.com/MarquezProject/marquez/issues/1927

- -

But if you need just the raw events endpoint, without UI, then Marquez might be overkill for your needs

-
- - - - - - - -
-
Comments
- 2 -
- -
-
Milestone
- <a href="https://github.com/MarquezProject/marquez/milestone/4">Roadmap</a> -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dinakar Sundar - (dinakar_sundar@condenast.com) -
-
2022-07-30 13:44:13
-
-

Hi @everyone, we are trying to extract lineage information and import it into Amundsen. Please point us in the right direction - based on the documentation, is Databricks + Marquez + Amundsen the only way to go?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john.thomas@astronomer.io) -
-
2022-07-30 13:49:25
-
-

*Thread Reply:* Short of implementing an OpenLineage endpoint in Amundsen, yes, that's the right approach.

- -

The lineage endpoint in Marquez can output the whole graph centered on a node ID, and you can use the jobs/datasets APIs to grab lists of each for reference; see the sketch below.
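A sketch of that walk against the Marquez HTTP API (base URL and namespace are illustrative; check the Marquez docs for the exact response shape):

import requests

base = "http://localhost:5000/api/v1"
datasets = requests.get(f"{base}/namespaces/my-namespace/datasets").json()["datasets"]
for ds in datasets:
    node_id = f"dataset:my-namespace:{ds['name']}"
    graph = requests.get(f"{base}/lineage", params={"nodeId": node_id}).json()
    # feed the returned nodes/edges into Amundsen's databuilder from here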

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Barak F - (fargoun@gmail.com) -
-
2022-07-31 00:35:06
-
-

*Thread Reply:* Is your lineage information coming via OpenLineage? if so - you can quickly use the Amundsen scripts in order to load data into Amundsen, for example, see this script here: https://github.com/amundsen-io/amundsendatabuilder/blob/master/example/scripts/sample_data_loader.py

- -

Where is your lineage coming from?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dinakar Sundar - (dinakar_sundar@condenast.com) -
-
2022-08-01 20:17:22
-
-

*Thread Reply:* yes @Barak F, we are using OpenLineage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Barak F - (fargoun@gmail.com) -
-
2022-08-02 01:26:18
-
-

*Thread Reply:* So, have you tried using Amundsen data builder scripts to load the lineage information into Amundsen? (maybe you'll have to "play" with those a bit)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-03 08:24:58
-
-

*Thread Reply:* AFAIK there is OpenLineage extractor: https://www.amundsen.io/amundsen/databuilder/#openlineagetablelineageextractor

- -

Not sure it solves your issue though 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dinakar Sundar - (dinakar_sundar@condenast.com) -
-
2022-08-05 04:46:45
-
-

*Thread Reply:* thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-01 17:08:46
-
-

@channel -OpenLineage 0.12.0 is now available! -We added: -• an Apache Flink integration, -• support for Spark 3.3.0, -• the ability to extend column level lineage mechanism, -• an ErrorMessageRunFacet to the OpenLineage spec, -• SQLCheckExtractors, a RedshiftSQLExtractor & RedshiftDataExtractor to the Airflow integration, -• a dataset builder to the AlterTableCommand class in the Spark integration. -We changed: -• the filtering of Delta events to reduce noise, -• the flow of metadata in the Airflow integration to allow metadata from Airflow through inlets and outlets. -Thanks to all the contributors who made this release possible! -For the bug fixes and more details, see: -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.12.0 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.11.0...0.12.0 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/ (edited)

- - - -
- ❤️ Minkyu Park, Harel Shein, Willy Lulciuc, Peter Hicks, Fenil Doshi, Maciej Obuchowski, Howard Yoo, Paul Wilson Villena, Jarek Potiuk, Dinakar Sundar, Shubham Mehta, Sharanya Santhanam, Sheeri Cabral (Collibra) -
- -
- 🎉 Minkyu Park, Peter Hicks, Fenil Doshi, Howard Yoo, Jarek Potiuk, Paweł Leszczyński, Ryan Peterson -
- -
- 🚀 Minkyu Park, Howard Yoo, Jarek Potiuk -
- -
- 🙌 Minkyu Park, Willy Lulciuc, Maciej Obuchowski, Howard Yoo, Jarek Potiuk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-02 10:12:01
-
-

What is the right way of handling/parsing facets on the server side?

- -

I see the generated server-side stubs are generic: https://github.com/OpenLineage/OpenLineage/blob/main/client/java/generator/src/main/java/io/openlineage/client/Generator.java#L131 and don't have any resolved facet information. -Marquez seems to have duplicated the OL model with https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/service/models/LineageEvent.java#L71 and converts the incoming OL events to a “LineageEvent” for appropriate handling. Is there a cleaner approach wherein the known facets can be generated in io.openlineage.server?

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-02 12:28:11
-
-

*Thread Reply:* I think the reason for the server model being very generic is that new facets can be added later (also as custom facets) - and generally the server wants to accept all valid events and use whatever facet information it can, rather than reject an event because it has an unknown field.

- -

Server model was added here after some discussion in Marquez which is relevant - I think @Michael Collado @Willy Lulciuc can add to that
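In Python pseudocode, the "accept everything, resolve what you know" pattern that implies looks roughly like this (the facet names are just common examples, not an exhaustive list):

KNOWN_DATASET_FACETS = {"schema", "dataSource", "columnLineage"}

def resolve_facets(dataset: dict) -> dict:
    # keep unknown facets as raw maps instead of rejecting the whole event
    facets = dataset.get("facets") or {}
    return {
        "known": {k: v for k, v in facets.items() if k in KNOWN_DATASET_FACETS},
        "raw": {k: v for k, v in facets.items() if k not in KNOWN_DATASET_FACETS},
    }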

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-02 15:54:24
-
-

*Thread Reply:* Thanks for the response. I realize the server stubs were created to support flexibility, but it also makes the parsing logic on the server side a bit more complex, as we need to maintain code to look for specific facets & their properties in maps or, like Marquez, duplicate the OL model on our end with the facets we care about. Wanted to know what the guidance is around managing this server side. @Willy Lulciuc @Michael Collado Any suggestions?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-02 18:27:27
-
-

Agenda items are requested for the next OpenLineage Technical Steering Committee meeting on August 11 at 10am PT. Reply in thread or ping me with your item(s)!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2022-08-03 04:16:22
-
-

Hi all, -I am trying out the openlineage spark integration and can't find any column lineage information included with the events. I tried it out with an input dataset where I renamed one of the columns but the columnLineage facet was not present. Can anyone suggest some other examples where it might show up?

- -

Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-03 04:45:36
-
-

*Thread Reply:* @Paweł Leszczyński do we collect column level lineage on renames?

- - - -
-
-
-
- - - - - - - - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2022-08-05 05:55:12
-
-

*Thread Reply:* I’ve created an issue for column lineage in case of renaming: -https://github.com/OpenLineage/OpenLineage/issues/993

-
- - - - - - - -
-
Labels
- integration/spark -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2022-08-08 09:37:43
-
-

*Thread Reply:* Thanks @Paweł Leszczyński!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-08-03 12:58:44
-
-

Hey everyone! I am looking into Fivetran a bit, and it occurs to me that the NAMING.md document does not have an opinion about how to deal with entire systems as datasets. More in 🧵.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-08-03 13:00:22
-
-

*Thread Reply:* Fivetran is a tool that copies data from source systems to target databases. One of these source systems might be SalesForce, for example.

- -

This copying results in thousands of SQL queries run against the target database for each sync. I don’t think each of these queries should map to an OpenLineage job, I think the entire synchronization should. Maybe I’m wrong here.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-08-03 13:01:00
-
-

*Thread Reply:* But if I’m right, that means that there needs to be a way to specify “SalesForce Account #45123452233” as a dataset.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-08-03 13:01:44
-
-

*Thread Reply:* or it ends up just being a job with outputs and no inputs…but that’s not very illuminating

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-08-03 13:02:27
-
-

*Thread Reply:* or is that good enough?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-04 10:31:11
-
-

*Thread Reply:* You are looking at a pretty big topic here 🙂

- -

Basically you're asking what is a job in OpenLineage - and it's not fully answered yet.

- -

I think the discussion is kinda relevant to this proposed facet and I kinda replied there: https://github.com/OpenLineage/OpenLineage/issues/812#issuecomment-1205337556

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2022-08-04 15:50:22
-
-

*Thread Reply:* my 2 cents on this is that in the Salesforce example, the system is too complex to capture as a single dataset, and so maybe different objects within a Salesforce account (org/account/opportunity/etc…) could be treated as individual datasets. But as @Maciej Obuchowski pointed out, this is quite a large topic 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-08-08 13:46:31
-
-

*Thread Reply:* I guess it depends on whether you actually care about the table/column level lineage for an operation like “copy salesforce to snowflake”.

- -

I can see it being a nuisance having all of that on a lineage graph. OTOH, I can see it being useful to know that a datum can be traced back to a specific endpoint at SFDC.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-08-08 13:46:55
-
-

*Thread Reply:* this is a design decision, IMO.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-04 11:30:00
-
-

@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, August 11 at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom -All are welcome! -Agenda:

- -
  1. Announcements
  2. Docs site update
  3. Release 0.11.0 and 0.12.0 overview
  4. Extractors: examples and how to write them
  5. Open discussion -Notes: https://bit.ly/OLwiki -Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda. (edited)
-
-
Zoom Video
- - - - - - - - - - - - - - - -
- - - -
- 🙌 Maciej Obuchowski, Harel Shein, Paul Wilson Villena -
- -
- 👀 Francis McGregor-Macdonald -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Chris Coulthrust - (coulthrust@gmail.com) -
-
2022-08-06 12:06:47
-
-

👋 Hi everyone!

- - - -
- 👋 Jakub Dardziński, Michael Robinson, Ross Turk, Harel Shein, Willy Lulciuc, Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-10 11:00:01
-
-

@channel The next OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1659627000308969

-
- - -
- - - - - - - - - - - - - - - - - -
- - - -
- 👀 Howard Yoo -
- -
- ❤️ Minkyu Park -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-10 22:34:29
-
-

*Thread Reply:* I am so sad I'm going to miss this month's meeting 😰 Looking forward to the recording!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 06:19:58
-
-

*Thread Reply:* We missed you too @Will Johnson 😉

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-11 18:50:18
-
-

Hi everyone! I have a REST endpoint that other pipelines can POST their RunEvent to, and I forward that to Marquez. I'm expecting a JSON which has the RunEvent details, including the input or output dataset depending upon the EventType. The Run details always show up on the Marquez UI, but the dataset has issues: I can see the dataset listed, but when I click on it, it just shows "something went wrong." I don't see any details of that dataset. -{ - "eventType": "START", - "eventTime": "2022-08-09T19:49:24.201361Z", - "run": { - "runId": "d46e465b-d358-4d32-83d4-df660ff614dd" - }, - "job": { - "namespace": "TEST-NAMESPACE", - "name": "test-job" - }, - "inputs": [ - { - "namespace": "TEST-NAMESPACE", - "name": "my-test-input", - "facets": { - "schema": { - "_producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>", - "_schemaURL": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/spec/OpenLineage.json#/definitions/SchemaDatasetFacet>", - "fields": [ - { - "name": "a", - "type": "INTEGER" - }, - { - "name": "b", - "type": "TIMESTAMP" - }, - { - "name": "c", - "type": "INTEGER" - }, - { - "name": "d", - "type": "INTEGER" - } - ] - } - } - } - ], - "producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>" -} -In the above payload, the input dataset is never created in Marquez. I can only see the Run details, but the input dataset is just empty. Does the input dataset need to be created first, and only then the RunEvent?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 06:09:57
-
-

*Thread Reply:* At first look, you're missing the outputs field in your event - this might break something

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 06:10:20
-
-

*Thread Reply:* If not, then Marquez logs might help to see something

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-12 13:12:56
-
-

*Thread Reply:* Does the START event need to have an output?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 13:19:24
-
-

*Thread Reply:* It can have empty output 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 13:32:43
-
-

*Thread Reply:* well, in your case you need to send COMPLETE event

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 13:33:44
-
-

*Thread Reply:* Internally, Marquez does not create a dataset version until the COMPLETE event. That makes sense when your semantics are transactional - you can still read from the previous dataset version until the new one is finished writing.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 13:34:06
-
-

*Thread Reply:* After I send COMPLETE event with the same information I can see the dataset.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-12 13:56:37
-
-

*Thread Reply:* Thanks for the explanation @Maciej Obuchowski. So, if I understand this correctly, I won't see the my-test-input dataset till I have the COMPLETE event with input and output?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 14:34:51
-
-

*Thread Reply:* @Raj Mishra Yes and no 🙂

- -

Basically, your COMPLETE event does not need to contain any input and output datasets at all - the OpenLineage model is cumulative, so it's enough to have datasets on either start or complete. -That also means you can add different datasets at different moments of the run lifecycle - for example, you know the inputs but not the outputs, so you emit inputs on START but not on COMPLETE.

- -

Or the job is modifying the same dataset it reads from (which happens surprisingly often). Then, you want to collect various input metadata from the dataset before modifying it - most likely you won't have them on COMPLETE 🙂

- -

In this example I've added my-test-input on START and my-test-input2 on COMPLETE :
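A minimal sketch of that cumulative pattern with the openlineage-python client (URL, producer, and dataset names are illustrative):

from datetime import datetime, timezone
from uuid import uuid4
from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
run = Run(runId=str(uuid4()))
job = Job(namespace="TEST-NAMESPACE", name="test-job")
producer = "https://example.com/my-pipeline"  # hypothetical producer URI

def now() -> str:
    return datetime.now(timezone.utc).isoformat()

# inputs known up front -> attach them at START
client.emit(RunEvent(RunState.START, now(), run, job, producer,
                     inputs=[Dataset("TEST-NAMESPACE", "my-test-input")]))
# ... the job runs ...
# outputs only known at the end -> attach them at COMPLETE; the shared runId ties both events
client.emit(RunEvent(RunState.COMPLETE, now(), run, job, producer,
                     outputs=[Dataset("TEST-NAMESPACE", "my-test-output")]))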

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-12 14:47:56
-
-

*Thread Reply:* @Maciej Obuchowski Thank you so much! This is great explanation.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-11 20:28:40
-
-

Effectively handling file datasets on the server side: we have a common use case where a dataset of a given type is produced/consumed per day. On the lineage UI/server side, it would be ideal to treat all files of this pattern as 1 dataset vs. 1 dataset per daily file. Any suggestions?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-11 20:35:33
-
-

*Thread Reply:* Would adding support for alias/grouping as a config on the OL client side be valuable to other users? I.e., the OL client could pass down an alias/grouping facet. Or should this be treated purely as a server-side feature?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 06:11:21
-
-

*Thread Reply:* Agreed 🙂

- -

How do you produce this dataset? Spark integration? Are you using any system like Apache Iceberg/Delta Lake or just writing raw files?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-12 12:59:48
-
-

*Thread Reply:* these are raw files written from Spark or MapReduce jobs, and downstream Spark jobs read these raw files to produce tables

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 13:27:34
-
-

*Thread Reply:* written using Spark dataframe API, like -df.write.format("parquet").save("/tmp/spark_output/parquet") - or RDD?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 13:27:59
-
-

*Thread Reply:* the actual API used matters, because we're handling different cases separately

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-12 13:29:48
-
-

*Thread Reply:* I see. Let me look that up to be absolutely sure

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-12 19:21:41
-
-

*Thread Reply:* It is like this: df.write.format("parquet").save("/tmp/spark_output/parquet")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-15 12:43:45
-
-

*Thread Reply:* @Maciej Obuchowski curious what you had in mind with respect to RDDs & DataFrames. Also, what if we cannot integrate OL with the frameworks that produce this dataset, but only those that consume from the already-produced datasets? Is there a way we could still capture the dataset appropriately?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-16 05:30:57
-
-

*Thread Reply:* @Sharanya Santhanam the naming should be consistent between reading and writing, so it wouldn't change much if you can't integrate OL into writers. For the rest, can you create an issue on the OL GitHub so someone can pick it up? I'm on vacation now.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-08-16 15:08:41
-
-

*Thread Reply:* Sounds good, ty!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2022-08-12 06:02:00
-
-

Hi, minor suggestion: -This line https://github.com/OpenLineage/OpenLineage/blob/46efab1e7c2a0aa5ebe8d11185fe8d5225[…]/app/src/main/java/io/openlineage/spark/agent/EventEmitter.java is printing variables like the API key and other parameters in the logs. Wouldn't it be more appropriate to use log.debug instead? -I'll create an issue if others agree

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 06:09:11
-
-

*Thread Reply:* yes

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-12 06:09:32
-
-

*Thread Reply:* please do create 🙂

- - - -
- ✅ Varun Singh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-08-15 09:01:47
-
-

dumb question but, is it easy to run all the OpenLineage tests locally? ( and if so how? 🙂 )

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-17 13:54:19
-
-

*Thread Reply:* it's per project. -java based: ./gradlew test -python based: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#development

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-18 23:45:30
-
-

Spark Integration: The Order of Processing Events in the Async Event Queue

- -

Hey, OpenLineage team, I'm working on a PR (https://github.com/OpenLineage/OpenLineage/pull/849/) that is going to store information given in different spark events (e.g. SparkListenerSQLExecutionStart, SparkListenerJobStart).

- -

However, I want to avoid holding all this data once the execution of the job is complete. As a result, I want to remove the data once I receive a SparkListenerSQLExecutionEnd.

- -

However, can I be guaranteed that the ExecutionEnd event will be processed AFTER the JobStart event? Is it possible that I take so long to process the JobStart event that the ExecutionEnd is handled before the JobStart processing finishes?

- -

I know we do something similar to this with sparkSqlExecutionRegistry (https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/mai[…]n/java/io/openlineage/spark/agent/OpenLineageSparkListener.java) but do we have any docs to help explain how the AsyncEventQueue orders and consumes events for a listener?

- -

Thank you so much for any insights

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:38:10
-
-

*Thread Reply:* Hey Will! A bunch of folks are on vacation or out this week. Sorry for the delay; I am personally not sure, but if it's not too urgent you can have an answer when knowledgeable folks are back.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-19 20:21:18
-
-

*Thread Reply:* Hah! No worries, @Julien Le Dem! I can definitely wait for the lucky people who are enjoying the last few weeks of summer unlike the rest of us 😋

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 05:31:32
-
-

*Thread Reply:* @Paweł Leszczyński might want to look at that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-08-19 01:53:56
-
-

Hi, -I'm trying to find out whether the OpenLineage Spark integration supports PySpark (non-SQL) use cases. -Is there any doc where I could get more details about non-SQL OpenLineage support? -Thanks a lot

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 12:30:08
-
-

*Thread Reply:* Hello Hanbing, the spark integration works for PySpark since pyspark is wrapped into regular spark operators.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-08-19 13:49:35
-
-

*Thread Reply:* @Julien Le Dem Thanks a lot for your help. I searched around, but I couldn't find any doc introducing how PySpark is supported in OpenLineage. -My company wants to integrate with openlineage-spark; I am working on figuring out what info OpenLineage makes available for non-SQL jobs and whether it at least has support for logging the logical plan.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:26:48
-
-

*Thread Reply:* Yes, it does send the logical plan as part of the event

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:27:32
-
-

*Thread Reply:* This configuration here should work as well for pyspark https://openlineage.io/docs/integrations/spark/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:28:11
-
-

*Thread Reply:* --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener"

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:28:26
-
-

*Thread Reply:* you need to add the jar, set the listener and pass your OL config
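Put together in PySpark, that looks roughly like this (the version number, endpoint, and namespace are example values; the config keys follow the docs page linked above):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ol_pyspark_demo")
    # pulls the integration jar
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.13.1")
    # registers the listener
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # OL config: where to send events and under which namespace
    .config("spark.openlineage.host", "http://localhost:5000")
    .config("spark.openlineage.namespace", "my-namespace")
    .getOrCreate()
)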

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:31:11
-
-

*Thread Reply:* Actually I'm demoing this at 27:10 right here 🙂 https://pretalx.com/bbuzz22/talk/FHEHAL/

-
-
pretalx
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:32:11
-
-

*Thread Reply:* you can see the parameters I'm passing to the pyspark command line in the video

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-08-19 18:35:50
-
-

*Thread Reply:* @Julien Le Dem Thanks for the info, Let me take a look at the video now.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-19 18:40:10
-
-

*Thread Reply:* The full demo starts at 24:40. It shows lineage connected together in Marquez coming from 3 different sources: Airflow, Spark and a custom integration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-22 14:32:53
-
-

Hi everyone, a release has been requested by @Harel Shein. As per our policy here, 3 +1s from committers will authorize an immediate release. Thanks! -Unreleased commits: https://github.com/OpenLineage/OpenLineage/compare/0.12.0...HEAD

-
- - - - - - - - - - - - - - - - -
- - - -
- ➕ Willy Lulciuc, Michael Robinson, Minkyu Park, Jakub Dardziński, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-08-22 14:38:58
-
-

*Thread Reply:* @Michael Robinson can we start posting the “Unreleased” section in the changelog along with the release request? That way, we / the community will know what will be in the upcoming release

- - - -
- 👍 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-22 15:00:37
-
-

*Thread Reply:* The release is approved. Thanks @Willy Lulciuc, @Minkyu Park, @Harel Shein

- - - -
- 🙌 Willy Lulciuc, Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-22 16:18:30
-
-

@channel -OpenLineage 0.13.0 is now available! -We added: -• BigQuery check support -• RUNNING EventType in the spec and Python client -• databases and schemas to SQL extractors -• an event forwarding feature via HTTP -• Azure Cosmos Handler to the Spark integration -• support for OL datasets in manual lineage inputs/outputs -• ownership facets. -We changed: -• use RUNNING EventType in Flink integration for currently running jobs -• convert task object into JSON encodable when creating Airflow version facet. -Thanks to all the contributors who made this release possible! -For the bug fixes and more details, see: -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.13.0 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.12.0...0.13.0 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/ (edited)

- - - -
- 🎉 Harel Shein, Ross Turk, Jarek Potiuk, Sheeri Cabral (Collibra), Willy Lulciuc, Howard Yoo, Howard Yoo, Ernie Ostic, Francis McGregor-Macdonald -
- -
- ✅ Sheeri Cabral (Collibra), Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-08-23 03:55:24
-
-

*Thread Reply:* Cool! Are the new ownership facets populated by the Airflow integration?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
AMRIT SARKAR - (sarkaramrit2@gmail.com) -
-
2022-08-24 08:23:35
-
-

Hi everyone, excited to work with OpenLineage. I am new to both OpenLineage and data lineage in general. Are there working examples/blog posts around actually integrating OpenLineage with existing graph DBs like Neo4j, Neptune, etc.? (I understand the service layer in between.) I understand we have Amundsen with sample OpenLineage data - databuilder/example/sample_data/openlineage/sample_openlineage_events.ndjson. Thanks in advance.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-25 18:15:59
-
-

*Thread Reply:* There is none that I know of besides the Amundsen integration example you pointed at. -A basic idea would be to implement an OpenLineage endpoint (receive the lineage events through HTTP POSTs) and convert them to a format the graph DB understands - see the sketch below. If others in the community have ideas, please chime in
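A minimal sketch of such a bridge (Flask and the Cypher-ish strings are illustrative stand-ins; any HTTP server and any graph client would do):

from flask import Flask, request

app = Flask(__name__)

@app.route("/api/v1/lineage", methods=["POST"])
def lineage():
    event = request.get_json()
    job = f"{event['job']['namespace']}.{event['job']['name']}"
    # translate dataset<->job edges into whatever the graph DB expects
    for ds in event.get("inputs", []):
        print(f"MERGE ({ds['namespace']}.{ds['name']})-[:FEEDS]->({job})")
    for ds in event.get("outputs", []):
        print(f"MERGE ({job})-[:PRODUCES]->({ds['namespace']}.{ds['name']})")
    return "", 200

app.run(port=5000)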

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
AMRIT SARKAR - (sarkaramrit2@gmail.com) -
-
2022-09-01 13:48:09
-
-

*Thread Reply:* Understood, thanks a lot Julien. Makes sense.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2022-08-25 17:30:46
-
-

Hey all, can I ask for a release for OpenLineage?

- - - -
- 👍 Harel Shein, Minkyu Park, Michael Robinson, Michael Collado, Ross Turk, Julien Le Dem, Willy Lulciuc, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-08-25 17:32:44
-
-

*Thread Reply:* @Michael Robinson ^

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-25 17:34:04
-
-

*Thread Reply:* Thanks, Harel. 3 +1s from committers is all we need to make this happen today.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Minkyu Park - (minkyu@datakin.com) -
-
2022-08-25 17:52:40
-
-

*Thread Reply:* 🙏

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-25 18:09:51
-
-

*Thread Reply:* Thanks, all. The release is authorized

- - - -
- 🎉 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-08-25 18:16:44
-
-

*Thread Reply:* can you also state the main purpose for this release?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-25 18:25:49
-
-

*Thread Reply:* I believe (correct me if wrong, @Harel Shein) that this is to make available a fix of a bug in the compare functionality

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Minkyu Park - (minkyu@datakin.com) -
-
2022-08-25 18:27:53
-
-

*Thread Reply:* The ParentRunFacet from the Airflow integration is not compliant with the OpenLineage spec, and this release includes the fix for that so that Marquez can handle parent run/job information.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-08-25 18:49:30
-
-

@channel -OpenLineage 0.13.1 is now available! -We fixed: -• Rename all parentRun occurrences to parent from Airflow integration #1037 @fm100 -• Do not change task instance during on_running event #1028 @JDarDagran -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.13.1 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.13.0...0.13.1 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 🎉 Harel Shein, Minkyu Park, Ross Turk, Michael Collado, Howard Yoo -
- -
- ❤️ Minkyu Park, Ross Turk, Howard Yoo -
- -
- 🥳 Minkyu Park, Ross Turk, Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-26 18:58:17
-
-

Hi, I am new to OpenLineage. Does anyone know how to enable Spark column-level lineage? I saw a code comment saying it's disabled by default. Thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2022-08-26 19:26:22
-
-

*Thread Reply:* What version of Spark are you using? It should be enabled by default for Spark 3: -https://openlineage.io/docs/integrations/spark/spark_column_lineage

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-26 20:21:12
-
-

*Thread Reply:* Thanks. Good to hear that. I am using 0.9.+. I will try again

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-29 13:14:01
-
-

*Thread Reply:* I tested 0.9.+ and 0.12.+ with Spark 3.0 and 3.2. The datasets still don't have the columnLineage facet. This is strange. I saw the column lineage design proposal 148; it should be supported from 0.9.+. Am I missing something?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-29 13:14:41
-
-

*Thread Reply:* @Harel Shein

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-30 00:56:18
-
-

*Thread Reply:* @Jason it depends on the data source. What sort of data are you trying to read? Is it in a hive metastore? Is it on an S3 bucket? Is it a delta file format?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-30 13:51:03
-
-

*Thread Reply:* I tried reading a Hive metastore on S3 and a CSV file locally. Both are missing the columnLineage facet

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-31 00:33:17
-
-

*Thread Reply:* @Jason - Sorry, you'll have to translate a bit for me. Can you share a snippet of code you're using to do the read and write? Is it a special package you need to install or is it just using the hadoop standard for S3? https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 20:00:47
-
-

*Thread Reply:* spark.read \ - .option("header", "true") \ - .option("inferschema", "true") \ - .csv("data/input/batch/wikidata.csv") \ - .write \ - .mode('overwrite') \ - .csv("data/output/batch/python-sample.csv")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 20:01:21
-
-

*Thread Reply:* This is simple code I run locally for testing

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-31 21:41:31
-
-

*Thread Reply:* Which version of OpenLineage are you running? You might look at the code on the main branch. This looks like a HadoopFSRelation which I implemented for column lineage but the latest release (0.13.1) does not include it yet.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-31 21:42:05
-
-

*Thread Reply:* Specifically this commit is what implemented it. -https://github.com/OpenLineage/OpenLineage/commit/ce30178cc81b63b9930be11ac7500ed34808edd3

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 22:02:16
-
-

*Thread Reply:* I see. I use 0.13.0

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2022-09-01 12:04:41
-
-

*Thread Reply:* @Jason we have our monthly release coming up now, so it should be included in 0.14.0 when released today/tomorrow

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-09-01 12:52:52
-
-

*Thread Reply:* Great. Thanks Harel.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-28 17:46:38
-
-

Hi! I have run into some issues and wanted to clarify my doubts. -• Why don't input schema changes (column deletes, new columns) show up on the UI? I have changed the input schema for the same job, but I'm not seeing it updated on the UI. -• Why is there only ever 1 input schema version? For every change I make to the input schema, I see the output schema gets multiple versions but only 1 version for the input schema. -• Is there a reason why we can't see the input schema till the COMPLETE event is posted? -I have used the examples from here. https://openlineage.io/getting-started/ -curl -X POST <http://localhost:5000/api/v1/lineage> \ - -H 'Content-Type: application/json' \ - -d '{ - "eventType": "START", - "eventTime": "2020-12-28T19:52:00.001+10:00", - "run": { - "runId": "d46e465b-d358-4d32-83d4-df660ff614dd" - }, - "job": { - "namespace": "my-namespace", - "name": "my-job" - }, - "inputs": [{ - "namespace": "my-namespace", - "name": "my-input" - }], - "producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>" - }' -curl -X POST <http://localhost:5000/api/v1/lineage> \ - -H 'Content-Type: application/json' \ - -d '{ - "eventType": "COMPLETE", - "eventTime": "2020-12-28T20:52:00.001+10:00", - "run": { - "runId": "d46e465b-d358-4d32-83d4-df660ff614dd" - }, - "job": { - "namespace": "my-namespace", - "name": "my-job" - }, - "outputs": [{ - "namespace": "my-namespace", - "name": "my-output", - "facets": { - "schema": { - "_producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>", - "_schemaURL": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/spec/OpenLineage.json#/definitions/SchemaDatasetFacet>", - "fields": [ - { "name": "a", "type": "VARCHAR"}, - { "name": "b", "type": "VARCHAR"} - ] - } - } - }], - "producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>" - }' -Changing the input schema for START doesn't change the input schema version and doesn't update the UI. -Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 05:29:52
-
-

*Thread Reply:* Reading a dataset - which is what listing it as an input implies - does not mutate the dataset 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 05:30:14
-
-

*Thread Reply:* If you change the dataset, that would be represented as some other job with this dataset in its outputs list - see the sketch below
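A sketch of what that upstream event could look like (the writer job name and producer URL are hypothetical; the point is that my-input appears in outputs, which is what creates a new version of it):

import uuid
import requests

event = {
    "eventType": "COMPLETE",
    "eventTime": "2020-12-28T19:51:00.001+10:00",
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "my-namespace", "name": "my-input-writer"},  # hypothetical upstream job
    "outputs": [{"namespace": "my-namespace", "name": "my-input"}],
    "producer": "https://example.com/upstream-writer",
}
requests.post("http://localhost:5000/api/v1/lineage", json=event)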

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-29 12:42:55
-
-

*Thread Reply:* So, changing the input dataset will always create new output dataset versions? Sorry, I have trouble understanding this, but if the input is changing, shouldn't the input dataset have different versions?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-01 08:35:42
-
-

*Thread Reply:* @Raj Mishra if input is changing, there should be something else in your data infrastructure that changes this dataset - and it should emit this dataset as output

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-08-29 12:21:52
-
-

Hi everyone, new here. I went through the docs and examples. I can't seem to understand how I can model views on top of base tables when it's not from a data processing job, but rather from modeling something static that comes from some software internals - i.e. I want to issue the lineage myself rather than have it learned dynamically from some Airflow DAG or Spark DAG

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 12:35:32
-
-

*Thread Reply:* I think you want to emit raw events using python or java client: https://openlineage.io/docs/client/python

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 12:35:46
-
-

*Thread Reply:* (docs in progress 😉)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-08-30 02:07:02
-
-

*Thread Reply:* can you give a hint what I should look for to model a dataset on top of another dataset? And potentially also map columns?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-08-30 02:12:50
-
-

*Thread Reply:* I can only see that I can have a dataset as input to a job run, not as input to another dataset

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-01 08:34:35
-
-

*Thread Reply:* Not sure I understand - jobs process input datasets into output datasets. There is always something that can be modeled as a job that consumes the input and produces the output; see the sketch below.
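One way to model a static "view on top of a base table" is a synthetic view-definition job whose run emits the base table as input and the view as output, with a columnLineage facet for the column mapping. A hedged sketch against the raw HTTP API (names and producer URL are illustrative; the facet layout follows the ColumnLineageDatasetFacet spec):

import uuid
from datetime import datetime, timezone
import requests

view = {
    "namespace": "warehouse", "name": "orders_view",
    "facets": {
        "columnLineage": {
            "_producer": "https://example.com/static-modeler",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/ColumnLineageDatasetFacet.json",
            "fields": {
                "order_total": {"inputFields": [
                    {"namespace": "warehouse", "name": "orders", "field": "total"}
                ]}
            },
        }
    },
}
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "warehouse", "name": "orders_view.definition"},  # the synthetic job
    "inputs": [{"namespace": "warehouse", "name": "orders"}],
    "outputs": [view],
    "producer": "https://example.com/static-modeler",
}
requests.post("http://localhost:5000/api/v1/lineage", json=event)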

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-01 10:30:51
-
-

*Thread Reply:* so OpenLineage forces me to put a job between datasets? That does not fit our use case

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-01 10:31:09
-
-

*Thread Reply:* unless we can somehow easily hide the process that does that on the graph.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-29 20:41:19
-
-

QQ, I saw that Spark column-level lineage starts with OpenLineage 0.9.+ and Spark 3.+. Does that mean we need to run an OpenLineage version lower than 0.9 if our Spark is 2.3 or 2.4?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-30 04:44:06
-
-

*Thread Reply:* I don't think it will work for Spark 2.X.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-30 13:42:20
-
-

*Thread Reply:* Is there a plan to support Spark 2.x?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-30 14:00:38
-
-

*Thread Reply:* Nope - on the other hand, we plan to drop any support for it, as it's been unmaintained for quite a while and vendors are dropping support for it too - AFAIK Databricks in April 2023.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-30 17:19:43
-
-

*Thread Reply:* I see. Thanks. Amazon EMR still supports Spark 2.x

Will Johnson - (will@willj.co)
2022-08-30 01:15:10
Spark Integration: Handling Data Source V2 API datasets

Is it expected that a DataSourceV2 relation has a start event with inputs and outputs but a complete event with only outputs? Based on @Michael Collado's previous comments, I think it's fair to say YES this is expected and we just need to handle it. https://openlineage.slack.com/archives/C01CK9T7HKR/p1645037070719159?thread_ts=1645036515.163189&cid=C01CK9T7HKR

@Hanna Moazam and I noticed this behavior when we looked at the Cosmos Db visitor and then reproduced it for the Iceberg visitor. We traced it down to the fact that the AbstractQueryPlanInputDatasetBuilder (which is the parent of DataSourceV2RelationInputDatasetBuilder) has an isDefinedAt that only includes SparkListenerJobStart and SparkListenerSQLExecutionStart.

This means an Iceberg COMPLETE event will NEVER contain inputs because the isDefinedAt will always be false (since COMPLETE only fires for JobEnd and ExecutionEnd events). Does that sound correct (@Paweł Leszczyński)?

It seems that Delta tables (or at least Delta on Databricks) do not follow this same code path, and as a result our complete events include outputs AND inputs.

👀 Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-01 05:56:13
*Thread Reply:* At least for Iceberg I've done it, since I want to emit DatasetVersionDatasetFacet for the input dataset only at START - and after I finish writing, the dataset might have a different version than before writing.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-01 05:58:59
*Thread Reply:* Same should be true for output AFAIK - the output version should be emitted only on COMPLETE, since the version changes after I finish writing.

Will Johnson - (will@willj.co)
2022-09-01 09:52:30
*Thread Reply:* Ah! Okay, so this still requires us to truly combine START and COMPLETE to get a TOTAL picture of the entire run. Is that fair?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-01 10:30:41
*Thread Reply:* Yes

👍 Will Johnson

Will Johnson - (will@willj.co)
2022-09-01 10:31:21
*Thread Reply:* As usual, thank you Maciej for the responses and insights!

🙌 Maciej Obuchowski

Jason - (shzhan@coupang.com)
2022-08-31 22:19:44
QQ team, I use spark sql with openlineage namespace weblog: spark.sql("select * from weblog where dt='1'").write.orc("…") - there are two issues: 1. there is no upstream dataset weblog on the Marquez UI. 2. a new namespace s3-cdp-prod-hive was created. It should be the s3 bucket. Am I missing something? Thanks

Jason - (shzhan@coupang.com)
2022-09-07 14:13:34
*Thread Reply:* Can anyone help with this? Did I miss something?

Jason - (shzhan@coupang.com)
2022-08-31 22:21:57
Here is the Marquez UI
(attachment)

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-01 07:34:24
Hi everyone, I'm opening up a vote on this month's OpenLineage release. 3 +1s from committers will authorize. Additions include support for KustoRelationHandler in Kusto (Azure Data Explorer) and for ABFSS and Hadoop Logical Relation, both in the Spark integration. All commits can be found here: https://github.com/OpenLineage/OpenLineage/compare/0.13.1...HEAD. Thanks in advance!

➕ Maciej Obuchowski, Ross Turk, Paweł Leszczyński, Will Johnson, Hanna Moazam

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-01 13:18:59
*Thread Reply:* Thanks. The release is authorized. It will be initiated within 2 business days.

🙌 Will Johnson, Maciej Obuchowski

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-05 07:57:02
Is there a reference on how to deploy openlineage on non-AWS infrastructure?

Will Johnson - (will@willj.co)
2022-09-08 10:31:44
*Thread Reply:* Which integration are you looking to implement?

And what environment are you looking to deploy it on? The Cloud? On-Prem?

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-08 10:40:11
*Thread Reply:* We are planning to deploy on-premise, with Kerberos as authentication for Postgres

Will Johnson - (will@willj.co)
2022-09-08 11:27:06
*Thread Reply:* Ah! Are you planning on running Marquez as well and that is your main concern, or are you planning on building your own store of OpenLineage events and using the SQL integration to generate those events?

https://github.com/OpenLineage/OpenLineage/tree/main/integration

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-08 11:33:44
*Thread Reply:* I am looking to deploy Marquez on-prem with on-prem Postgres as the back-end, with Kerberos authentication.

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-08 11:34:32
*Thread Reply:* Is this the right forum for Marquez as well, or is there a different Slack channel for Marquez?

Will Johnson - (will@willj.co)
2022-09-08 11:46:35
*Thread Reply:* https://bit.ly/MarquezSlack

Will Johnson - (will@willj.co)
2022-09-08 11:47:14
*Thread Reply:* There is another Slack channel just for Marquez! That might be a better spot with more dedicated Marquez developers.

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-06 15:52:32
@channel
OpenLineage 0.14.0 is now available!
We added:
• Support ABFSS and Hadoop Logical Relation in Column-level lineage #1008 @wjohnson
• Add Kusto relation visitor #939 @hmoazam
• Add ColumnLevelLineage facet doc #1020 @julienledem
• Include symlinks dataset facet #935 @pawel-big-lebowski
• Add support for dbt 1.3 beta's metadata changes #1051 @mobuchowski
• Support Flink 1.15 #1009 @mzareba382
• Add Redshift dialect to the SQL integration #1066 @mobuchowski
We changed:
• Make the timeout configurable in the Spark integration #1050 @tnazarew
We fixed:
• Add a dialect parameter to Great Expectations SQL parser calls #1049 @collado-mike
• Fix Delta 2.1.0 with Spark 3.3.0 #1065 @pawel-big-lebowski
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.14.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.13.1...0.14.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

❤️ Willy Lulciuc, Howard Yoo, Alexander Wagner, Hanna Moazam, Minkyu Park, Grayson Stream, Paweł Leszczyński, Maciej Obuchowski, Conor Beverland, Jason

Willy Lulciuc - (willy@datakin.com)
2022-09-06 15:54:30
*Thread Reply:* Thanks for breaking up the changes in the release! Love the new format 💯

🙌 Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-07 09:05:35
Hello all, I'm requesting a patch release to fix a bug in the Spark integration. Currently, OpenLineageSparkListener fails when no openlineage.timeout is provided. PR #1069 by @Paweł Leszczyński, merged today, will fix it. As per our policy here, 3 +1s from committers will authorize an immediate release.

➕ Paweł Leszczyński, Maciej Obuchowski, Howard Yoo, Willy Lulciuc, Ross Turk, Julien Le Dem

Willy Lulciuc - (willy@datakin.com)
2022-09-07 10:00:11
*Thread Reply:* Is PR #1069 all that's going in 0.14.1?

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-07 10:27:39
*Thread Reply:* There's also 1058. 1069 is urgently needed. We can technically wait…

🙌 Willy Lulciuc

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-07 10:30:31
*Thread Reply:* (edited prior message because I'm not sure how accurately I was describing the issue)

Willy Lulciuc - (willy@datakin.com)
2022-09-07 10:39:32
*Thread Reply:* Thanks for clarifying!

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-07 10:50:29
*Thread Reply:* Thanks, all. The release is authorized.

❤️ Willy Lulciuc

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-07 11:04:39
*Thread Reply:* 1058 also fixes some bugs

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-08 01:55:41
Hello all, question: views on top of base tables are also a use case for lineage, and there is no job in between. i don't seem to find a way to have a dataset on top of others to represent a view on top of tables. is there a way to do that without a job in between?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-08 04:41:07
*Thread Reply:* Usually there is something creating the view, for example dbt materialization: https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations

Besides that, there is this proposal that did not get enough love yet: https://github.com/OpenLineage/OpenLineage/issues/323

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-08 04:53:23
*Thread Reply:* but we are not working with dbt. we try to model the lineage of our internal view/table hierarchy, which is related to a proprietary application of ours. so we like that OpenLineage lets me explicitly model stuff and not only via scanning some DW. but in that case we don't want a job in between.

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-08 04:58:47
*Thread Reply:* this PR does not seem to support lineage between datasets

Ross Turk - (ross@datakin.com)
2022-09-08 12:49:48
*Thread Reply:* This is something core to the OpenLineage design - the lineage relationships are defined as dataset-job-dataset, not dataset-dataset.

In OpenLineage, something observes the lineage relationship being created.

Ross Turk - (ross@datakin.com)
2022-09-08 12:50:13
*Thread Reply:* (attachment)

🙌 Will Johnson, Maciej Obuchowski

Ross Turk - (ross@datakin.com)
2022-09-08 12:51:15
*Thread Reply:* It's a bit different from some other lineage approaches, but OL is intended to be a push model. A job is observed as it runs, metadata is pushed to the backend.

Ross Turk - (ross@datakin.com)
2022-09-08 12:54:27
*Thread Reply:* so in this case, according to openlineage 🙂, the job would be whatever runs within the pipeline that creates the view. very operational point of view.

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-11 12:27:42
*Thread Reply:* but what about the view definition use case? you have lineage of columns in view/base-table relationships

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-11 12:28:05
*Thread Reply:* how would you model that in OpenLineage? would you create a dummy job?

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-11 12:31:57
*Thread Reply:* would you say that because this is my use case i might better choose some other lineage tool?

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-11 12:33:04
*Thread Reply:* for the context: i am not talking about some view and table definitions in some warehouse e.g. SF, but an internal data processing mechanism with proprietary view/table definitions (in Flink SQL), and we want to push this metadata for visibility

Ross Turk - (ross@datakin.com)
2022-09-12 17:20:13
*Thread Reply:* Ah, gotcha. Yeah, I would say it's probably best to create a job in this case. You can send the view definition using a sourcecode facet, so it will be collected as well. You'd want to send START and STOP events for it.
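
A minimal sketch of that approach - modeling the view as a job and attaching the view definition via a source-code facet (the names, SQL, and facet class layout are assumptions and may differ between openlineage-python versions):

# Sketch: model a view definition as a job carrying a source-code facet.
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.facet import SourceCodeJobFacet
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")

view_sql = "CREATE VIEW my_view AS SELECT id, amount FROM base_table"
job = Job(
    namespace="my-namespace",
    name="create_view.my_view",
    facets={"sourceCode": SourceCodeJobFacet("sql", view_sql)},
)
run = Run(runId=str(uuid4()))

# Emit START and COMPLETE around the (re)definition of the view.
for state in (RunState.START, RunState.COMPLETE):
    client.emit(RunEvent(
        eventType=state,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=run, job=job, producer="https://example.com/my-producer",
        inputs=[Dataset(namespace="my-namespace", name="base_table")],
        outputs=[Dataset(namespace="my-namespace", name="my_view")],
    ))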

Ross Turk - (ross@datakin.com)
2022-09-12 17:22:03
*Thread Reply:* regarding the PR linked before, you are right - I wonder if someday the spec should have a way to express "the system was made aware that these datasets are related, but did not observe the relationship being created so it can't tell you i.e. how long it took or whether it changed over time"

Michael Robinson - (michael.robinson@astronomer.io)
2022-09-09 10:25:21
@channel
OpenLineage 0.14.1 is now available!
We fixed:
• Fix Spark integration issues including error when no openlineage.timeout #1069 @pawel-big-lebowski
Bug fixes were also included in this release.
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.14.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.14.0...0.14.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Maciej Obuchowski, Willy Lulciuc, Howard Yoo, Francis McGregor-Macdonald, AMRIT SARKAR

data_fool - (data.fool.me@gmail.com)
2022-09-09 13:52:39
Hello, any future plans for integrating Airbyte with openlineage?

👋 Willy Lulciuc, Maciej Obuchowski

Willy Lulciuc - (willy@datakin.com)
2022-09-09 14:01:13
*Thread Reply:* Hey, @data_fool! Not in the near term, but of course we'd love to see this happen. We're open to having an Airbyte integration driven by the community. Want to open an issue to start the discussion?

data_fool - (data.fool.me@gmail.com)
2022-09-09 15:36:20
*Thread Reply:* hey @Willy Lulciuc, Yep, will open an issue. Thanks!

🙌 Willy Lulciuc

Hubert Dulay - (hubert.dulay@gmail.com)
2022-09-10 22:00:10
Hi, can you create lineage across namespaces? Thanks

Julien Le Dem - (julien@apache.org)
2022-09-12 19:26:25
*Thread Reply:* yes!

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-26 10:31:56
*Thread Reply:* Any example or ticket on how to do lineage across namespaces?

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-09-12 02:27:49
Hello, does OpenLineage support column-level lineage?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-12 04:56:13
*Thread Reply:* Yes https://openlineage.io/blog/column-lineage/

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-09-22 02:18:45
*Thread Reply:*
• More details on Spark & column-level lineage integration: https://openlineage.io/docs/integrations/spark/spark_column_lineage
• Proposal on how to implement column-level lineage in Marquez (implementation is currently work in progress): https://github.com/MarquezProject/marquez/blob/main/proposals/2045-column-lineage-endpoint.md
@Iftach Schonbaum let us know if you find the information useful.

Paul Lee - (paullee@lyft.com)
2022-09-12 15:29:12
where can i find docs on just simply using extractors? without marquez. for example, a basic BashOperator on Airflow 1.10.15

Paul Lee - (paullee@lyft.com)
2022-09-12 15:30:08
*Thread Reply:* or is it automatic for anything that exists in extractors/?

Howard Yoo - (howard.yoo@astronomer.io)
2022-09-12 15:30:16
*Thread Reply:* Yes

👍 Paul Lee
:gratitude_thank_you: Paul Lee

Paul Lee - (paullee@lyft.com)
2022-09-12 15:31:12
*Thread Reply:* so anything i add to the extractors directory with the same name as the operator will automatically extract the metadata from the operator, is that correct?

Howard Yoo - (howard.yoo@astronomer.io)
2022-09-12 15:31:31
*Thread Reply:* Well, not entirely

Howard Yoo - (howard.yoo@astronomer.io)
2022-09-12 15:31:47
*Thread Reply:* please take a look at the source code of one of the extractors

👍 Paul Lee

Howard Yoo - (howard.yoo@astronomer.io)
2022-09-12 15:32:13
*Thread Reply:* also, there are docs available at openlineage.io/docs

🙏 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-09-12 15:33:45
*Thread Reply:* ok, i'll take a look. i think one thing that would be helpful is having a custom setup without marquez. a lot of the docs or videos i found were integrated with marquez

Howard Yoo - (howard.yoo@astronomer.io)
2022-09-12 15:34:29
*Thread Reply:* I see. Marquez is an OpenLineage backend that stores the lineage data, so many examples do need it.

Howard Yoo - (howard.yoo@astronomer.io)
2022-09-12 15:34:47
*Thread Reply:* If you do not want to run Marquez but just test out OpenLineage, you can also take a look at the OpenLineage Proxy.

👍 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-09-12 15:35:14
*Thread Reply:* awesome thanks Howard! i'll take a look at these resources and come back around if i need to

👍 Howard Yoo

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-12 16:01:45
*Thread Reply:* http://openlineage.io/docs/integrations/airflow/extractor - this is the doc you might want to read

🎉 Paul Lee
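
As a rough illustration of what that doc covers, a custom extractor sketch (the operator and its attributes are hypothetical; check the linked doc for the exact base-class API in your version):

# Sketch: a custom extractor for a hypothetical MyOperator.
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.run import Dataset


class MyOperatorExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # Operator class names this extractor handles.
        return ["MyOperator"]

    def extract(self) -> Optional[TaskMetadata]:
        # self.operator is the live operator instance; pull lineage off it.
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[Dataset(namespace="my-namespace", name=self.operator.source_table)],
            outputs=[Dataset(namespace="my-namespace", name=self.operator.target_table)],
        )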

Paul Lee - (paullee@lyft.com)
2022-09-12 17:08:49
*Thread Reply:* yeah, saw that doc earlier. thanks @Maciej Obuchowski appreciate it 🙏

Jay - (sanjay.sudhakaran@trovemoney.co.nz)
2022-09-21 20:55:24
Hey team! I'm pretty new to the field in general

In the real world, I would be running pyspark scripts on AWS EMR. Could you explain to me how the metadata is sent to Marquez from my pyspark script, and where it's persisted?

Would I need to set up an S3 bucket to store the lineage data?

I'm also unsure about how I would run the Marquez UI on AWS - would I need to have an EC2 instance running permanently in order to access that UI?

Jay - (sanjay.sudhakaran@trovemoney.co.nz)
2022-09-21 20:57:39
*Thread Reply:* In my head, I have:

Pyspark script -> Store metadata in S3 -> Marquez UI gets data from S3 and displays it

I suspect this is incorrect?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-09-22 02:14:50
*Thread Reply:* It's more like: you add the openlineage jar to the Spark job and configure what it should do with the events. Popular options are:
• send to a REST endpoint (like Marquez),
• send as an event onto Kafka,
• print it onto the console.
There is no S3 in between Spark & Marquez by default. Marquez serves both as an API where events are sent and as a UI to investigate them.
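
Concretely, wiring this into a PySpark session looks roughly like the sketch below (the package version, config key spellings, and URLs vary by OpenLineage release; treat these values as placeholders):

# Sketch: attach the OpenLineage Spark listener and send events to Marquez.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("openlineage-example")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.14.1")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.host", "http://localhost:5000")   # where events go
    .config("spark.openlineage.namespace", "my-namespace")
    .getOrCreate()
)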

Jay - (sanjay.sudhakaran@trovemoney.co.nz)
2022-09-22 17:36:10
*Thread Reply:* Yeah, S3 was just an example for a storage option.

I actually found the answer I was looking for - turns out I had to look at the Marquez documentation:
https://marquezproject.ai/resources/deployment/

The answer is that Marquez uses a postgres instance to persist the metadata it is given. Thanks for your time though! I appreciate the effort 🙂

👍 Kevin Adams

Hanbing Wang - (doris.wang200902@gmail.com)
2022-09-25 17:06:41
Hello team,
For the OpenLineage Spark integration, even when I process one Spark SQL query (CTAS, Create Table As Select), I receive multiple events back (2+ Start events, 2 Complete events).
I'm trying to understand why OpenLineage needs to send back that many events, and what the primary difference is between Start vs Start events and Start vs Complete events.
Do we have any doc that can help me understand it more?
Thanks

Will Johnson - (will@willj.co)
2022-09-26 00:27:05
*Thread Reply:* The Spark execution model follows:
1. Spark SQL Execution Start event
2. Spark Job Start event
3. Spark Job End event
4. Spark SQL Execution End event
As a result, OpenLineage tracks all of those executions and jobs. There is a proposed plan to distinguish between those events (e.g. you wouldn't get two starts but one Start and one Job Start or something like that).

You should collect all of these events in order to be sure you are receiving all the data, since each event may contain a subset of the complete facets that represent what occurred in the job.

Hanbing Wang - (doris.wang200902@gmail.com)
2022-09-26 15:16:26
*Thread Reply:* Thanks @Will Johnson
Can I get an example of how the proposed plan can be used to distinguish between start and job start events?
Because when I compare the 2 start events I got, only the event_time is different; all other information is the same.

Hanbing Wang - (doris.wang200902@gmail.com)
2022-09-26 15:30:34
*Thread Reply:* One followup question: if I process multiple queries in one command, for example (Drop + Create Table + Insert Overwrite), should I expect:
(1). 1 Spark SQL execution start event
(2). 3 Spark job start events (each query has a job start event)
(3). 3 Spark job end events (each query has a job end event)
(4). 1 Spark SQL execution end event

Will Johnson - (will@willj.co)
2022-09-27 10:25:47
*Thread Reply:* Re: Distinguish between start and job start events. There was a proposal to differentiate the two (https://github.com/OpenLineage/OpenLineage/issues/636) but the current discussion is here: https://github.com/OpenLineage/OpenLineage/issues/599 As it currently stands, there is not a way to tell which one is which (I believe). The design of OpenLineage is such that you should consume ALL events under the same run id and job name / namespace.

Re: Multiple Queries in One Command: This is where Spark's execution model comes into play. I believe each one of those commands is executed sequentially, and as a result you'd actually get three execution starts and three execution ends. If you chose DROP + Create Table As Select, that would be only two commands and thus only two execution start events.

Hanbing Wang - (doris.wang200902@gmail.com)
2022-09-27 16:49:37
*Thread Reply:* Thanks a lot for your help 🙏 @Will Johnson,
For multiple queries in one command, I am still confused about why Drop + CreateTable and Drop + CreateTableAsSelect act differently.

When I test Drop + Create Table
Query:
DROP TABLE IF EXISTS shadow_test.test_sparklineage_4; CREATE TABLE IF NOT EXISTS shadow_test.test_sparklineage_4 (val INT, region STRING) PARTITIONED BY ( ds STRING ) STORED AS PARQUET;
I only received 1 start + 1 complete event
And the events only contain DropTableCommandVisitor/DropTableCommand.
I expected we should also receive start and complete events for the CreateTable query with CreateTableCommandVisitor/CreateTableCommand.

But when I test Drop + Create Table As Select
Query:
DROP TABLE IF EXISTS shadow_test.test_sparklineage_5; CREATE TABLE IF NOT EXISTS shadow_test.test_sparklineage_5 AS SELECT * from shadow_test.test_sparklineage where ds > '2022-08-24'
I received 1 start + 1 complete event with DropTableCommandVisitor/DropTableCommand
And 2 start + 2 complete events with CreateHiveTableAsSelectCommandVisitor/CreateHiveTableAsSelectCommand

Will Johnson - (will@willj.co)
2022-09-27 22:03:38
*Thread Reply:* @Hanbing Wang are you running this on Databricks with a hive metastore that is defaulting to Delta by any chance?

I THINK there are some gaps in OpenLineage because of the way Databricks Delta handles things, and now there is Unity catalog that is causing some hiccups as well.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-28 09:18:48
*Thread Reply:* > For multiple queries in one command, I am still confused about why Drop + CreateTable and Drop + CreateTableAsSelect act differently.
@Hanbing Wang That's basically why we capture all the events (SQL Execution, Job) instead of one of them. We're just inconsistently notified of them by Spark.

Some computations emit SQL Execution events, some emit Job events; I think the majority emit both. This also differs by Spark version.

The solution OpenLineage assumes is a cumulative model of job execution, where your backend deals with possible duplication of information.

> I THINK there are some gaps in OpenLineage because of the way Databricks Delta handles things, and now there is Unity catalog that is causing some hiccups as well.
@Will Johnson would be great if you created an issue with some complete examples

Hanbing Wang - (doris.wang200902@gmail.com)
2022-09-28 15:44:45
*Thread Reply:* @Will Johnson and @Maciej Obuchowski Thanks a lot for your help
We are not running on Databricks.
We implemented the OpenLineage Spark listener, and customized the event transport, which emits the events to our own events pipeline with a hive metastore.
We are using Spark version 3.2.1
OpenLineage version 0.14.1

Will Johnson - (will@willj.co)
2022-09-29 15:16:28
*Thread Reply:* Ooof! @Hanbing Wang then I'm not certain why you're not receiving the extra event 😞 You may need to run your spark cluster in debug mode to step through the Spark Listener.

Will Johnson - (will@willj.co)
2022-09-29 15:17:08
*Thread Reply:* @Maciej Obuchowski - I'll add it to my list!

Hanbing Wang - (doris.wang200902@gmail.com)
2022-09-30 15:34:01
*Thread Reply:* @Will Johnson Thanks a lot for your help. Let us debug and continue investigating this issue.

Yujia Yang - (yujia@tubi.tv)
2022-09-26 03:46:19
Hi team, I find OpenLineage posts a lot of run events to the backend.

eg. I submit a jar to the Spark cluster with computations like:
1. count from table1 --> this will have more than one run event with inputs:[table1], outputs:[]
2. count from table2 --> this will have more than one run event with inputs:[table2], outputs:[]
3. write Seq[(t1, count1), (t2, count2)] to table3 --> this may give inputs:[], outputs:[table3]
Can I just get one post with a summary telling me inputs:[table1, table2], outputs:[table3], alongside a merged column-level lineage?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-09-28 08:34:20
*Thread Reply:* One of the assumptions was to create a stateless integration model where multiple events can be sent for a single job run. This has several advantages, like sending events for jobs which suddenly fail, sending events immediately, etc.

The events can then be merged at the backend side. The behavior you describe can be achieved by using backends like Marquez and the Marquez API to obtain combined data.

Currently, we're developing a column-lineage dedicated endpoint in Marquez according to the proposal: https://github.com/MarquezProject/marquez/blob/main/proposals/2045-column-lineage-endpoint.md
This will allow you to request the whole column lineage graph based on multiple jobs.

:gratitude_thank_you: Yujia Yang
👀 Yujia Yang

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-28 09:47:55
Is there a provision to include additional MDC properties as part of openlineage?
Or something like sparkSession.sparkContext().setLocalProperty("key","value")

Julien Le Dem - (julien@apache.org)
2022-09-29 14:30:37
*Thread Reply:* Hello @srutikanta hota, could you elaborate a bit on your use case? I'm not sure what you are trying to achieve. Possibly @Paweł Leszczyński will know.

Will Johnson - (will@willj.co)
2022-09-29 15:24:26
*Thread Reply:* @srutikanta hota - Not sure what MDC properties stands for, but you might take inspiration from the DatabricksEnvironmentHandler Facet Builder: https://github.com/OpenLineage/OpenLineage/blob/65a5f021a1ba3035d5198e759587737a05[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java

You can create a facet that could extract out the properties that you might set from within the spark session.

I don't think OpenLineage / a Spark Listener can affect the SparkSession itself, so you wouldn't be able to SET the properties in the listener.

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-30 04:56:25
*Thread Reply:* Many thanks for the details. My use case is simple: I'd like to default the Spark job group id as the openlineage parent run id if there is no parent run id set.
sc.setJobGroup("myjobgroupid", "job description goes here")
This sets the value in Spark as
setLocalProperty(SparkContext.SPARK_JOB_GROUP_ID, group_id)

I'd like to use myjobgroupid as the openlineage parent run id

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-09-30 05:01:08
*Thread Reply:* MDC is the ability to add extra key -> value pairs to a log entry, while not doing this within the message body. So the question here is (I believe): how to add custom entries / custom facets to OpenLineage events?

@srutikanta hota What information would you like to include? There is a great chance we already have some fields for that. If not, it's still worth putting it in the right place: is this info job specific, run specific, or does it relate to some of the input / output datasets?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-09-30 05:04:34
*Thread Reply:* @srutikanta hota sounds like you want to set up
spark.openlineage.parentJobName
spark.openlineage.parentRunId
https://openlineage.io/docs/integrations/spark/
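
i.e., roughly the following sketch (the parent job name is a placeholder, and the parent run id is expected to be a UUID):

# Sketch: tie Spark-emitted events to a parent job/run via config.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.openlineage.parentJobName", "my_scheduler.my_parent_job")
    .config("spark.openlineage.parentRunId", "f0e1d2c3-b4a5-4321-8765-0123456789ab")
    .getOrCreate()
)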

srutikanta hota - (srutikanta.hota@gmail.com)
2022-09-30 05:15:18
*Thread Reply:* @… we are having a long-running spark context (the context may run for a week) where we submit jobs. Setting the parentRunId at the beginning won't help. We are submitting the job with a spark job group id. I'd like to use the group id as the parentRunId

https://spark.apache.org/docs/1.6.1/api/R/setJobGroup.html

🤔 Maciej Obuchowski

Trevor Swan - (trevor.swan@matillion.com)
2022-09-29 13:59:20
Hi team - I am from Matillion and we would like to build support for openlineage. Who would be best placed to move the conversation forward with my product team?

🙌 Will Johnson, Maciej Obuchowski, Francis McGregor-Macdonald
🎉 Michael Robinson
👍 Ernie Ostic

Julien Le Dem - (julien@apache.org)
2022-09-29 14:22:06
*Thread Reply:* Hi Trevor, thank you for reaching out. I'd be happy to discuss with you how we can help you support OpenLineage. Let me send you an email.

Jarek Potiuk - (jarek@potiuk.com)
2022-09-29 15:58:35
cccccbctlvggfhvrcdlbbvtgeuredtbdjrdfttbnldcb

🐈 Julien Le Dem, Jakub Dardziński, Maciej Obuchowski, Paweł Leszczyński
🐈‍⬛ Julien Le Dem, Maciej Obuchowski, Paweł Leszczyński

Petr Hajek - (petr.hajek@profinit.eu)
2022-09-30 02:52:51
Hi Everyone! Would anybody be interested in participating in MANTA OpenLineage connector testing? We are especially looking for an environment with a rich Airflow implementation, but we will be happy to test on any other OL producer technology. Send me a direct message for more information. Thanks, Petr

🙌 Michael Robinson, Ross Turk

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-09-30 14:34:45
Question about Apache Airflow that I think folks here would know, because doing a web search has failed me:

Is there a way to interact with Apache Airflow to retrieve the contents of the files in the sql directory, but NOT to run them?

(the APIs all seem to run sql, and when I search I just get "how to use the airflow API to run queries")

Ross Turk - (ross@datakin.com)
2022-09-30 14:38:34
*Thread Reply:* Is this in the context of an OpenLineage extractor?

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-09-30 14:40:47
*Thread Reply:* Yes! I was specifically looking at the PostgresOperator

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-09-30 14:41:54
*Thread Reply:* (as Snowflake lineage can be retrieved from their internal ACCESS_HISTORY tables, we wouldn't need to use Airflow's SnowflakeOperator to get lineage, we'd use the method on the openlineage blog)

Ross Turk - (ross@datakin.com)
2022-09-30 14:43:08
*Thread Reply:* The extractor for the SQL operators gets the query like this:
https://github.com/OpenLineage/OpenLineage/blob/45fda47d8ef29dd6d25103bb491fb8c443[…]gration/airflow/openlineage/airflow/extractors/sql_extractor.py

👍 Sheeri Cabral (Collibra)

Ross Turk - (ross@datakin.com)
2022-09-30 14:43:48
*Thread Reply:* let me see if I can find the corresponding part of the Airflow API docs...

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-09-30 14:45:00
*Thread Reply:* aha! I'm not so far behind the times, it was only put in during July https://github.com/OpenLineage/OpenLineage/pull/907

Ross Turk - (ross@datakin.com)
2022-09-30 14:47:28
*Thread Reply:* Hm. The PostgresOperator seems to extend BaseOperator directly:
https://github.com/apache/airflow/blob/029ebacd9cbbb5e307a03530bdaf111c2c3d4f51/airflow/providers/postgres/operators/postgres.py#L58

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-09-30 14:48:01
*Thread Reply:* yeah 😞 I couldn't find a way to make that work as an end-user.

Ross Turk - (ross@datakin.com)
2022-09-30 14:48:08
*Thread Reply:* perhaps that can't be assumed for all operators that deal with SQL. I know that @Maciej Obuchowski has spent a lot of time on this.

Ross Turk - (ross@datakin.com)
2022-09-30 14:49:14
*Thread Reply:* I don't know enough about the airflow internals 😞

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-09-30 14:50:00
*Thread Reply:* No worries. In case it saves you work, I also had a look at https://github.com/apache/airflow/blob/029ebacd9cbbb5e307a03530bdaf111c2c3d4f51/airflow/providers/common/sql/operators/sql.py - which also extends BaseOperator but not with a way to just get the SQL.

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-09-30 15:22:24
*Thread Reply:* that's more of an Airflow question indeed. As far as I understand, you need to read the file with the SQL statement within an Airflow operator and do something other than run the query (like pass it as an XCom)? The SQLExtractors we have get the same SQL that operators render and use it to extract additional information, like table schema, straight from the database

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-09-30 14:36:18
(I'm also ok with a way to get the SQL that has been run - but from Airflow, not the data source - I'm looking for a db-neutral way to do this, otherwise I can just parse query logs on any specific db system)

Paul Lee - (paullee@lyft.com)
2022-09-30 18:45:09
👋 are there any docs on how the listener hooks in and gets run with openlineage-airflow? trying to write some unit tests but no docs seem to exist on the flow.

Julien Le Dem - (julien@apache.org)
2022-09-30 19:06:47
*Thread Reply:* There's a design doc linked from the PR: https://github.com/apache/airflow/pull/20443
https://docs.google.com/document/d/1L3xfdlWVUrdnFXng1Di4nMQYQtzMfhvvWDR9K4wXnDU/edit

👀 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-09-30 19:18:47
*Thread Reply:* amazing thank you I will take a look

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-03 11:32:52
@channel
Hello everyone, I'm opening up a vote on releasing OpenLineage 0.15.0, including:
• an improved development experience in the Airflow integration
• updated proposal and integration templates
• a change to the BigQuery client in the Airflow integration
• plus bug fixes across the project.
3 +1s from committers will authorize an immediate release. For all the commits, see: https://github.com/OpenLineage/OpenLineage/compare/0.14.0...HEAD. Note: this will be the last release to support Airflow 1.x!
Thanks!

🎉 Paul Lee, Howard Yoo, Minkyu Park, Michael Collado, Paweł Leszczyński, Maciej Obuchowski, Harel Shein
👍 Michael Collado, Julien Le Dem, Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-03 11:33:30
*Thread Reply:* Hey @Michael Robinson. Removal of Airflow 1.x support is planned for the next release after 0.15.0

👍 Jakub Dardziński, Paul Lee

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-03 11:37:03
*Thread Reply:* 0.15.0 would be the last release supporting Airflow 1.x

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-03 11:37:07
*Thread Reply:* just caught this myself. I'll make the change

Paul Lee - (paullee@lyft.com)
2022-10-03 11:40:33
*Thread Reply:* we're still on 1.10.15 at the moment so i guess our team would have to rely on <=0.15.0?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-03 11:49:47
*Thread Reply:* Is this something you want to continue doing or do you want to migrate relatively soon?

We want to remove the 1.10 integration because, for multiple PRs, maintaining compatibility with it takes a lot of time; the code is littered with checks like this:
if parse_version(AIRFLOW_VERSION) >= parse_version("2.0.0"):

👍 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-10-03 12:03:40
*Thread Reply:* hey Maciej, we do have plans to migrate in the coming months but for right now we need to stay on 1.10.15.

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-04 09:39:11
*Thread Reply:* Thanks, all. The release is authorized, and you can expect it by Thursday.

Paul Lee - (paullee@lyft.com)
2022-10-03 17:56:08
👋 what would be a possible reason for the built-in airflow backend being utilized instead of a custom wrapper over airflow.lineage.Backend? double checked the [lineage] key in our airflow.cfg

there doesn't seem to be any errors being thrown and the object loads 🤔

Paul Lee - (paullee@lyft.com)
2022-10-03 17:56:36
*Thread Reply:* running airflow 2.3.4 with openlineage-airflow 0.14.1

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-03 18:03:03
*Thread Reply:* if you're talking about LineageBackend, it is used in Airflow 2.1-2.2. It did not have functionality where you can be notified on task start or failure, so we wanted to expand the functionality: https://github.com/apache/airflow/issues/17984

Consensus of Airflow maintainers wasn't positive about changing this interface, so we went with another direction: https://github.com/apache/airflow/pull/20443

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-03 18:06:58
*Thread Reply:* Why nothing happens? https://github.com/OpenLineage/OpenLineage/blob/895160423643398348154a87e0682c3ab5c8704b/integration/airflow/openlineage/lineage_backend/__init__.py#L91

Paul Lee - (paullee@lyft.com)
2022-10-03 18:30:32
*Thread Reply:* ah hmm ok, i will double check. i commented that part out so technically it should run but maybe i missed something

Paul Lee - (paullee@lyft.com)
2022-10-03 18:30:42
*Thread Reply:* thank you for your fast response @Maciej Obuchowski ! i appreciate it

Paul Lee - (paullee@lyft.com)
2022-10-03 18:31:13
*Thread Reply:* it seems like it doesn't use my custom wrapper but instead uses the openlineage implementation.

Paul Lee - (paullee@lyft.com)
2022-10-03 20:11:15
*Thread Reply:* @Maciej Obuchowski ok, after checking we are emitting events with our custom backend, but an odd thing is an attempt is always made with the openlineage backend. is there something obvious i am perhaps missing 🤔

ends up with requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url immediately after task start. but by the end, on task success/failure, it emits the event with our custom backend - both RunState.COMPLETE and RunState.START - into our own pipeline.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-04 06:19:06
*Thread Reply:* If you're on 2.3 and trying to use some wrapped LineageBackend, what I think is happening is the OpenLineagePlugin that automatically registers via the setup.py entrypoint: https://github.com/OpenLineage/OpenLineage/blob/65a5f021a1ba3035d5198e759587737a05b242e1/integration/airflow/openlineage/airflow/plugin.py#L30

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-04 06:23:48
*Thread Reply:* I think if you want to extend it with proprietary code there are two good options.

First, if your code only needs to touch the HTTP client side - which I guess is the case due to the 401 error - then you can create a custom Transport.

Second, you fork the OL code and create your own package, without the entrypoint script, or with adding your own if you decide to extend OpenLineagePlugin instead of LineageBackend

👍 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-10-04 14:23:33
*Thread Reply:* amazing thank you for your help. i will take a look

Paul Lee - (paullee@lyft.com)
2022-10-04 14:49:47
*Thread Reply:* @Maciej Obuchowski is there a way to extend the plugin like how we can wrap the custom backend with 2.2? or would it be necessary to fork it.

we're trying to not fork and instead opt for extending.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 04:55:05
*Thread Reply:* I think it's best to fork, since it's getting loaded by Airflow as an entrypoint: https://github.com/OpenLineage/OpenLineage/blob/133110300e8ea4e42e3640608cfed459683d5a8d/integration/airflow/setup.py#L70

🙏 Paul Lee
:gratitude_thank_you: Paul Lee

Paul Lee - (paullee@lyft.com)
2022-10-05 13:29:24
*Thread Reply:* got it. and in terms of the openlineage.yml and defining a custom transport, is there a way i can define where openlineage-python should look for the custom transport? e.g. a different path

Paul Lee - (paullee@lyft.com)
2022-10-05 13:30:04
*Thread Reply:* because from the docs i can't tell except for the file i'm supposed to copy and implement.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 14:18:19
*Thread Reply:* @Paul Lee you should derive from the Transport base class and register the type as the full python import path to your custom transport, for example https://github.com/OpenLineage/OpenLineage/blob/f8533266491acea2159f602f782a99a4f8a82cca/client/python/tests/openlineage.yml#L2

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 14:20:48
*Thread Reply:* your custom transport should also define a custom Config class, and this class should implement a from_dict method

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 14:20:56
*Thread Reply:* the whole process is here: https://github.com/OpenLineage/OpenLineage/blob/a62484ec14359a985d283c639ac7e8b9cfc54c2e/client/python/openlineage/client/transport/factory.py#L47

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 14:21:09
*Thread Reply:* and I know we need to document this better 🙂

🙏 Paul Lee
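
Putting those pieces together, a sketch of a custom transport (class, module, and config names are made up, and the exact base-class API may differ between openlineage-python versions); in openlineage.yml, the transport "type" would then be set to the full import path of the class:

# Sketch: a custom openlineage-python transport; all names are hypothetical.
from openlineage.client.transport import Config, Transport


class MyPipelineConfig(Config):
    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    @classmethod
    def from_dict(cls, params: dict) -> "MyPipelineConfig":
        # Receives the remaining keys of the transport section in openlineage.yml.
        return cls(endpoint=params["endpoint"])


class MyPipelineTransport(Transport):
    kind = "my_company.transport.MyPipelineTransport"  # matches "type" in openlineage.yml
    config = MyPipelineConfig

    def __init__(self, config: MyPipelineConfig):
        self.endpoint = config.endpoint

    def emit(self, event):
        # Hand the event off to your own pipeline here.
        print(f"would send {type(event).__name__} to {self.endpoint}")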

Paul Lee - (paullee@lyft.com)
2022-10-05 15:35:31
*Thread Reply:* amazing, thanks for all your help 🙂 +1 to the docs, if i have some time when done i will push up some docs to document what i've done

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 15:50:29
*Thread Reply:* https://github.com/openlineage/docs/ - let me know and I'll review 🙂

🎉 Paul Lee

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-04 12:39:59
@channel
Hi everyone, opening a vote on a release (0.15.1) to add #1131 to fix the release process on CI. 3 +1s from committers will authorize an immediate release. Thanks. More details are here:
https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md

✅ Michael Collado, Maciej Obuchowski, Julien Le Dem

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-04 14:25:49
*Thread Reply:* Thanks, all. The release is authorized.

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-05 10:46:46
@channel
OpenLineage 0.15.1 is now available!
We added:
• Airflow: improve development experience #1101 @JDarDagran
• Documentation: update issue templates for proposal & add new integration template #1116 @rossturk
• Spark: add description for URL parameters in readme, change overwriteName to appName #1130 @tnazarew
We changed:
• Airflow: lazy load BigQuery client #1119 @mobuchowski
Many bug fixes were also included in this release.
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.15.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.14.1...0.15.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Maciej Obuchowski, Jakub Dardziński, Howard Yoo, Harel Shein, Paul Lee, Paweł Leszczyński
🎉 Howard Yoo, Harel Shein, Paul Lee

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 07:35:00
Is there a topic you think the community should discuss at the next OpenLineage TSC meeting? Reply or DM with your item, and we'll add it to the agenda.

🌟 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-10-06 13:29:30
*Thread Reply:* would love to add improvement in docs :) for newcomers

👏 Jakub Dardziński

Paul Lee - (paullee@lyft.com)
2022-10-06 13:31:07
*Thread Reply:* also, what's TSC?

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 15:20:23
*Thread Reply:* Technical Steering Committee, but it's open to everyone

👍 Paul Lee

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 15:20:45
*Thread Reply:* and we encourage newcomers to attend

Paul Lee - (paullee@lyft.com)
2022-10-06 13:49:00
has anyone seen their COMPLETE/FAILED listeners not firing on Airflow 2.3.4 but START events do emit? using openlineage-airflow 0.14.1

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 14:39:27
*Thread Reply:* is there any error/warn message logged maybe?

Paul Lee - (paullee@lyft.com)
2022-10-06 14:40:53
*Thread Reply:* none that i'm seeing on our workers. i do see that our custom http transport is being utilized on START.

but on SUCCESS nothing fires.

Paul Lee - (paullee@lyft.com)
2022-10-06 14:41:21
*Thread Reply:* which makes me believe the listeners themselves aren't being utilized? 🤔

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 16:37:54
*Thread Reply:* uhm, any chance you're experiencing this with custom extractors?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 16:38:13
*Thread Reply:* I'd be happy to jump on a quick call if you wish

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 16:38:40
*Thread Reply:* but in more EU friendly hours 🙂

Paul Lee - (paullee@lyft.com)
2022-10-07 16:19:47
*Thread Reply:* no custom extractors, it's using the base extractor. a call would be 👍. let me look at my calendar and EU hours.

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 15:23:27
@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, October 13 at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom
All are welcome!
Agenda:
1. Announcements
2. Recent Release 0.15.1
3. Project roadmap review
4. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.

🙌 Paul Lee, Harel Shein

Srinivasa Raghavan - (gsrinir@gmail.com)
2022-10-07 06:52:42
hello all. I am trying to run the airflow example from here
I changed the Marquez web port from 5000 to 15000, but when I start the docker images it seems to always default to port 5000, and therefore when I go to localhost:3000 the jobs don't load up as they are not able to connect to the marquez app running on 15000. I've overridden the values in docker-compose.yml and in openLineage.env but it seems to be picking up the 5000 value from some other location.
This is what I see in the logs. Any pointers on this, or please redirect me to the appropriate channel. Thanks!
INFO [2022-10-07 10:48:58,022] org.eclipse.jetty.server.AbstractConnector: Started application@782fd504{HTTP/1.1, (http/1.1)}{0.0.0.0:5000}
INFO [2022-10-07 10:48:58,034] org.eclipse.jetty.server.AbstractConnector: Started admin@1537c744{HTTP/1.1, (http/1.1)}{0.0.0.0:5001}

👀 Maciej Obuchowski

Srinivasa Raghavan - (gsrinir@gmail.com)
2022-10-20 05:11:09
*Thread Reply:* Apparently the value is hard coded in the code somewhere that I couldn't figure out, but at least I learnt that the port 5000 being held up on my Mac can be freed by following the below simple step.
(attachment)
- - - - - -
-
- - - - -
- -
Hanna Moazam - (hannamoazam@microsoft.com) -
-
2022-10-10 18:00:17
-
-

Hi #general - @Will Johnson and I are working on adding support for Snowflake to OL, and as we were going to specify the package under the compileOnly dependencies in gradle, we had some doubts looking at the existing dependencies. Taking bigQuery as an example - we see it's included as a dependency in both the shared build.gradle file, and in the app build.gradle file. We're a bit confused about the following:

- -
  1. Why do we need to have the bigQuery package in shared's dependencies? App of course contains the bigQueryNodeVisitor but we couldn't spot where it's being used within shared.
  2. For all the dependencies in the shared gradle file, the versions for Scala and Spark are fixed (Scala 2.11, Spark 2.4.8), whereas for app, the versionsMap allows for different combinations of spark and scala versions. Why is this so?
  3. How do the dependencies between app and shared interact? Does one or the other take precedence for which version of the bigQuery connector is compiled? -We'd appreciate any guidance!
  4. -
- -

Thank you in advance!

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-10-11 03:47:31
*Thread Reply:* Hi @Hanna Moazam,

Within recent PR https://github.com/OpenLineage/OpenLineage/pull/1111, I removed BigQuery dependencies from the spark2, spark32 and spark3 subprojects. It has to stay in shared because of BigQueryNodeVisitor. The usage of BigQueryNodeVisitor is tricky, as we never know whether the BigQuery classes are available at runtime or not. The check is done in io.openlineage.spark.agent.lifecycle.BaseVisitorFactory:

    if (BigQueryNodeVisitor.hasBigQueryClasses()) {
      list.add(new BigQueryNodeVisitor(context, factory));
    }

Regarding point 2, there were some Spark versions which allowed two Scala versions (2.11 and 2.12), so it made sense to make that configurable. On the other hand, for Spark 3.2 we only support 2.12, which is hardcoded in build.gradle.

The idea of the app project is: let's create a separate project to aggregate all the dependencies and run integration tests on it. Subprojects spark2, spark3, etc. depend on shared. Putting integration tests in shared would create an additional opposite-way dependency, which we wanted to avoid.

Labels: bug, documentation, integration/spark, integration/bigquery

Will Johnson (will@willj.co) - 2022-10-11 09:20:44

*Thread Reply:* So, if we wanted to add Snowflake, we would need to:

  1. Pick a version of Snowflake's Spark library
  2. Pick a version of Scala that we target (i.e. we are only going to support Snowflake in Spark 3.2, so Scala 2.12 will be hard-coded)
  3. Add the visitor code to shared
  4. Add the dependencies to app (ONLY if there is an integration test in app?? This is the confusing part still)
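A minimal sketch of what step 4's Gradle side can look like, mirroring the compileOnly pattern described above for BigQuery; the artifact coordinates and version below are illustrative assumptions, not taken from this thread:

```groovy
// shared/build.gradle (sketch): compileOnly makes the connector visible at
// compile time without bundling it into the OpenLineage jar.
dependencies {
    compileOnly "net.snowflake:spark-snowflake_2.12:2.11.0-spark_3.3"
}
```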
Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-10-12 03:51:54
*Thread Reply:* Yes. Please note that the Snowflake library will not be included in the target OpenLineage jar, so you may want to test it manually against multiple Snowflake library versions, or even adjust the code in case of minor differences.

Reactions: 👍 Hanna Moazam, Will Johnson

Hanna Moazam (hannamoazam@microsoft.com) - 2022-10-12 05:20:17

*Thread Reply:* Thank you Pawel!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-10-12 12:18:16
*Thread Reply:* Basically the same pattern you've already done with Kusto 😉
https://github.com/OpenLineage/OpenLineage/blob/a96ecdabe66567151e7739e25cd9dd03d6[…]va/io/openlineage/spark/agent/lifecycle/BaseVisitorFactory.java

Hanna Moazam (hannamoazam@microsoft.com) - 2022-10-12 12:26:35
*Thread Reply:* We actually used only reflection for Kusto, and were hoping to do it the 'better' way with the package itself for Snowflake - if it's possible :)

Akash r (akashrn25@gmail.com) - 2022-10-11 02:04:28
Hi Community,

I was going through the code of the dbt integration with OpenLineage. Once the events have been emitted from the client code, I wanted to check the server code where the events are read and the lineage is formed. Where can I find that code?

Thanks

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-10-11 05:03:26

*Thread Reply:* Reference implementation of OpenLineage consumer is Marquez: https://github.com/MarquezProject/marquez

Website: https://marquezproject.ai | Stars: 1187

Michael Robinson (michael.robinson@astronomer.io) - 2022-10-12 11:59:55

This month’s OpenLineage TSC meeting is tomorrow at 10 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1665084207602369

Reactions: 🙌 Maciej Obuchowski

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2022-10-13 12:05:17

Is there anyone in the Open Lineage community in San Diego? I’ll be there Nov 1-3 and would love to meet some of y’all in person

Paul Lee (paullee@lyft.com) - 2022-10-20 13:49:39

👋 is there a way to define a base extractor to be defaulted to? for example, i'd like to have all our operators (50+) default to my custom base extractor instead of having a list of 50+ operators in get_operator_classnames

Howard Yoo (howard.yoo@astronomer.io) - 2022-10-20 13:53:55
I don't think that's possible yet, as the extractor checks are based on the class name... and it wouldn't check which parent operator it inherits from.

Paul Lee (paullee@lyft.com) - 2022-10-20 14:05:38

😢 ok, i would contribute upstream but unfortunately we're still on 1.10.15. looking like we might have to hardcode for a bit.

Paul Lee (paullee@lyft.com) - 2022-10-20 14:06:01

is this the correct assumption? we're still on 0.14.1 ^

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-10-20 14:33:49

If you'll move to 2.x series and OpenLineage 0.16, you could use this feature: https://github.com/OpenLineage/OpenLineage/pull/1162

Labels: integration/airflow, extractor
Reactions: 👍 Paul Lee

Paul Lee (paullee@lyft.com) - 2022-10-20 14:46:36

thanks @Maciej Obuchowski we're working on it. hoping we'll land on 2.3.4 in the coming month.

Reactions: 🔥 Maciej Obuchowski

Austin Poulton (austin.poulton@equalexperts.com) - 2022-10-26 05:31:07

👋 Hi everyone!

Reactions: 👋 Jakub Dardziński, Maciej Obuchowski, Michael Robinson, Ross Turk, Willy Lulciuc, Paweł Leszczyński, Harel Shein

Harel Shein (harel.shein@gmail.com) - 2022-10-26 15:22:22

*Thread Reply:* Hey @Austin Poulton, welcome! 👋

Austin Poulton (austin.poulton@equalexperts.com) - 2022-10-31 06:09:41

*Thread Reply:* thanks Harel 🙂

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-01 09:44:18
@channel
Hi everyone, I’m opening a vote to release OpenLineage 0.16.0, featuring:
• support for boolean arguments in the DefaultExtractor
• a more efficient get_connection_uri method in the Airflow integration
• a reorganized, Rust-based SQL integration (easing the addition of language interfaces in the future)
• bug fixes and more.
3 +1s from committers will authorize an immediate release. Thanks. More details are here:
https://github.com/OpenLineage/OpenLineage/compare/0.15.1...HEAD

Reactions: 🙌 Howard Yoo, Paweł Leszczyński, Maciej Obuchowski | 👍 Ross Turk, Paweł Leszczyński, Maciej Obuchowski | ➕ Willy Lulciuc, Mandy Chessell, Maciej Obuchowski

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-01 13:37:54

*Thread Reply:* Thanks, all! The release is authorized. We will initiate it within 48 hours.

Iftach Schonbaum (iftach.schonbaum@hunters.ai) - 2022-11-02 08:45:20

Anybody with a success use-case of ingesting column-level lineage into amundsen?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-02 09:19:43

*Thread Reply:* I think amundsen-openlineage dataloader precedes column-level lineage in OL by a bit, so I doubt this works

Harel Shein (harel.shein@gmail.com) - 2022-11-02 15:54:31

*Thread Reply:* do you want to open up an issue for it @Iftach Schonbaum?

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-02 12:36:22

Hi everyone, you might notice Dependabot opening PRs to update dependencies now that it’s been configured and turned on (https://github.com/OpenLineage/OpenLineage/pull/1182). There will probably be a large number of PRs to start with, but this shouldn’t always be the case and we can change the tool’s behavior, as well. (Some background: this will help us earn the OSSF Silver badge for the project, which will help us advance in the LFAI.)

Reactions: 👍 Maciej Obuchowski

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-03 07:53:31
@channel
I’m opening a vote to release OpenLineage 0.16.1 to fix an issue in the SQL integration. This release will also include all the commits announced for 0.16.0.
3 +1s from committers will authorize an immediate release. Thanks.

Labels: integration/sql
Reactions: ➕ Maciej Obuchowski, Hanna Moazam, Jakub Dardziński, Ross Turk, Paweł Leszczyński, Jarek Potiuk, Willy Lulciuc

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-03 12:25:29

*Thread Reply:* Thanks, all. The release is authorized and will be initiated shortly.

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-03 13:46:58
@channel
OpenLineage 0.16.1 is now available, featuring:
Additions:
• Airflow: add dag_run information to Airflow version run facet #1133 @fm100
• Airflow: add LoggingMixin to extractors #1149 @JDarDagran
• Airflow: add default extractor #1162 @mobuchowski
• Airflow: add on_complete argument in DefaultExtractor #1188 @JDarDagran
• SQL: reorganize the library into multiple packages #1167 @StarostaGit @mobuchowski
Changes:
• Airflow: move get_connection_uri as extractor’s classmethod #1169 @JDarDagran
• Airflow: change get_openlineage_facets_on_start/complete behavior #1201 @JDarDagran
Bug fixes and more!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.16.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.15.1...0.16.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

Reactions: 🙌 Maciej Obuchowski, Francis McGregor-Macdonald, Eric Veleker

Phil Chen (phil@gpr.com) - 2022-11-03 13:59:29
Are there any tutorials or documentation on how to create an OpenLineage connector? For example, what if we use Argo Workflows instead of Apache Airflow for orchestrating ETL jobs? How would we create an OpenLineage Argo Workflows connector? How much effort, roughly? And can people contribute such connectors to the community if they create one?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-04 06:34:27

*Thread Reply:* > Are there any tutorials or documentation on how to create an OpenLineage connector?
We have somewhat of a start of a doc: https://openlineage.io/docs/development/developing/

Here we have an example of using the Python OL client to emit OL events: https://openlineage.io/docs/client/python#start-docker-and-marquez

> How much effort, roughly?
I'm not familiar with Argo Workflows, but usually the effort needed depends on the extensibility of the underlying system. From the first look, Argo looks like it has sufficient mechanisms for that: https://argoproj.github.io/argo-workflows/executor_plugins/#examples-and-community-contributed-plugins

Then, it depends on whether you can get the information that you need in that plugin. The basic need is to know which datasets the workflow/job is reading from and which datasets it's writing to.

> And can people contribute such connectors to the community if they create one?
Definitely! And if you need help with anything OpenLineage, feel free to write here on Slack.

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-03 17:57:37

Is there a topic you think the community should discuss at the next OpenLineage TSC meeting? Reply or DM with your item, and we’ll add it to the agenda.

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-03 18:03:18
@channel
This month’s OpenLineage TSC meeting is next Thursday, November 10th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. Recent release overview [Michael R.]
  2. Update on LFAI & Data Foundation progress [Michael R.]
  3. Proposal: Defining “implementing OpenLineage” [Julien]
  4. Update from MANTA on their OpenLineage integration [Eric and/or Petr from MANTA]
  5. Linking CMF (a common ML metadata framework) and OpenLineage [Suparna and AnnMary from HP Enterprise]
  6. Open discussion
Reactions: 👍 Luca Soato, Maciej Obuchowski, Paul Lee, Willy Lulciuc

Kenton (swiple.io) (kknoxparton@gmail.com) - 2022-11-08 04:47:41
Hi all 👋 I’m Kenton - a Software Engineer and founder of Swiple. I’m looking forward to working with OpenLineage and its community to integrate data lineage and data observability.
https://swiple.io

Reactions: 🙌 Maciej Obuchowski, Jakub Dardziński, Michael Robinson, Ross Turk, John Thomas, Julien Le Dem, Willy Lulciuc, Varun Singh

Ross Turk (ross@datakin.com) - 2022-11-08 10:22:15

*Thread Reply:* Welcome Kenton! Happy to help 👍

Reactions: 👍 Kenton (swiple.io)

Deepika Prabha (deepikaprabha@gmail.com) - 2022-11-08 05:35:03
Hi everyone,
We wanted to pass some dynamic metadata from a Spark job that we could catch in the OpenLineage event and use for processing. So far I have only seen the few conf parameters, like the openlineage params, that can be sent with the Spark conf. Is there any other option for sending some information dynamically from Spark jobs?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-08 10:06:10

*Thread Reply:* What kind of data? My first feeling is that you need to extend the Spark integration

Deepika Prabha (deepikaprabha@gmail.com) - 2022-11-09 00:35:29
*Thread Reply:* Yes, we wanted to add information like user/job descriptions that we can use later with the rest of the OpenLineage event fields in our system

Deepika Prabha (deepikaprabha@gmail.com) - 2022-11-09 00:41:35

*Thread Reply:* I can see in this PR https://github.com/OpenLineage/OpenLineage/pull/490 that env values can be captured which we can use to add some custom metadata but it seems it is specific to Databricks only.

Comments: 8

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-09 05:14:50

*Thread Reply:* I think it makes sense to have something like that, but generic, if you want to contribute it

Reactions: 👍 Will Johnson, Deepika Prabha

Varun Singh (varuntestaz@outlook.com) - 2022-11-14 03:28:35
*Thread Reply:* @Maciej Obuchowski Do you mean adding something like
spark.openlineage.jobFacet.FacetName.Key=Value
to the Spark conf should add a new job facet like
"FacetName": {
    "Key": "Value"
}

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-14 05:56:02
*Thread Reply:* We can argue about the name of that key, but yes, something like that. Just notice that while it's possible to attach something to run and job facets directly, it would be much harder to do this with datasets

slackbot - 2022-11-09 11:15:49

This message was deleted.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-11-10 02:22:18
*Thread Reply:* Hi @Varun Singh, what version of openlineage-spark were you using? Are you able to copy the lineage event here?

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-09 12:31:10
@channel
This month’s TSC meeting is tomorrow at 10 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1667512998061829

Reactions: 💥 Willy Lulciuc, Maciej Obuchowski

Hanna Moazam (hannamoazam@microsoft.com) - 2022-11-11 11:32:54

Hi #general, quick question: do we plan to drop Spark 2 support in the near future?

Longer question:
I've recently made a PR (https://github.com/OpenLineage/OpenLineage/pull/1231) to support capturing lineage from Snowflake, but it fails at a specific integration test due to what we think is a dependency mismatch for guava. I've tried to exclude any transient dependencies which may cause the problem, but no luck with that so far.

Just wondering if:

  1. It makes sense to spend more time trying to ensure that test passes? Especially if we plan to remove Spark 2 support soon.
  2. Assuming we do want to make sure to pass the test, does anyone have any other ideas for where to look/modify to prevent the error?

Here's the test failure message:
```io.openlineage.spark.agent.lifecycle.LibraryTest testRdd(SparkSession) FAILED (16s)
java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
    at io.openlineage.spark.agent.lifecycle.LibraryTest.testRdd(LibraryTest.java:113)```
Thanks in advance!

Labels: documentation, integration/spark, spec
Comments: 4

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-11 16:28:07

*Thread Reply:* What if we just not include it in the BaseVisitorFactory but only in the Spark3 visitor factories?

Paul Lee (paullee@lyft.com) - 2022-11-11 14:52:19
quick question: how do i get the <<non-serializable Time... to show in the extraction? or really any object that gets passed in.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-11 16:24:30

*Thread Reply:* You might look here: https://github.com/OpenLineage/OpenLineage/blob/f7049c599a0b1416408860427f0759624326677d/client/python/openlineage/client/serde.py#L51

Reactions: :gratitude_thank_you: Paul Lee

srutikanta hota (srutikanta.hota@gmail.com) - 2022-11-14 01:12:45
Is there a way I can update the dataset description and the column descriptions while generating the OpenLineage Spark events and columns?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-11-15 02:09:25

*Thread Reply:* I don’t think this is possible at the moment.

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2022-11-15 15:47:49
Hey all, I'd like to ask for a release for OpenLineage. #1256 fixes a bug in the DefaultExtractor. This blocks people from migrating code from custom extractors to get_openlineage_facets methods.

Reactions: ➕ Michael Robinson, Howard Yoo, Maciej Obuchowski, Willy Lulciuc

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-16 09:13:17

*Thread Reply:* Thanks, all. The release is authorized.

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-16 10:41:07

*Thread Reply:* The PR for the changelog updates: https://github.com/OpenLineage/OpenLineage/pull/1306

Varun Singh (varuntestaz@outlook.com) - 2022-11-16 03:34:01

Hi, small question: Is it possible to disable the /api/{version}/lineage suffix that gets added to every url automatically? Thanks!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-16 12:27:12

*Thread Reply:* I think we had similar request before, but nothing was implemented.

Reactions: 👍 Varun Singh

Michael Robinson (michael.robinson@astronomer.io) - 2022-11-16 12:23:54
@channel
OpenLineage 0.17.0 is now available, featuring:
Additions:
• Spark: support latest Spark 3.3.1 #1183 @pawel-big-lebowski
• Spark: add Kinesis Transport and support config Kinesis in Spark integration #1200 @yogyang
• Spark: disable specified facets #1271 @pawel-big-lebowski
• Python: add facets implementation to Python client #1233 @pawel-big-lebowski
• SQL: add Rust parser interface #1172 @StarostaGit @mobuchowski
• Proxy: add helm chart for the proxy backend #1068 @wslulciuc
• Spec: include possible facets usage in spec #1249 @pawel-big-lebowski
• Website: publish YML version of spec to website #1300 @rossturk
• Docs: update language on nominating new committers #1270 @rossturk
Changes:
• Website: publish spec into new website repo location #1295 @rossturk
• Airflow: change how pip installs packages in tox environments #1302 @JDarDagran
Removals:
• Deprecate HttpTransport.Builder in favor of HttpConfig #1287 @collado-mike
Bug fixes and more!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.17.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.16.1...0.17.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

Reactions: 🙌 Howard Yoo, Maciej Obuchowski, Ross Turk, Aphra Bloomfield, Harel Shein, Kengo Seki, Paweł Leszczyński, pankaj koti, Varun Singh

Diego Cesar (dcesar@krakenrobotik.de) - 2022-11-18 05:40:53

Hi everyone,

I'm trying to get the lineage of a dataset per version. I initially had something like:

Dataset A -> Dataset B -> Dataset C (version 1)

then:

Dataset D -> Dataset E -> Dataset C (version 2)

I can get the graph for version 2 without problems, but I'm wondering if there's any way to retrieve the entire graph for Dataset C version 1.

Thanks

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-22 13:40:44
*Thread Reply:* It's kind of a hard problem on the UI side. The backend can express that relationship

Diego Cesar (dcesar@krakenrobotik.de) - 2022-11-22 13:48:58
*Thread Reply:* Thanks for replying. Could you please point me to the API that allows me to do that? I've been calling GET /lineage with the dataset in the node ID, e.g. nodeId=dataset:my_dataset. Where could I specify the version of my dataset?

Paul Lee (paullee@lyft.com) - 2022-11-18 17:55:24
👋 how do we get the actual values from macros? e.g. a schema name is passed in with {{params.table_name}} and that's what shows in lineage instead of the actual table name

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2022-11-19 04:54:13
*Thread Reply:* Templated fields are rendered before generating lineage data. Do you have some sample code, or preferably logs?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-22 13:40:11

*Thread Reply:* If you're on 1.10 then I think it won't work

Paul Lee (paullee@lyft.com) - 2022-11-28 12:50:39
*Thread Reply:* @Maciej Obuchowski we are still on airflow 1.10.15 unfortunately.

cc. @Eli Schachar @Allison Suarez

Paul Lee (paullee@lyft.com) - 2022-11-28 12:50:49

*Thread Reply:* is there no workaround we can make work?

Paul Lee (paullee@lyft.com) - 2022-11-28 12:51:01

*Thread Reply:* @Jakub Dardziński is this for airflow versions 2.0+?

Varun Singh (varuntestaz@outlook.com) - 2022-11-21 07:07:10

Hey, quick question: I see there is Kafka transport in the java client, but it's not supported in the spark integration, right?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-21 07:28:04

srutikanta hota (srutikanta.hota@gmail.com) - 2022-11-22 13:03:41

How can we auto-instrument a dataset owner at the Java agent level? Is there any Spark property available?

srutikanta hota (srutikanta.hota@gmail.com) - 2022-11-22 16:47:37
Is there a way to capture this information if we are running a job with the business day set to yesterday? For example, if I am running yesterday's missed job today, or Friday's file on Monday because we received the file late from a vendor, etc.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-11-22 18:45:48

*Thread Reply:* I think that's what NominalTimeFacet covers
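For reference, a minimal sketch of that facet on a run, following the published NominalTimeRunFacet spec; the timestamps are invented. The nominal time records when the run was scheduled to happen (the business day), independent of when it actually executed:

```json
"facets": {
  "nominalTime": {
    "nominalStartTime": "2022-11-21T00:00:00Z",
    "nominalEndTime": "2022-11-22T00:00:00Z"
  }
}
```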

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-24 09:15:45

hello Team, I want to use data lineage with Airflow, but I can't understand it from the docs. Please let me know if someone has clearer docs.

Harel Shein (harel.shein@gmail.com) - 2022-11-28 10:29:58

*Thread Reply:* Hey @Rahul Sharma, what version of Airflow are you running?

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 10:30:14

*Thread Reply:* i am using airflow 2.x

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 10:30:27

*Thread Reply:* can we connect if you have time ?

Harel Shein (harel.shein@gmail.com) - 2022-11-28 11:11:58

*Thread Reply:* did you see these docs before? https://openlineage.io/integration/apache-airflow/#airflow-20

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 11:12:22

*Thread Reply:* yes

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 11:12:36

*Thread Reply:* i already set configuration in airflow.cfg file

Harel Shein (harel.shein@gmail.com) - 2022-11-28 11:12:57

*Thread Reply:* where are you sending the events to?

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 11:13:24

*Thread Reply:* i have a docker machine on which marquez is working

Harel Shein (harel.shein@gmail.com) - 2022-11-28 11:13:47

*Thread Reply:* so, what is the issue you are seeing?

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 11:15:37

*Thread Reply:* there is no error

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 11:16:01
*Thread Reply:* ```[lineage]
# what lineage backend to use
backend = openlineage.lineage_backend.OpenLineageBackend

MARQUEZ_URL=http://10.36.37.178:3000
MARQUEZ_NAMESPACE=airflow

MARQUEZ_BACKEND=HTTP
MARQUEZ_URL=http://10.36.37.178:5000
MARQUEZ_API_KEY=[YOUR_API_KEY]
MARQUEZ_NAMESPACE=airflow```
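A note for readers following this exchange: with openlineage-airflow on Airflow 2.x, the endpoint is usually configured through environment variables rather than the MARQUEZ_** settings above, which belong to the older marquez-airflow backend. A minimal sketch, reusing the host from the thread:

```sh
export OPENLINEAGE_URL=http://10.36.37.178:5000
export OPENLINEAGE_NAMESPACE=airflow
```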

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 11:16:09
*Thread Reply:* the above is the config I have set

Rahul Sharma (panditrahul151197@gmail.com) - 2022-11-28 11:16:22
*Thread Reply:* please let me know if there is anything else I need to do

Mohamed Nabil H (m.nabil.hafez@gmail.com) - 2022-11-24 14:02:27
hey, I wonder if somebody can link me to the lineage (table lineage) event schema?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-11-25 02:20:40

*Thread Reply:* please have a look at openapi definition of the event: https://openlineage.io/apidocs/openapi/

Murali Krishna (vmurali.krishnaraju@genpact.com) - 2022-11-30 02:34:51
Hello Team, I am from the Genpact Data Analytics team; we are looking for a demo of your product.

Conor Beverland (conorbev@gmail.com) - 2022-11-30 14:10:10

*Thread Reply:* hey, I'll DM you.

Michael Robinson (michael.robinson@astronomer.io) - 2022-12-01 15:00:28
Hello all, I’m calling for a vote on releasing OpenLineage 0.18.0, including:
• improvements to the Spark integration,
• extractors for Sagemaker operators and SFTPOperator in the Airflow integration,
• a change to the Databricks integration to support Databricks Runtime 11.3,
• new governance docs,
• bug fixes,
• and more.
Three +1s from committers will authorize an immediate release.

Reactions: ➕ Maciej Obuchowski, Will Johnson, Bramha Aelem

Michael Robinson (michael.robinson@astronomer.io) - 2022-12-06 13:56:17
*Thread Reply:* Thanks, all. The release is authorized and will be initiated within two business days.

Michael Robinson (michael.robinson@astronomer.io) - 2022-12-01 15:11:10
@channel
This month’s OpenLineage TSC meeting is next Thursday, December 8th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. an overview of the new Rust implementation of the SQL integration
  2. a presentation/discussion of what it actually means to “implement” OpenLineage
  3. open discussion
Scott Anderson (scott.anderson@alteryx.com) - 2022-12-02 13:57:07

Hello everyone! General question here, aside from ‘consumer’ orgs/integrations (dbt/dagster/manta), is anyone aware of any enterprise organizations that are leveraging OpenLineage today? Example lighthouse brands?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-12-02 15:21:20

*Thread Reply:* Microsoft https://openlineage.io/blog/openlineage-microsoft-purview/

Reactions: 🙌 Will Johnson

Will Johnson (will@willj.co) - 2022-12-05 13:54:06

*Thread Reply:* I think we can share that we have over 2,000 installs of that Microsoft solution accelerator using OpenLineage.

That means we have thousands of companies having experimented with OpenLineage and Microsoft Purview.

We can't name any customers at this point, unfortunately.

Reactions: 🎉 Conor Beverland, Kengo Seki | 👍 Scott Anderson

Michael Robinson (michael.robinson@astronomer.io) - 2022-12-07 12:03:06
@channel
This month’s TSC meeting is tomorrow at 10 am PT. All are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1669925470878699

Will Johnson (will@willj.co) - 2022-12-07 14:22:58

*Thread Reply:* For open discussion, I'd like to ask the team for an overview of how the different gradle files are working together for the Spark implementation. I'm terribly confused on where dependencies need to be added (whether it's in shared, app, or a spark version specific folder). Maybe @Maciej Obuchowski...?

Reactions: 👍 Michael Robinson

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-12-07 14:25:12

*Thread Reply:* Unfortunately I'll be unable to attend the meeting @Will Johnson 😞

Reactions: 😭 Will Johnson

Julien Le Dem (julien@apache.org) - 2022-12-08 13:03:08

*Thread Reply:* This is starting now. CC @Will Johnson

Julien Le Dem (julien@apache.org) - 2022-12-09 19:24:15

*Thread Reply:* @Will Johnson Check the notes and the recording. @Michael Collado did a pass at explaining the relationship between shared, app and the versions

Julien Le Dem (julien@apache.org) - 2022-12-09 19:24:30

*Thread Reply:* feel free to follow up here as well

Michael Collado (collado.mike@gmail.com) - 2022-12-09 19:39:37

*Thread Reply:* ascii art to the rescue! (top “depends on” bottom)

                  app
                / |  | \
               /  |  |  \
              /   |  |   \
        spark2 spark3 spark32 spark33
              \   |  |   /
               \  |  |  /
                \ |  | /
                 shared
Julien Le Dem (julien@apache.org) - 2022-12-09 19:40:05

*Thread Reply:* 😍

Michael Collado (collado.mike@gmail.com) - 2022-12-09 19:41:13

*Thread Reply:* (btw, we should have written datakin to output ascii art; it’s obviously the superior way to generate graphs 😜)

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-14 05:18:53

*Thread Reply:* Hi, is there a recording for this meeting?

Christian Lundgren (christian@lunit.io) - 2022-12-07 20:33:19

Hi! I have a basic question about the naming conventions for blob storage. The spec is not totally clear to me. Is the convention to use (1) namespace=bucket name=bucket+path or (2) namespace=bucket name=path?

Julien Le Dem (julien@apache.org) - 2022-12-07 22:05:25

*Thread Reply:* The namespace is the bucket and the dataset name is the path. Is there a blob storage provider in particular you are thinking of?
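Concretely, a dataset reference for a GCS object would then look like the sketch below; the bucket and path are made-up examples following the dataset naming conventions:

```json
{ "namespace": "gs://my-bucket", "name": "/warehouse/sales/transactions" }
```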

Christian Lundgren (christian@lunit.io) - 2022-12-07 23:13:41

*Thread Reply:* Thanks, that makes sense. We use GCS, so it is already covered by the naming conventions documented. I was just not sure if I was understanding the document correctly or not.

Julien Le Dem (julien@apache.org) - 2022-12-07 23:34:33

*Thread Reply:* No problem. Let us know if you have suggestions on the wording to make the doc clearer

Michael Robinson (michael.robinson@astronomer.io) - 2022-12-08 11:44:49
@channel
OpenLineage 0.18.0 is available now, featuring:
• Airflow: support SQLExecuteQueryOperator #1379 @JDarDagran
• Airflow: introduce a new extractor for SFTPOperator #1263 @sekikn
• Airflow: add Sagemaker extractors #1136 @fhoda
• Airflow: add S3 extractor for Airflow operators #1166 @fhoda
• Spec: add spec file for ExternalQueryRunFacet #1262 @howardyoo
• Docs: add a TSC doc #1303 @merobi-hub
• Plus bug fixes.
Thanks to all our contributors, including new contributor @Faisal Hoda!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.18.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.17.0...0.18.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

Reactions: 🚀 Willy Lulciuc, Minkyu Park, Kengo Seki, Enrico Rotundo, Faisal Hoda | 🙌 Howard Yoo, Minkyu Park, Kengo Seki, Enrico Rotundo, Faisal Hoda

srutikanta hota (srutikanta.hota@gmail.com) - 2022-12-09 01:42:59
Is there a specification to capture dataset dependency, e.g. ds1 is dependent on ds2?

Ross Turk (ross@datakin.com) - 2022-12-09 11:51:16
*Thread Reply:* Dataset dependencies are represented through a common relationship with a Job - e.g., the task that performed the transformation.

srutikanta hota (srutikanta.hota@gmail.com) - 2022-12-11 09:01:19
*Thread Reply:* Is it possible to populate a table-level dependency without any transformation using the OpenLineage specification? For example, to define that dataset 1 depends on table 1 and table 2, which can be represented as separate datasets?

Ross Turk (ross@datakin.com) - 2022-12-13 15:24:20

*Thread Reply:* Not explicitly, in today's spec. The guiding principle is that something created that dependency, and the dependency changes over time in a way that is important to study.

Ross Turk (ross@datakin.com) - 2022-12-13 15:25:12
*Thread Reply:* I say this to explain why it is the way it is - but the spec can change over time to serve new use cases, certainly!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-14 05:18:10

Hi everyone, I'd like to use openlineage to capture column level lineage for spark. I would also like to capture a few custom environment variables along with the column lineage. May I know how this can be done? Thanks!

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-12-14 09:56:22

*Thread Reply:* Hi @Anirudh Shrinivason, you could start with column-lineage & spark workshop available here -> https://github.com/OpenLineage/workshops/tree/main/spark

Reactions: ❤️ Ricardo Gaspar

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-14 10:05:54

*Thread Reply:* Hi @Paweł Leszczyński Thanks for the link! But this does not really answer the concern.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-14 10:06:08

*Thread Reply:* I am already able to capture column lineage

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-14 10:06:33

*Thread Reply:* What I would like is to capture some extra environment variables, and send it to the server along with the lineage

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-12-14 11:22:59

*Thread Reply:* i remember we already have a facet for that: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/facets/EnvironmentFacet.java

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-12-14 11:24:07

*Thread Reply:* but it is only used at the moment to capture some databricks environment attributes

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-12-14 11:28:29

*Thread Reply:* so you can contribute to the project and add a feature which adds specified/all environment variables to the lineage event.

you can also have a look at the extending section of the Spark integration docs (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending) and create a class that adds a run facet builder according to your needs.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2022-12-14 11:29:28

*Thread Reply:* the third way is to create an issue related to this, because being able to send selected/all environment variables in the OL event seems to be a really cool feature.

Reactions: 👍 Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-14 21:49:19

*Thread Reply:* That is great! Thank you so much! This really helps!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-15 01:44:42

*Thread Reply:*
    List<String> dbPropertiesKeys =
        Arrays.asList(
            "orgId",
            "spark.databricks.clusterUsageTags.clusterOwnerOrgId",
            "spark.databricks.notebook.path",
            "spark.databricks.job.type",
            "spark.databricks.job.id",
            "spark.databricks.job.runId",
            "user",
            "userId",
            "spark.databricks.clusterUsageTags.clusterName",
            "spark.databricks.clusterUsageTags.azureSubscriptionId");
    dbPropertiesKeys.stream()
        .forEach(
            (p) -> {
              dbProperties.put(p, jobStart.properties().getProperty(p));
            });
It seems like this obtains the values from the jobStart object's properties, but does not capture them from the environment directly?
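For contrast, reading straight from the JVM environment is a small change; a minimal sketch, where the variable names are hypothetical and, in the actual integration, this logic would sit inside a facet builder:

```java
// Capture selected variables from the process environment rather than
// from the Spark jobStart properties.
Map<String, Object> envProperties = new HashMap<>();
for (String key : Arrays.asList("TEAM_NAME", "PIPELINE_OWNER")) {
    envProperties.put(key, System.getenv(key));
}
```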

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2022-12-15 01:57:05

*Thread Reply:* I have opened an issue in the community here: https://github.com/OpenLineage/OpenLineage/issues/1419

Labels: proposal

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-01 02:24:39

*Thread Reply:* Hi @Paweł Leszczyński I have opened a PR to help add this use case. Please do help to see if we can merge it in. Thanks!
https://github.com/OpenLineage/OpenLineage/pull/1545

Labels: integration/spark
Comments: 1
Reactions: 👀 Maciej Obuchowski, Ross Turk

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-02-02 11:45:52

*Thread Reply:* Hey @Anirudh Shrinivason, sorry for late reply, but I reviewed the PR.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-06 03:06:42

*Thread Reply:* Hey thanks a lot! I have made the requested changes! Thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-06 03:06:49

*Thread Reply:* @Maciej Obuchowski ^ 🙂

Reactions: 👀 Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-02-06 09:09:34

*Thread Reply:* Hey @Anirudh Shrinivason, took a look at it but it unfortunately fails integration tests (throws NPE), can you take a look again?

23/02/06 12:18:39 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
    at io.openlineage.spark.agent.EventEmitter.<init>(EventEmitter.java:39)
    at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:276)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:80)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1433)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-07 04:17:02

*Thread Reply:* Hi yeah my bad. It should be fixed in the latest push. But I think the tests are not running in the CI because of some GCP environment issue? I am not really sure how to fix it...

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-02-07 04:18:46

*Thread Reply:* I can make them run, it's just that running them on forks is disabled. We need to make it more clear I suppose

Reactions: 👍 Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-07 04:24:38

*Thread Reply:* Ahh I see thanks! Also, some of the tests are failing on my local, such as https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/test/java/io/openlineage/spark/agent/lifecycle/DeltaDataSourceTest.java. Is this expected behaviour?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-02-07 07:20:11

*Thread Reply:* tests failing isn't expected behaviour 🙂

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-08 03:37:23

*Thread Reply:* Ahh yeap it was a local ide issue on my side. I added some tests to verify the presence of env variables too.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-02-08 03:47:22

*Thread Reply:* @Anirudh Shrinivason let me know then when you'll push fixed version, I can run full tests then

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-08 03:49:35

*Thread Reply:* I have pushed just now

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-08 03:49:39

*Thread Reply:* You can run the tests

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-08 04:13:07

*Thread Reply:* @Maciej Obuchowski mb I pushed again rn. Missed out a closing bracket.

Reactions: 👍 Maciej Obuchowski

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-02-10 00:47:04

*Thread Reply:* @Maciej Obuchowski Hi, could we merge this PR in? I'd like to see if we can have these changes in the new release...

Bramha Aelem (bramhaaelem@gmail.com) - 2022-12-15 17:14:02
Hi All - I am sending lineage from ADF for each activity I am performing, and the individual activities are represented correctly. How can I represent task1 as a parent of task2? Can someone please share a sample JSON request for it?

Ross Turk (ross@datakin.com) - 2022-12-16 13:29:44
*Thread Reply:* Hi 👋 this would require a series of JSON calls:
  1. start the first task
  2. end the first task, specify output dataset
  3. start the second task, specify input dataset
  4. end the second task
Ross Turk (ross@datakin.com) - 2022-12-16 13:32:08
*Thread Reply:* in OpenLineage relationships are typically Job -> Dataset -> Job, so
• you create a relationship between datasets by referring to them in the same job - i.e., this task ran that read from these datasets and wrote to those datasets
• you create a relationship between tasks by referring to the same datasets across both of them - i.e., this task wrote that dataset and this other task read from it
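To make the pattern concrete, a pared-down sketch of the COMPLETE event for the first task; the namespaces, names and runId are invented, and real events also carry producer and schemaURL fields:

```json
{
  "eventType": "COMPLETE",
  "eventTime": "2022-12-16T18:00:00Z",
  "run": { "runId": "d46e465b-d358-4d32-83d4-df660ff614dd" },
  "job": { "namespace": "adf", "name": "task1" },
  "outputs": [ { "namespace": "adls", "name": "/lookup/data" } ]
}
```

A later START event for task2 listing the same dataset under "inputs" is what ties the two tasks together in the graph.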

Ross Turk (ross@datakin.com) - 2022-12-16 13:35:06
*Thread Reply:* @Bramha Aelem if you look in this directory, you can find example start/complete JSON calls that show how to specify input/output datasets.

(it’s an airflow workshop, but those examples are for a part of the workshop that doesn’t involve airflow)

Ross Turk (ross@datakin.com) - 2022-12-16 13:35:46

*Thread Reply:* (these can also be found in the docs)

Reactions: 👍 Ross Turk

Bramha Aelem (bramhaaelem@gmail.com) - 2022-12-16 14:49:30

*Thread Reply:* @Ross Turk - Thanks for the details. will try and get back to you on it

Bramha Aelem (bramhaaelem@gmail.com) - 2022-12-17 19:53:21

*Thread Reply:* @Ross Turk - Good Evening, It worked as expected. I am able to replicate the scenarios which I am looking for.

Reactions: 👍 Ross Turk

Bramha Aelem (bramhaaelem@gmail.com) - 2022-12-17 19:53:48

*Thread Reply:* @Ross Turk - Thanks for your response.

Bramha Aelem (bramhaaelem@gmail.com) - 2023-01-12 13:23:56
*Thread Reply:* @Ross Turk - First activity: I make an HTTP call to pull the lookup data and store it in ADLS.
Second activity: after the completion of the first activity, I make an Azure Databricks call that uses the lookup file and generates the output tables. How can I refer to the Databricks-generated table facets as an input to the subsequent activities in the pipeline?
When I refer to them as an input, the Spark tables' metadata doesn't show up. How can this be achieved?
After the execution of each activity in the ADF pipeline I send start and complete/fail lineage events to Marquez.

Can someone please guide me on this.

Bramha Aelem (bramhaaelem@gmail.com) - 2022-12-15 17:19:34
I am not using Airflow in my process. Please suggest.

Bramha Aelem (bramhaaelem@gmail.com) - 2022-12-19 12:40:26
Hi All - Good Morning. How does OpenLineage handle the column lineage of a data source when it is run by different teams and jobs?

Al (Koii) (al@koii.network) - 2022-12-20 14:26:57

Hey folks! I'm al from Koii.network, very happy to have heard about this project :)

Reactions: 👋 Willy Lulciuc, Maciej Obuchowski, Julien Le Dem

Willy Lulciuc (willy@datakin.com) - 2022-12-20 14:27:59
*Thread Reply:* welcome! let us know if you have any questions

Matt Menzenski (matt@payitgov.com) - 2022-12-29 08:22:26

Hello! I found the OpenLineage project today after searching for “OpenTelemetry” in the dbt Slack.

Harel Shein (harel.shein@gmail.com) - 2022-12-29 10:47:00

*Thread Reply:* Hey Matt! Happy to have you here! Feel free to reach out if you have any questions

Reactions: :gratitude_thank_you: Matt Menzenski

Max (maxime.broussard@gmail.com) - 2022-12-30 05:33:40
Hi guys - I am really excited to test OpenLineage.
I had a quick question, sorry if this is not the right place for it.
We are testing dbt-ol with Airflow, and I was hoping this would by default push the number of rows updated/created in that dbt transformation to Marquez.
It runs fine on Airflow, but when I check in Marquez there doesn't seem to be a 'dataset' created, only 'jobs' with job-level metadata.
When I check here I see that the dataset facets should have it though: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md
Does anyone know if creating a dataset and sending row counts to OL works out of the box with dbt-ol, or if I need to build another script to get that number from my Snowflake instance and push it to OL as another step in my process?
Thanks a lot!

Viraj Parekh (vmpvmp94@gmail.com) - 2023-01-03 13:20:14

*Thread Reply:* @Ross Turk maybe you can help with this?

Ross Turk (ross@datakin.com) - 2023-01-03 13:34:23

*Thread Reply:* hmm, I believe the dbt-ol integration does capture bytes/rows, but only for some data sources: https://github.com/OpenLineage/OpenLineage/blob/6ae1fd5665d5fd539b05d044f9b6fb831ce9d475/integration/common/openlineage/common/provider/dbt.py#L567
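For anyone following along: the dbt integration works by wrapping the dbt command, so the events, including any row-count facets it can collect, are emitted when the run finishes. A minimal sketch, assuming a local Marquez:

```sh
OPENLINEAGE_URL=http://localhost:5000 dbt-ol run
```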

Ross Turk (ross@datakin.com) - 2023-01-03 13:34:58

*Thread Reply:* I haven't personally tried it with Snowflake in a few versions, but the code suggests that it's one of them.

Ross Turk (ross@datakin.com) - 2023-01-03 13:35:42

*Thread Reply:* @Max you say your dbt-ol run is resulting in only jobs and no datasets emitted, is that correct?

Ross Turk (ross@datakin.com) - 2023-01-03 13:38:06

*Thread Reply:* if so, I'd say something rather strange is going on because in my experience each model should result in a Job and a Dataset.

Kuldeep (kuldeep.marathe@affirm.com) - 2023-01-03 00:41:09

Hi All, Curious to see if there is an openlineage integration with luigi or any open source projects working on it.

Kuldeep (kuldeep.marathe@affirm.com) - 2023-01-03 01:53:10

*Thread Reply:* I was looking for something similar to the airflow integration

Viraj Parekh (vmpvmp94@gmail.com) - 2023-01-03 13:21:18

*Thread Reply:* hey @Kuldeep - i don't think there's something for Luigi right now - is that something you'd potentially be interested in?

Kuldeep (kuldeep.marathe@affirm.com) - 2023-01-03 13:23:53

*Thread Reply:* @Viraj Parekh Yes this is something we are interested in! There are a lot of projects out there that use luigi

Michael Robinson (michael.robinson@astronomer.io) - 2023-01-03 11:05:48
Hello all, I’m opening a vote to release OpenLineage 0.19.0, including:
• new extractors for Trino and S3FileTransformOperator in the Airflow integration
• a new, standardized run facet in the Airflow integration
• a new NominalTimeRunFacet and OwnershipJobFacet in the Airflow integration
• Postgres support in the dbt integration
• a new client-side proxy (skeletal version)
• a new, improved mechanism for passing conf parameters to the OpenLineage client in the Spark integration
• a new ExtractionErrorRunFacet to reflect internal processing errors for the SQL parser
• testing improvements, bug fixes and more.
As always, three +1s from committers will authorize an immediate release. Thanks in advance!

Reactions: ➕ Willy Lulciuc, Maciej Obuchowski, Paweł Leszczyński, Jakub Dardziński, Julien Le Dem

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-01-03 23:07:59
*Thread Reply:* Hi @Michael Robinson
> a new, improved mechanism for passing conf parameters to the OpenLineage client in the Spark integration
Would it be possible to have more details on what this entails please? Thanks!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-01-04 09:21:46

*Thread Reply:* @Tomasz Nazarewicz might explain this better

Tomasz Nazarewicz (tomasz.nazarewicz@getindata.com) - 2023-01-04 10:04:22
*Thread Reply:* @Anirudh Shrinivason until now, if you wanted to add a new property to the OL client, you had to also implement it in the integration, because the integration had to parse all properties, create appropriate objects, etc. The new implementation makes client properties transparent to the integration; they are only passed through, and parsing happens inside the client.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-04 13:02:39
-
-

*Thread Reply:* Thanks, all. The release is authorized and will commence shortly 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-01-04 22:00:55
-
-

*Thread Reply:* @Tomasz Nazarewicz Ahh I see. Okay thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-05 10:37:09
-
-

@channel -This month’s OpenLineage TSC meeting is next Thursday, January 12th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome! -On the tentative agenda:

- -
  1. Recent release overview @Michael Robinson
  2. Column lineage update @Maciej Obuchowski
  3. Airflow integration improvements @Jakub Dardziński
  4. Discussions: -• Real-world implementation of OpenLineage (What does it really mean?) @Sheeri Cabral (Collibra) -• Using namespaces @Michael Robinson
  5. Open discussion -Notes: https://bit.ly/OLwiki -Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-01-05 23:45:38
-
-

*Thread Reply:* @Michael Robinson Will there be a recording?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-06 09:10:50
-
-

*Thread Reply:* @Anirudh Shrinivason Yes, and the recording will be here: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- - - -
- :gratitude_thank_you: Anirudh Shrinivason -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-05 13:00:01
-
-

OpenLineage 0.19.2 is available now, including: -• Airflow: add Trino extractor #1288 @sekikn -• Airflow: add S3FileTransformOperator extractor #1450 @sekikn -• Airflow: add standardized run facet #1413 @JDarDagran -• Airflow: add NominalTimeRunFacet and OwnershipJobFacet #1410 @JDarDagran -• dbt: add support for postgres datasources #1417 @julienledem -• Proxy: add client-side proxy (skeletal version) #1439 #1420 @fm100 -• Proxy: add CI job to publish Docker image #1086 @wslulciuc -• SQL: add ExtractionErrorRunFacet #1442 @mobuchowski -• SQL: add column-level lineage to SQL parser #1432 #1461 @mobuchowski @StarostaGit -• Spark: pass config parameters to the OL client #1383 @tnazarew -• Plus bug fixes and testing and CI improvements. -Thanks to all the contributors, including new contributor Saurabh (@versaurabh) -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.19.2 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.18.0...0.19.2 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/

- - - -
- ❤️ Julien Le Dem, Howard Yoo, Willy Lulciuc, Maciej Obuchowski, Kengo Seki, Harel Shein, Jarek Potiuk, Varun Singh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-01-06 01:07:18
-
-

Question on Spark Integration and External Hive Metastores

- -

@Hanna Moazam and I are working with a team that is using OpenLineage and wants to extract the server name of the hive metastore they're using when writing to a Hive table through Spark.

- -

For example, the hive metastore is an Azure SQL database and the table name is sales.transactions.

- -

OpenLineage will give something like /usr/hive/warehouse/sales.db/transactions for the name.

- -

However, this is not a complete picture, since sales.db/transactions is only defined like this within a given hive metastore. In Hive, you'd define the fully qualified name as sales.transactions@sqlservername.database.windows.net .

- -

Has anyone else come across this before? If not, we plan on raising an issue and suggesting we extract out the spark.hadoop.javax.jdo.option.ConnectionURL in the DatabricksEnvironmentFacetBuilder but ideally there would be a better way of extracting this.

- -

https://learn.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore#set-up-an-external-metastore-using-the-ui

- -

There was an issue by @Maciej Obuchowski or @Paweł Leszczyński that talked about providing a facet of the alias of a path but I can't find it at this point :(
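For anyone reproducing this, the metastore JDBC URL in question can be read off the session conf - a hypothetical sketch, assuming the cluster sets this key as in the Databricks external-metastore doc above:
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Returns e.g. jdbc:sqlserver://sqlservername.database.windows.net:1433;...
# when an external Hive metastore is configured; "not set" otherwise.
print(spark.conf.get("spark.hadoop.javax.jdo.option.ConnectionURL", "not set"))
```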

-
-
learn.microsoft.com
- - - - - - - - - - - - - - - - - -
- - - -
- 👀 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-01-09 02:28:43
-
-

*Thread Reply:* Hi @Hanna Moazam, we've written Jupyter notebook to demo dataset symlinks feature: -https://github.com/OpenLineage/workshops/blob/main/spark/dataset_symlinks.ipynb

- -

For scenario you describe, there should be symlink facet sent similar to: -{ - "_producer": "<https://github.com/OpenLineage/OpenLineage/tree/0.15.1/integration/spark>", - "_schemaURL": "<https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet>", - "identifiers": [ - { - "namespace": "<hive://metastore>", - "name": "default.some_table", - "type": "TABLE" - } - ] -} -Within Openlineage Spark integration code, symlinks are included here: -https://github.com/OpenLineage/OpenLineage/blob/0.19.2/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PathUtils.java#L75

- -

and they are added only when spark catalog is hive and metastore URI in spark conf is present.
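In other words, both conditions can be satisfied from the session builder - a sketch only, with the metastore host/port as placeholders:
```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # condition 1: the Spark catalog is Hive
    .config("spark.sql.catalogImplementation", "hive")
    # condition 2: a metastore URI is present in the Spark conf
    .config("spark.hadoop.hive.metastore.uris", "thrift://metastore-host:9083")
    .enableHiveSupport()
    .getOrCreate()
)
# With both set, Hive table writes should carry a SymlinksDatasetFacet
# like the JSON above.
```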

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
- ➕ Maciej Obuchowski -
- -
- 🤯 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-01-09 14:21:10
-
-

*Thread Reply:* This is so awesome, @Paweł Leszczyński - Thank you so much for sharing this! I'm wondering if we could extend this to capture the hive JDBC Connection URL. I will explore this and put in an issue and PR to try and extend it. Thank you for the insights!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-11 12:00:02
-
-

@channel -Friendly reminder: this month’s OpenLineage TSC meeting is tomorrow at 10am, and all are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1672933029317449

-
- - -
- - - - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
- 🙌 Maciej Obuchowski, Will Johnson, John Bagnall, AnnMary Justine, Willy Lulciuc, Minkyu Park, Paweł Leszczyński, Varun Singh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2023-01-12 06:37:56
-
-

Hi, are there any plans to add an Azure EventHub transport similar to the Kinesis one?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-01-12 17:31:12
-
-

*Thread Reply:* @Varun Singh why not just use the KafkaTransport and the Event Hub's Kafka endpoint?

- -

https://github.com/yogyang/OpenLineage/blob/2b7fa8bbd19a2207d54756e79aea7a542bf7bb[…]/main/java/io/openlineage/client/transports/KafkaTransport.java

- -

https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-stream-analytics
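Roughly like this, as an untested sketch - the transport.properties.* key shape is assumed from the client's Kafka config, and the namespace/connection string are placeholders. (Also note that sasl.jaas.config contains semicolons, which hits the conf-splitting bug discussed a day later in this channel.)
```
from pyspark.sql import SparkSession

conn_str = "Endpoint=sb://mynamespace.servicebus.windows.net/;..."  # placeholder
jaas = (
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{conn_str}";'
)
spark = (
    SparkSession.builder
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "kafka")
    .config("spark.openlineage.transport.topicName", "ol_events")
    .config("spark.openlineage.transport.properties.bootstrap.servers",
            "mynamespace.servicebus.windows.net:9093")  # Event Hubs Kafka endpoint
    .config("spark.openlineage.transport.properties.security.protocol", "SASL_SSL")
    .config("spark.openlineage.transport.properties.sasl.mechanism", "PLAIN")
    .config("spark.openlineage.transport.properties.sasl.jaas.config", jaas)
    .getOrCreate()
)
```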

-
-
learn.microsoft.com
- - - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Varun Singh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-12 09:01:24
-
-

Following up on last month’s discussion (), I created the #spec-compliance channel for further discussion

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-01-12 17:43:55
-
-

*Thread Reply:* @Julien Le Dem is there a channel to discuss the community call / ask follow-up questions on the community call topics? For example, I wanted to ask more about the AirflowFacet and whether we expect to introduce more tool-specific facets into the spec. Where's the right place to ask that question? On the PR?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-17 15:11:05
-
-

*Thread Reply:* I think asking in #general is the right place. If there’s a specific github issue/PR, this is a good place as well. You can tag the relevant folks to get their attention

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-01-12 18:37:24
-
-

@here I am using the Spark listener and whenever a query like INSERT OVERWRITE TABLE gets executed it looks like I can see some outputs, but there are no symlinks for the output table. The operation type being executed is InsertIntoHadoopFsRelationCommand . I am not sure why I can see symlinks for all the input tables but not the output tables. Anyone know the reason behind this?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-01-13 02:30:37
-
-

*Thread Reply:* Hello @Allison Suarez, in case of InsertIntoHadoopFsRelationCommand, Spark Openlineage implementation uses method: -DatasetIdentifier di = PathUtils.fromURI(command.outputPath().toUri(), "file"); -(https://github.com/OpenLineage/OpenLineage/blob/0.19.2/integration/spark/shared/sr[…]ark/agent/lifecycle/plan/InsertIntoHadoopFsRelationVisitor.java)

- -

If the dataset identifier is constructed from a path, then no symlinks are added. That's the current behaviour.

- -

Calling io.openlineage.spark.agent.util.DatasetIdentifier#withSymlink(io.openlineage.spark.agent.util.DatasetIdentifier.Symlink) on the DatasetIdentifier in InsertIntoHadoopFsRelationVisitor -could be a remedy to that.

- -

Do you have some Spark code snippet to reproduce this issue?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-01-22 10:04:56
-
-

*Thread Reply:* @Allison Suarez it would also be good to know what compute engine you're using to run your code on? On-Prem Apache Spark? Azure/AWS/GCP Databricks?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-13 18:18:52
-
-

*Thread Reply:* I created a custom visitor and fixed the issue that way, thank you!

- - - -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2023-01-13 11:44:19
-
-

Hi, I am trying to use kafka transport in spark for sending events to an EventHub but it requires me to set a property sasl.jaas.config which needs to have semicolons (;) in its value. But this gives an error about being unable to convert Array to a String. I think this is due to this line which splits property values into an array if they have a semicolon https://github.com/OpenLineage/OpenLineage/blob/92adbc877f0f4008928a420a1b8a93f394[…]pp/src/main/java/io/openlineage/spark/agent/ArgumentParser.java -Does this seem like a bug or is it intentional?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-01-13 14:39:51
-
-

*Thread Reply:* seems like a bug to me, but tagging @Tomasz Nazarewicz / @Paweł Leszczyński

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com) -
-
2023-01-13 15:22:19
-
-

*Thread Reply:* So we needed a generic way of passing parameters to the client, and made the assumption that every field containing ; will be treated as an array

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2023-01-14 02:00:04
-
-

*Thread Reply:* Thanks for the confirmation, should I add a condition to split only if it's a key that can have array values? We can have a list of such keys like facets.disabled

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com) -
-
2023-01-14 02:28:41
-
-

*Thread Reply:* We thought about this solution but it forces us to know the structure of each config and we wanted to avoid that as much as possible

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com) -
-
2023-01-14 02:34:06
-
-

*Thread Reply:* Maybe the condition could be having ; and [] in the value
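A toy illustration of that heuristic (not the actual parser code): only split when the value is bracketed and contains semicolons, so values like sasl.jaas.config pass through untouched:
```
def parse_value(raw: str):
    # Treat as an array only when wrapped in [] AND containing ';'.
    if raw.startswith("[") and raw.endswith("]") and ";" in raw:
        return [part.strip() for part in raw[1:-1].split(";") if part.strip()]
    return raw

assert parse_value("[a;b;c]") == ["a", "b", "c"]
jaas = 'PlainLoginModule required username="$ConnectionString";'
assert parse_value(jaas) == jaas  # has semicolons, but no brackets: left intact
```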

- - - -
- 👍 Varun Singh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2023-01-15 08:14:14
-
-

*Thread Reply:* Makes sense, I can add this check. Thanks @Tomasz Nazarewicz!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2023-01-16 01:15:19
-
-

*Thread Reply:* Created issue https://github.com/OpenLineage/OpenLineage/issues/1506 for this

-
- - - - - - - -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-17 12:00:02
-
-

Hi everyone, I’m excited to share some good news about our progress in the LFAI & Data Foundation: we’ve achieved Incubation status! This required us to earn a Silver Badge from the OpenSSF, get 300+ stars on GitHub (which was NBD as we have over 1100 already), and win the approval of the LFAI & Data’s TAC. Now that we’ve cleared this hurdle, we have access to additional services from the foundation, including assistance with creative work, marketing and communication support, and event-planning assistance. Graduation from the program, which will earn us a voting seat on the TAC, is on the horizon. Stay tuned for updates on our progress with the foundation.

- -

LF AI & Data is an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence (AI) and data. LF AI & Data was created to support open source AI and data, and to create a sustainable open source AI and data ecosystem that makes it easy to create AI and data products and services using open source technologies. They foster collaboration under a neutral environment with an open governance in support of the harmonization and acceleration of open source technical projects.

- -

For more info about the foundation and other LFAI & Data projects, visit their website.

- - - -
- ❤️ Julien Le Dem, Paweł Leszczyński, Maciej Obuchowski, Ross Turk, Jakub Dardziński, Minkyu Park, Howard Yoo, Jarek Potiuk, Danilo Mota, Willy Lulciuc, Kengo Seki, Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-17 15:53:12
-
-

if you want to share this news (and I hope you do!) there is a blog post here: https://openlineage.io/blog/incubation-stage-lfai/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-17 15:54:07
-
-

and I'll add a quick shoutout of @Michael Robinson, who has done a whole lot of work to make this happen 🎉 thanks, man, you're awesome!

- - - -
- 🙌 Howard Yoo, Maciej Obuchowski, Jarek Potiuk, Minkyu Park, Willy Lulciuc, Kengo Seki, Paweł Leszczyński, Varun Singh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-17 15:56:38
-
-

*Thread Reply:* Thank you, Ross!! I appreciate it. I might have coordinated it, but it’s been a team effort. Lots of folks shared knowledge and time to help us check all the boxes, literally and figuratively (lots of boxes). ;)

- - - -
- ☑️ Willy Lulciuc, Paweł Leszczyński, Viraj Parekh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jarek Potiuk - (jarek@potiuk.com) -
-
2023-01-17 16:03:36
-
-

Congrats @Michael Robinson and @Ross Turk - a major step for OpenLineage!

- - - -
- 🙌 Michael Robinson, Maciej Obuchowski, Jakub Dardziński, Julien Le Dem, Ross Turk, Willy Lulciuc, Kengo Seki, Viraj Parekh, Paweł Leszczyński, Anirudh Shrinivason, Robert -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sudhir Nune - (sudhir.nune@kraftheinz.com) -
-
2023-01-18 11:15:02
-
-

Hi all, I am new to the dbt integration (https://openlineage.io/integration/dbt/). I followed the steps on a Windows laptop, but dbt-ol does not get executed.

- -

'dbt-ol' is not recognized as an internal or external command, -operable program or batch file.

- -

I see the following Packages installed too -openlineage-dbt==0.19.2 -openlineage-integration-common==0.19.2 -openlineage-python==0.19.2

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-18 11:17:14
-
-

*Thread Reply:* What are the errors?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sudhir Nune - (sudhir.nune@kraftheinz.com) -
-
2023-01-18 11:18:09
-
-

*Thread Reply:* 'dbt-ol' is not recognized as an internal or external command, -operable program or batch file.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-19 11:11:09
-
-

*Thread Reply:* Hm, I think this is due to different Windows conventions around scripts.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-19 14:26:35
-
-

*Thread Reply:* I have not tried it on Windows before myself, but on mac/linux if you make a Python virtual environment in venv/ and run pip install openlineage-dbt, the script winds up in ./venv/bin/dbt-ol.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-19 14:27:04
-
-

*Thread Reply:* (maybe that helps!)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-19 14:38:23
-
-

*Thread Reply:* This might not work, but I think I have an idea that would allow it to run as python -m dbt-ol run ...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-19 14:38:27
-
-

*Thread Reply:* That needs one fix though

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sudhir Nune - (sudhir.nune@kraftheinz.com) -
-
2023-01-19 14:40:52
-
-

*Thread Reply:* Hi @Maciej Obuchowski, thanks for the input, when I try to use python -m dbt-ol run, I see the below error :( -\python.exe: No module named dbt-ol

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-24 13:23:56
-
-

*Thread Reply:* We’re seeing a similar issue with the Great Expectations integration at the moment. This is purely a guess, but what happens when you try with openlineage-dbt 0.18.0?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-24 13:24:36
-
-

*Thread Reply:* @Michael Robinson GE issue is on Windows?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-24 13:24:49
-
-

*Thread Reply:* No, not Windows

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-24 13:24:55
-
-

*Thread Reply:* (that I know of)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sudhir Nune - (sudhir.nune@kraftheinz.com) -
-
2023-01-24 13:46:39
-
-

*Thread Reply:* @Michael Robinson - I see the same error. I used 2 Combinations

- -
  1. Python 3.8.10 with openlineage-dbt 0.18.0 & Latest
  2. Python 3.9.7 with openlineage-dbt 0.18.0 & Latest
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-24 13:49:19
-
-

*Thread Reply:* Hm. You should be able to find the dbt-ol command wherever pip is installing the packages. In my case, that's usually in a virtual environment.

- -

But if I am not in a virtual environment, it installs the packages in my PYTHONPATH. You might try this to see if the dbt-ol script can be found in one of the directories in sys.path.
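Something like this, in a Python shell on the affected machine, would show where pip puts console scripts and whether dbt-ol landed there (speculation: if the package ships dbt-ol via scripts= rather than an entry point, Windows gets no .exe shim, which would match this symptom):
```
import os
import sysconfig

scripts_dir = sysconfig.get_path("scripts")  # e.g. ...\Python39\Scripts on Windows
print(scripts_dir)
print(sorted(f for f in os.listdir(scripts_dir) if "dbt" in f.lower()))
```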

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-24 13:58:38
-
-

*Thread Reply:* this can help you verify that your PYTHONPATH and PATH are correct - installing an unrelated python command-line tool and seeing if you can execute it:

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-24 13:59:42
-
-

*Thread Reply:* Again, I think this is a Windows issue

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-24 14:00:54
-
-

*Thread Reply:* @Maciej Obuchowski you think even if dbt-ol could be found in the path, that might not be the issue?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sudhir Nune - (sudhir.nune@kraftheinz.com) -
-
2023-01-24 14:15:13
-
-

*Thread Reply:* Hi @Ross Turk - I could not find the dbt-ol in the site-packages.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-24 14:16:48
-
-

*Thread Reply:* Hm 😕 then perhaps @Maciej Obuchowski is right and there is a bigger issue here

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sudhir Nune - (sudhir.nune@kraftheinz.com) -
-
2023-01-24 14:31:15
-
-

*Thread Reply:* @Ross Turk & @Maciej Obuchowski I see the issue even when I do the install using the https://pypi.org/project/openlineage-dbt/#files - openlineage-dbt-0.19.2.tar.gz.

- -

For some reason, I see only the following folders created

- -
  1. openlineage
  2. openlineage_dbt-0.19.2.dist-info
  3. openlineageintegrationcommon-0.19.2.dist-info
  4. openlineage_python-0.19.2.dist-info -and not bringing in the openlineage-dbt-0.19.2, which has the scripts/dbt-ol
- -

If it helps I am using pip 21.2.4

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Francis McGregor-Macdonald - (francis@mc-mac.com) -
-
2023-01-18 18:40:32
-
-

@Paul Villena @Stephen Said and Vishwanatha Nayak published an AWS blog Automate data lineage on Amazon MWAA with OpenLineage

-
-
Amazon Web Services
- - - - - - - - - - - - - - - - - -
- - - -
- 👀 Ross Turk, Peter Hicks, Willy Lulciuc -
- -
- 🔥 Ross Turk, Willy Lulciuc, Michael Collado, Peter Hicks, Minkyu Park, Julien Le Dem, Kengo Seki, Anirudh Shrinivason, Paweł Leszczyński, Maciej Obuchowski, Harel Shein, Paul Wilson Villena -
- -
- ❤️ Willy Lulciuc, Minkyu Park, Julien Le Dem, Kengo Seki, Paweł Leszczyński, Viraj Parekh -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-18 18:54:57
-
-

*Thread Reply:* This is excellent! May we promote it on openlineage and marquez social channels?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-01-18 18:55:30
-
-

*Thread Reply:* This is an amazing write up! 🔥 💯 🚀

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Francis McGregor-Macdonald - (francis@mc-mac.com) -
-
2023-01-18 19:49:46
-
-

*Thread Reply:* Happy to have it promoted. 😄 -Vish posted on LinkedIn: https://www.linkedin.com/posts/vishwanatha-nayak-b8462054automate-data-lineage-on-amazon-mwaa-with-activity-7021589819763945473-yMHF?utmsource=share&utmmedium=memberios if you want something to repost there.

-
-
linkedin.com
- - - - - - - - - - - - - - - - - -
- - - -
- ❤️ Willy Lulciuc, Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-01-19 00:13:26
-
-

Hi guys, I am trying to build the openlineage jar locally for spark. I ran ./gradlew shadowJar in the /integration/spark directory. However, I am getting this issue: -** What went wrong: -A problem occurred evaluating project ':app'. -&gt; Could not resolve all files for configuration ':app:spark33'. - &gt; Could not resolve io.openlineage:openlineage_java:0.20.0-SNAPSHOT. - Required by: - project :app &gt; project :shared - &gt; Could not resolve io.openlineage:openlineage_java:0.20.0-SNAPSHOT. - &gt; Unable to load Maven meta-data from <https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-java/0.20.0-SNAPSHOT/maven-metadata.xml>. - &gt; Could not GET '<https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-java/0.20.0-SNAPSHOT/maven-metadata.xml>'. Received status code 401 from server: Unauthorized -It used to work a few weeks ago...May I ask if anyone would know what the reason might be? Thanks! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-01-19 03:58:42
-
-

*Thread Reply:* Hello @Anirudh Shrinivason, you need to build your openlineage-java package first. Possibly you built it some time ago on a different version

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-01-19 03:59:28
-
-

*Thread Reply:* ./gradlew clean build publishToMavenLocal -in /client/java should help.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-01-19 04:34:33
-
-

*Thread Reply:* Ahh yeap this works thanks! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-01-19 09:17:01
-
-

Are there any resources to explain the differences between lineage with Apache Atlas vs. lineage using OpenLineage? We have discussions with customers and partners, and some of them are looking into which is more “ready for industry”.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-19 11:03:39
-
-

*Thread Reply:* It's been a while since I looked at Atlas, but does it even now support anything other than very Java/Apache-adjacent projects like Hive and HBase?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-19 13:10:11
-
-

*Thread Reply:* To directly answer your question @Sheeri Cabral (Collibra): I am not aware of any resources currently that explain this 😞 but I would welcome the creation of one & pitch in where possible!

- - - -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-01-20 17:00:25
-
-

*Thread Reply:* I don’t know enough about Atlas to make that doc.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Justine Boulant - (justine.boulant@seenovate.com) -
-
2023-01-19 10:43:18
-
-

Hi everyone, I am currently working on a project and we have some questions about using OpenLineage with Apache Airflow: -• How does it work: UX vs code/script? How can we implement it? A schema of its architecture, for example -• What are the visual outputs available? -• Is the lineage done from A to Z, if there are multiple intermediary transformations for example? -• Is the lineage done horizontally across the organization or vertically on different system levels? or both? -• Can we upgrade it to industry-level? -• Does it work with Python and/or R? -• Does it read metadata or scripts? -Thanks a lot if you can help 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-19 11:00:54
-
-

*Thread Reply:* I think most of your questions will be answered by this video: https://www.youtube.com/watch?v=LRr-ja8_Wjs

-
-
YouTube
- -
- - - - - Astronomer - (https://www.youtube.com/@Astronomer) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-19 13:10:58
-
-

*Thread Reply:* I agree - a lot of the answers are in that overview video. You might also take a look at the docs, they do a pretty good job of explaining how it works.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-19 13:19:34
-
-

*Thread Reply:* More explicitly: -• Airflow is an interesting platform to observe because it runs a large variety of workloads and lineage can only be automatically extracted for some of them -• In general, OpenLineage is essentially a standard and data model for lineage. There are integrations for various systems, including Airflow, that cause them to emit lineage events to an OpenLineage compatible backend. It's a push model. -• Marquez is one such backend, and the one I recommend for testing & development -• There are a few approaches for lineage in Airflow: - ◦ Extractors, which pair with Operators to extract and emit lineage - ◦ Manual inlets/outlets on a task, defined by a developer - useful for PythonOperator and other cases where an extractor can't do it auto - ◦ Orchestration of an underlying OpenLineage integration, like openlineage-dbt -• IDK about "A to Z", that depends on your environment. The goal is to capture every transformation. Depending on your pipeline, there may be a set of integrations that give you the coverage you need. We often find that there are gaps. -• It works with Python. You can use the openlineage-python client to emit lineage events to a backend. This is useful if there isn't an integration for something your pipeline does. -• It describes the pipeline by observing running jobs and the way they affect datasets, not the organization. I don't know what you mean by "industry-level". -• I am not aware of an integration that parses source code to determine lineage at this time. -• The openlineage-dbt integration consumes the various metadata that dbt leaves behind to construct lineage. Dunno if that's what you mean by "read metadata".
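For the openlineage-python point above, a minimal sketch of emitting a custom event (API shape as of the 0.19.x-era client; the Marquez URL and names are placeholders):
```
import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # e.g. a local Marquez
client.emit(
    RunEvent(
        eventType=RunState.START,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid.uuid4())),
        job=Job(namespace="my-namespace", name="my-python-step"),
        producer="https://github.com/my-org/my-pipeline",
    )
)
```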

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-19 13:23:33
-
-

*Thread Reply:* FWIW I did a workshop on openlineage and airflow a while back, and it's all in this repo. You can find slides + a quick Python example + a simple Airflow example in there.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Justine Boulant - (justine.boulant@seenovate.com) -
-
2023-01-20 03:44:22
-
-

*Thread Reply:* Thanks a lot!! Very helpful!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-20 11:42:43
-
-

*Thread Reply:* 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Brad Paskewitz - (bradford.paskewitz@fivetran.com) -
-
2023-01-20 15:28:06
-
-

Hey folks, my team is working on a solution that would support the OL standard with column level lineage. I'm working through the architecture now and I'm wondering if everyone uses the standard rest api backed by a db or if other teams found success using other technologies such as webhooks, streams, etc in order to capture and process lineage events. I'd be very curious to connect on the topic

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-20 19:45:55
-
-

*Thread Reply:* Hello Brad, off the top of my head:

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-20 19:47:15
-
-

*Thread Reply:* • Marquez uses the HTTP POST API, and so does Astro -• Egeria and Purview prefer consuming through a Kafka topic. There is a ProxyBackend that takes HTTP POSTs and writes to Kafka. The client can also be configured to write to Kafka

- - - -
- 👍 Jakub Dardziński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-20 19:48:09
-
-

*Thread Reply:* @Will Johnson @Mandy Chessell might have opinions

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-20 19:49:10
-
-

*Thread Reply:* The Microsoft Purview approach is documented here: https://learn.microsoft.com/en-us/samples/microsoft/purview-adb-lineage-solution-accelerator/azure-databricks-to-purview-lineage-connector/

-
-
learn.microsoft.com
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-20 19:49:47
-
-

*Thread Reply:* There’s a blog post about Egeria here: https://openlineage.io/blog/openlineage-egeria/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-01-22 10:00:56
-
-

*Thread Reply:* @Brad Paskewitz at Microsoft, the solution that Julien linked above, we are using the HTTP Transport (REST API) as we are consuming the OpenLineage Events and transforming them to Apache Atlas / Microsoft Purview.

- -

However, there is a good deal of interest in using the kafka transport instead and that's our future roadmap.

- - - -
- 👍 Ross Turk, Brad Paskewitz -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Quentin Nambot - (qnambot@gmail.com) -
-
2023-01-25 09:59:13
-
-

❓ Hi everyone, I am trying to use OpenLineage with Databricks (using the 11.3 LTS runtime and OpenLineage 0.19.2) -Using this documentation I managed to install OpenLineage and send events to Marquez -However, Marquez did not receive all COMPLETE events; it seems like the Databricks cluster is shut down immediately at the end of the job. It is not the first time that I have seen this with Databricks: last year I tried to use Spline, and we noticed that Databricks seems not to wait for the Spark session to be nicely closed before shutting down instances (see this issue) -My question is: has anyone faced the same issue? Does somebody know a workaround? 🙏

-
-
spline
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-01-25 12:04:48
-
-

*Thread Reply:* Hmm, if Databricks is shutting the process down without waiting for the ListenerBus to clear, I don’t know that there’s a lot we can do. The best thing is to somehow delay the main application thread from exiting. One thing you could try is to subclass the OpenLineageSparkListener and generate a lock for each SparkListenerSQLExecutionStart and release it when the accompanying SparkListenerSQLExecutionEnd event is processed. Then, in the main application, block until all such locks are released. If you try it and it works, let us know!
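The shape of that idea, sketched in Python for brevity (the real subclass would live on the JVM next to OpenLineageSparkListener):
```
import threading

class ExecutionTracker:
    """Count SQL executions that started; block until each one has ended."""
    def __init__(self):
        self._pending = 0
        self._cond = threading.Condition()

    def on_start(self):  # called for each SparkListenerSQLExecutionStart
        with self._cond:
            self._pending += 1

    def on_end(self):  # called for each SparkListenerSQLExecutionEnd
        with self._cond:
            self._pending -= 1
            self._cond.notify_all()

    def await_all(self, timeout=None):  # call from the main thread before exit
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout=timeout)
```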

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Quentin Nambot - (qnambot@gmail.com) -
-
2023-01-26 05:46:35
-
-

*Thread Reply:* Ok thanks for the idea! I'll tell you if I try this and if it works 🤞

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Petr Hajek - (petr.hajek@profinit.eu) -
-
2023-01-25 10:12:42
-
-

Hi, would anybody be able and willing to help us configure S3 and Snowflake extractors within Airflow integration for one of our clients? Our trouble is that Airflow integration returns valid OpenLineage .json files but it lacks any information about input and output DataSets. Thanks in advance 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-01-25 10:38:03
-
-

*Thread Reply:* Hey Petr. Please DM me or describe the issue here 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-27 15:24:47
-
-

Hello.. I am trying to play with openlineage spark integration with Kafka and currently trying to just use the config as part of the spark submit command but I run into errors. Details in the 🧵

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-27 15:25:04
-
-

*Thread Reply:* Command -spark-submit --packages "io.openlineage:openlineage_spark:0.19.+" \ - --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \ - --conf "spark.openlineage.transport.type=kafka" \ - --conf "spark.openlineage.transport.topicName=topicname" \ - --conf "spark.openlineage.transport.localServerId=Kafka_server" \ - file.py

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-27 15:25:14
-
-

*Thread Reply:* 23/01/27 17:29:06 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception -java.lang.NullPointerException - at io.openlineage.client.transports.TransportFactory.build(TransportFactory.java:44) - at io.openlineage.spark.agent.EventEmitter.&lt;init&gt;(EventEmitter.java:40) - at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:278) - at io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) - at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) - at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) - at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) - at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) - at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) - at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) - at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-27 15:25:31
-
-

*Thread Reply:* I would appreciate any pointers on getting started with using openlineage-spark with Kafka.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-27 16:15:00
-
-

*Thread Reply:* Also this might seem a little elementary but the kafka topic itself, should it be hosted on the spark cluster or could it be any kafka topic?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 08:37:07
-
-

*Thread Reply:* 👀 Could I get some help on this, please?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 09:07:08
-
-

*Thread Reply:* I think any NullPointerException is clearly our bug - can you open an issue on the OL GitHub?

- - - -
- 👍 Susmitha Anandarao -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 09:30:51
-
-

*Thread Reply:* @Maciej Obuchowski Another interesting thing is if I use 0.19.2 version specifically, I get -23/01/30 14:28:33 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event

- -

I am trying to print to console at the moment. I haven't been able to get Kafka transport type working though.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 09:41:12
-
-

*Thread Reply:* Are you getting events printed on the console though? This log should not affect you if you're running, for example Spark SQL jobs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 09:42:28
-
-

*Thread Reply:* I am trying to run a python file using pyspark. 23/01/30 14:40:49 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event -I see this and don't see any events on the console.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 09:55:41
-
-

*Thread Reply:* Any logs fitting the pattern -log.warn("Unable to access job conf from RDD", nfe); -or -log.info("Found job conf from RDD {}", jc); -before?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 09:57:20
-
-

*Thread Reply:* ```23/01/30 14:40:48 INFO DAGScheduler: Submitting ShuffleMapStage 0 (PairwiseRDD[2] at reduceByKey at /tmp/spark-20487725-f49b-4587-986d-e63a61890673/statusapidemo.py:47), which has no missing parents -23/01/30 14:40:49 WARN RddExecutionContext: Unable to access job conf from RDD -java.lang.NoSuchFieldException: Field is not instance of HadoopMapRedWriteConfigUtil - at io.openlineage.spark.agent.lifecycle.RddExecutionContext.lambda$setActiveJob$0(RddExecutionContext.java:117) - at java.util.Optional.orElseThrow(Optional.java:290) - at io.openlineage.spark.agent.lifecycle.RddExecutionContext.setActiveJob(RddExecutionContext.java:115) - at java.util.Optional.ifPresent(Optional.java:159) - at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$9(OpenLineageSparkListener.java:148) - at java.util.Optional.ifPresent(Optional.java:159) - at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:145) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) - at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) - at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) - at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) - at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) - at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) - at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) - at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

- -

23/01/30 14:40:49 INFO RddExecutionContext: Found job conf from RDD Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-rbf-default.xml, hdfs-site.xml, hdfs-rbf-site.xml, resource-types.xml

- -

23/01/30 14:40:49 INFO RddExecutionContext: Found output path null from RDD PythonRDD[5] at collect at /tmp/spark-20487725-f49b-4587-986d-e63a61890673/statusapidemo.py:48 -23/01/30 14:40:49 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event``` -I see both actually.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 10:03:35
-
-

*Thread Reply:* I think this is same problem as this: https://github.com/OpenLineage/OpenLineage/issues/1521

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 10:04:14
-
-

*Thread Reply:* and I think I might have solution on a branch for it, just need to polish it up to release

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 10:13:37
-
-

*Thread Reply:* Aah got it. I will give it a try with SQL and a jar.

- -

Do you have an ETA on when the Python issue will be fixed?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 10:37:51
-
-

*Thread Reply:* @Maciej Obuchowski Well I run into the same errors if I run spark-submit on a jar.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 10:38:44
-
-

*Thread Reply:* I think that has nothing to do with python

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 10:39:16
-
-

*Thread Reply:* BTW, which Spark version are you using?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 10:41:22
-
-

*Thread Reply:* We are on 3.3.1

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-30 11:38:24
-
-

*Thread Reply:* @Maciej Obuchowski Do you have an estimated release date for the fix? Our team is specifically interested in using the Emitter to write out to Kafka.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-30 11:46:30
-
-

*Thread Reply:* I think we plan to release somewhere in the next week

- - - -
- :gratitude_thank_you: Susmitha Anandarao -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-06 09:21:25
-
-

*Thread Reply:* @Susmitha Anandarao PR fixing this has been merged, release should be today

- - - -
- :gratitude_thank_you: Susmitha Anandarao -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-27 16:31:45
-
-

👋 -what would be the reason conn_id on something like SQLCheckOperator ends up being None when OpenLineage attempts to extract metadata but is fine on task execution?

- -

i'm using OpenLineage for Airflow 0.14.1 on Airflow 2.3.4 and i'm getting an error about conn_id not being found. it's a SQLCheckOperator where the check runs fine, but the task fails because when OpenLineage goes to extract task metadata it attempts to grab the conn_id, and at that moment it finds it to be None.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-27 18:38:40
-
-

*Thread Reply:* hmmm, I am not sure. perhaps @Benji Lampel can help, he’s very familiar with those operators.

- - - -
- 👍 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-27 18:46:15
-
-

*Thread Reply:* @Benji Lampel any help would be appreciated!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-01-30 09:01:34
-
-

*Thread Reply:* Hey Paul, the SQLCheckExtractors were written with the intent that they would be used by a provider that inherits from them - they are all treated as a sort of base class. What is the exact error message you're getting? And what is the operator code? -Could you try this with a PostgresCheckOperator ? -(Also, only the SqlColumnCheckOperator and SqlTableCheckOperator will provide data quality facets in their output; those functions are not implementable in the other operators at this time)

- - - -
- 👀 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 14:36:07
-
-

*Thread Reply:* @Benji Lampel here is the error message. i am not sure what the operator code is.

- -

[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - Traceback (most recent call last): -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - self.run() -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/usr/lib/python3.8/threading.py", line 870, in run -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - self._target(*self._args, **self._kwargs) -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 99, in on_running -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - task_metadata = extractor_manager.extract_metadata(dagrun, task) -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py", line 28, in extract_metadata -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - extractor = self._get_extractor(task) -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py", line 96, in _get_extractor -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - self.task_to_extractor.instantiate_abstract_extractors(task) -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py", line 118, in instantiate_abstract_extractors -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - task_conn_type = BaseHook.get_connection(task.conn_id).conn_type -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/hooks/base.py", line 67, in get_connection -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - conn = Connection.get_connection_from_secrets(conn_id) -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/models/connection.py", line 430, in get_connection_from_secrets -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined") -[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - airflow.exceptions.AirflowNotFoundException: The conn_id `None` isn't defined

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 14:37:06
-
-

*Thread Reply:* and above that

- -

[2023-01-31, 00:32:38 UTC] {connection.py:424} ERROR - Unable to retrieve connection from secrets backend (EnvironmentVariablesBackend). Checking subsequent secrets backend. -Traceback (most recent call last): - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/models/connection.py", line 420, in get_connection_from_secrets - conn = secrets_backend.get_connection(conn_id=conn_id) - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/secrets/base_secrets.py", line 91, in get_connection - value = self.get_conn_value(conn_id=conn_id) - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/secrets/environment_variables.py", line 48, in get_conn_value - return os.environ.get(CONN_ENV_PREFIX + conn_id.upper())

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 14:39:31
-
-

*Thread Reply:* sorry, i should mention we're wrapping over the CheckOperator as we're still migrating from 1.10.15 @Benji Lampel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-01-31 15:09:51
-
-

*Thread Reply:* What do you mean by wrapping the CheckOperator? Like how so, exactly? Can you show me the operator code you're using in the DAG?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 17:38:45
-
-

*Thread Reply:* like so

- -

class CustomSQLCheckOperator(CheckOperator): -....

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 17:39:30
-
-

*Thread Reply:* i think i found the issue though: we have our own get_hook function, so we don't follow the traditional Airflow way of setting conn_id, which is why conn_id is always None - and that path only gets called through OpenLineage; it never gets called with our custom wrapper otherwise

- - - -
- ✅ Benji Lampel -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-01-30 03:50:39
-
-

Hi everyone, I am using openlineage to capture column level lineage from spark databricks. I noticed that the environment variables captured are only present in the start event, but are not present in the complete event. Is there a reason why it is implemented like this? It seems more intuitive that whatever variables are present in the start event should also be present in the complete event...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-31 08:30:37
-
-

Hi everyone.. Does the DBT integration provide an option to emit events to a Kafka topic similar to the Spark integration? I could not find anything regarding this in the documentation and I wanted to make sure if only http transport type is supported. Thank you!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-31 12:57:47
-
-

*Thread Reply:* The dbt integration uses the python client, you should be able to do something similar than with the java client. See here: https://github.com/OpenLineage/OpenLineage/tree/main/client/python#kafka

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-31 13:26:33
-
-

*Thread Reply:* Thank you for this!

- -

I created a openlineage.yml file with the following data to test out the integration locally. -transport: - type: "kafka" - config: { 'bootstrap.servers': 'localhost:9092', } - topic: "ol_dbt_events" -However, I run into a no module named 'confluent_kafka' error from this code. -Running OpenLineage dbt wrapper version 0.19.2 -This wrapper will send OpenLineage events at the end of dbt execution. -Traceback (most recent call last): - File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/bin/dbt-ol", line 168, in &lt;module&gt; - main() - File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/bin/dbt-ol", line 94, in main - client = OpenLineageClient.from_environment() - File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/client.py", line 73, in from_environment - return cls(transport=get_default_factory().create()) - File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/transport/factory.py", line 37, in create - return self._create_transport(yml_config) - File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/transport/factory.py", line 69, in _create_transport - return transport_class(config_class.from_dict(config)) - File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/transport/kafka.py", line 43, in __init__ - import confluent_kafka as kafka -ModuleNotFoundError: No module named 'confluent_kafka' -Manually installing confluent-kafka worked. But I am curious why it was not automatically installed and if I am missing any config.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-02 14:39:29
-
-

*Thread Reply:* @Susmitha Anandarao It's not installed by default because it's a large binary package. We don't want to install something for every user that the great majority won't use, and it's 100x bigger than the rest of the client.

- -

We need to indicate this way better, and not throw this error directly at the user though, both in docs and code.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-01-31 11:28:53
-
-

~Hey, would love to see a release of OpenLineage~

- - - -
- ➕ Michael Robinson, Jakub Dardziński, Ross Turk, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-31 12:51:44
-
-

Hello, I have been working on a proposal to bring an OpenLineage provider to -Airflow. I am currently looking for feedback on a draft AIP. See the thread here: https://lists.apache.org/thread/2brvl4ynkxcff86zlokkb47wb5gx8hw7

- - - -
- 🔥 Maciej Obuchowski, Viraj Parekh, Jakub Dardziński, Enrico Rotundo, Harel Shein, Paweł Leszczyński -
- -
- 👀 Enrico Rotundo -
- -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-01-31 14:02:21
-
-

@Willy Lulciuc, - Any updates on - https://github.com/OpenLineage/OpenLineage/discussions/1494

-
- - - - - - - -
-
Category
- Ideas -
- -
-
Comments
- 3 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Natalie Zeller - (natalie.zeller@naturalint.com) -
-
2023-02-02 08:26:38
-
-

Hello, -While trying to use OpenLineage with spark, I've noticed that sometimes the query execution is missing or already got closed (here is the relevant code). As a result, some of the events are skipped. Is this a known issue? Is there a way to overcome it?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-02 08:39:34
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/999#issuecomment-1209048556

- -

Does this fit your experience?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-02 08:39:59
-
-

*Thread Reply:* We sometimes experience this in context of very small, quick jobs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Natalie Zeller - (natalie.zeller@naturalint.com) -
-
2023-02-02 08:43:24
-
-

*Thread Reply:* Yes, my scenarios are dealing with quick jobs. -Good to know that we will be able to solve it with future spark versions. Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson (michael.robinson@astronomer.io)
2023-02-02 11:09:13
@channel
This month's OpenLineage TSC meeting is next Thursday, February 9th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. Recent release overview @Michael Robinson
  2. AIP: OpenLineage in Airflow
  3. Discussions:
     • Real-world implementation of OpenLineage (What does it really mean?) @Sheeri Cabral (Collibra) (continued)
     • Using namespaces @Michael Robinson
  4. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
🔥 Maciej Obuchowski, Bramha Aelem, Viraj Parekh, Brad Paskewitz, Harel Shein
👍 Bramha Aelem, Viraj Parekh, Enrico Rotundo, Daniel Henneberger

Michael Robinson (michael.robinson@astronomer.io)
2023-02-03 13:22:51
Hi folks, I'm opening a vote to release OpenLineage 0.20.0, featuring:
• Airflow: add new extractor for GCSToGCSOperator - adds a new extractor for this operator.
• Proxy: implement lineage event validator for client proxy - implements logic in the proxy (which is still in development) for validating and handling lineage events.
• A fix of a breaking change in the common integration and other bug fixes in the dbt, Airflow, Spark, and SQL integrations and in the Java and Python clients.
As per the policy here, three +1s from committers will authorize. Thanks in advance.
➕ Willy Lulciuc, Maciej Obuchowski, Julien Le Dem, Jakub Dardziński, Howard Yoo

Willy Lulciuc (willy@datakin.com)
2023-02-03 13:24:03
*Thread Reply:* exciting to see the client proxy work being released by @Minkyu Park 💯

Michael Robinson (michael.robinson@astronomer.io)
2023-02-03 13:35:38
*Thread Reply:* This was without a doubt among the fastest release votes we've ever had 😉 . Thank you! You can expect the release to happen on Monday.

Minkyu Park (minkyu@datakin.com)
2023-02-03 14:02:52
*Thread Reply:* Lol the proxy is still in development and not ready for use

Willy Lulciuc (willy@datakin.com)
2023-02-03 14:03:26
*Thread Reply:* Good point! Let's make that clear in the release / docs?
👍 Michael Robinson, Minkyu Park

Minkyu Park (minkyu@datakin.com)
2023-02-03 14:03:33
*Thread Reply:* But it doesn't block anything anyway, so happy to see the release

Minkyu Park (minkyu@datakin.com)
2023-02-03 14:04:38
*Thread Reply:* We can celebrate that the proposal for the proxy is merged. I'm happy with that 🥳
🎊 Willy Lulciuc

Daniel Joanes (djoanes@gmail.com)
2023-02-06 00:01:49
Hey 👋 From what I gather, there's no solution to getting column-level lineage from Spark streaming jobs. Is there an issue I can follow to keep track?

Ross Turk (ross@datakin.com)
2023-02-06 14:47:15
*Thread Reply:* Hey @Daniel Joanes! Thanks for the question.

I am not aware of an issue that captures this. Column-level lineage is a somewhat new facet in the spec, and implementations across the various integrations are in varying states of readiness.

I invite you to create the issue - that way it's attributed to you, which makes sense because you're the one who first raised it. But I'm happy to create it for you & give you the issue # if you'd rather, just let me know 👍

Daniel Joanes (djoanes@gmail.com)
2023-02-06 14:50:59
*Thread Reply:* Go for it, once it's created i'll add a watch
👍 Ross Turk

Daniel Joanes (djoanes@gmail.com)
2023-02-06 14:51:13
*Thread Reply:* Thanks Ross!

Ross Turk (ross@datakin.com)
2023-02-06 23:10:30
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/1581
Labels: integration/spark, column-level-lineage

Michael Robinson (michael.robinson@astronomer.io)
2023-02-07 18:46:50
@channel
OpenLineage 0.20.4 is now available, including:
Additions:
• Airflow: add new extractor for GCSToGCSOperator #1495 @sekikn
• Flink: resolve topic names from regex, support 1.16.0 #1522 @pawel-big-lebowski
• Proxy: implement lineage event validator for client proxy #1469 @fm100
Changes:
• CI: use ruff instead of flake8, isort, etc., for linting and formatting #1526 @mobuchowski
Plus many bug fixes & doc changes.
Thank you to all our contributors!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.20.4
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.19.2...0.20.4
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🎉 Kengo Seki, Harel Shein, Willy Lulciuc, Nadav Geva

Michael Robinson (michael.robinson@astronomer.io)
2023-02-08 15:31:32
@channel
Friendly reminder: this month's OpenLineage TSC meeting is tomorrow at 10am, and all are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1675354153489629
❤️ Minkyu Park, Kengo Seki, Paweł Leszczyński, Harel Shein, Sheeri Cabral (Collibra), Enrico Rotundo

Harel Shein (harel.shein@gmail.com)
2023-02-09 10:50:07
Hey, can we please schedule a release of OpenLineage? I would like to have a release that includes the latest fixes for the Async Operator on Airflow and some dbt bug fixes.
➕ Michael Robinson, Maciej Obuchowski, Benji Lampel, Willy Lulciuc

Michael Robinson (michael.robinson@astronomer.io)
2023-02-09 10:50:49
*Thread Reply:* Thanks for requesting a release. 3 +1s from committers will authorize an immediate release.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-02-09 11:15:35
*Thread Reply:* 0.20.5 ?
➕ Harel Shein

Benji Lampel (benjamin@astronomer.io)
2023-02-09 11:28:20
*Thread Reply:* @Michael Robinson auth'd

Michael Robinson (michael.robinson@astronomer.io)
2023-02-09 11:32:06
*Thread Reply:* 👍 the release is authorized
❤️ Sheeri Cabral (Collibra), Willy Lulciuc, Paweł Leszczyński

Avinash Pancham (avinashpancham@outlook.com)
2023-02-09 15:57:58
Hi all, I have been experimenting with OpenLineage for a few days and it's great! I successfully set up the openlineage-spark listener on my Databricks cluster and that pushes OpenLineage data to our Marquez backend. That was all pretty easy to do 🙂

Now for my challenge: I would like to actually extend the metadata that my cluster pushes with custom values (you can think of Spark config settings, the commit hash of the executed code, or maybe even runtime-defined values). I browsed through some documentation and found custom facets one can define. The link below describes how to use Python to push custom metadata to a backend, but I was actually hoping that there was a way to do this automatically in Spark. So ideally I would like to write my own OpenLineage.json (that has my custom facet) and tell Spark to use that OpenLineage spec instead of the default one. In that way I hope my custom metadata will be forwarded automatically.

I just do not know how to do that (and whether that is even possible), since I could not find any tutorials on that topic. Any help on this would be greatly appreciated!

https://openlineage.io/docs/spec/facets/custom-facets
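For context, the cluster-side setup described above boils down to a few Spark conf entries; a sketch assuming the 0.2x-era spark.openlineage.* keys (version, host, and namespace values here are illustrative):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Pull in the OpenLineage Spark agent and register its listener.
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.20.6")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # Where to send events (Marquez here) and which namespace to use.
    .config("spark.openlineage.host", "http://localhost:5000")
    .config("spark.openlineage.namespace", "spark-dev")
    .getOrCreate()
)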

-
-
openlineage.io
- - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao (susmitha.anandarao@gmail.com)
2023-02-09 16:23:36
*Thread Reply:* I am also exploring something similar, but writing to kafka, and would want to know more on how we could add custom metadata from spark.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-02-10 02:23:40
*Thread Reply:* Hi @Avinash Pancham @Susmitha Anandarao, it's great to hear about successful experimenting on your side.

Although the OpenLineage spec provides some built-in facet definitions, a facet object can be anything you want (https://openlineage.io/apidocs/openapi/#tag/OpenLineage/operation/postRunEvent). The example metadata provided in this chat could be put into job or run facets, I believe.

There is also a way to extend the Spark integration to collect custom metadata, described here (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending). One needs to create their own JAR with DatasetFacetBuilders, RunFacetsBuilders (whatever is needed). The openlineage-spark integration will make use of those builders.
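On the Python side, the custom-facets doc linked above comes down to subclassing the client's base facet; a minimal sketch assuming the attrs-based BaseFacet in openlineage.client.facet (the facet name and schema URL here are illustrative):

import attr
from openlineage.client.facet import BaseFacet
from openlineage.client.run import Run

@attr.s
class CommitHashRunFacet(BaseFacet):
    # Hypothetical facet carrying the commit hash of the executed code.
    commitHash: str = attr.ib()

    @staticmethod
    def _get_schema() -> str:
        # Illustrative URL; point at wherever the facet's JSON schema is hosted.
        return "https://example.com/schemas/CommitHashRunFacet.json"

# Attached to a run before emitting the event:
run = Run(
    runId="3f5e83fa-3480-44bf-8f20-5b0b7b0e9c0f",
    facets={"commitHash": CommitHashRunFacet(commitHash="1a2b3c4")},
)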

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-02-10 09:09:10
*Thread Reply:* (I would love to see what your specs are! I'm not with Astronomer, just a community member, but I am finding that many of the customizations people are making to the spec are valuable ones that we should consider adding to core)

Susmitha Anandarao (susmitha.anandarao@gmail.com)
2023-02-14 16:51:28
*Thread Reply:* Are there any examples out there of customizations already done in Spark? An example would definitely help!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-02-15 08:43:08
*Thread Reply:* I think @Will Johnson might have something to add about customization

Will Johnson (will@willj.co)
2023-02-15 23:58:36
*Thread Reply:* Oh man... Mike Collado did a nice write-up on Slack of how many different ways there are to customize / extend OpenLineage. I know we all talked about doing a blog post at one point!

@Susmitha Anandarao - You might take a look at https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java which has a hard-coded set of properties we are extracting.

It looks like Avinash's changes were accepted as well: https://github.com/OpenLineage/OpenLineage/pull/1545

Michael Robinson (michael.robinson@astronomer.io)
2023-02-10 12:42:24
@channel
OpenLineage 0.20.6 is now available, including:
Additions
• Airflow: add new extractor for FTPFileTransmitOperator #1603 @sekikn
Changes
• Airflow: make extractors for async operators work #1601 @JDarDagran
Thanks to all our contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.20.6
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.20.4...0.20.6
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🥳 Minkyu Park, Willy Lulciuc, Kengo Seki, Paweł Leszczyński, Anirudh Shrinivason, pankaj koti, Maciej Obuchowski
❤️ Minkyu Park, Ross Turk, Willy Lulciuc, Kengo Seki, Paweł Leszczyński, Anirudh Shrinivason, pankaj koti
🎉 Minkyu Park, Willy Lulciuc, Kengo Seki, Anirudh Shrinivason, pankaj koti

Michael Robinson (michael.robinson@astronomer.io)
2023-02-13 14:20:26
Hi everyone, in case you missed the announcement at the most recent community meeting, our first-ever meetup will be held on March 9th in Providence, RI. Join us there to learn more about the present and future of OpenLineage, meet other members of the ecosystem, learn about the project's goals and fundamental design, and participate in a discussion about the future of the project.
Food will be provided, and the meetup is open to all. Don't miss this opportunity to influence the direction of this important new standard! We hope to see you there.
More information: https://openlineage.io/blog/data-lineage-meetup/
🎉 Harel Shein, Ross Turk, Maciej Obuchowski, Kengo Seki, Paweł Leszczyński, Willy Lulciuc, Sheeri Cabral (Collibra)
🔥 Harel Shein, Ross Turk, Maciej Obuchowski, Anirudh Shrinivason, Kengo Seki, Paweł Leszczyński, Willy Lulciuc, Sheeri Cabral (Collibra)

Quentin Nambot (qnambot@gmail.com)
2023-02-15 04:52:27
Hi, I opened a PR to fix the way that the Athena extractor gets the database, but the Spark integration tests failed. However I don't think that it is related to my PR, since I only updated the Airflow integration.
Can anybody help me with that please? 🙏
Labels: integration/airflow, extractor · Comments: 2

Quentin Nambot (qnambot@gmail.com)
2023-02-15 04:52:59

Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-15 07:19:39
-
-

*Thread Reply:* @Quentin Nambot this happens because we run additional integration tests against real databases (like BigQuery) which aren't ever configured on forks, since we don't want to expose our secrets. We need to figure out how to make this experience better, but in the meantime we've pushed your code using git-push-fork-to-upstream-branch and it passes all the tests.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-02-15 07:21:49
*Thread Reply:* Feel free to un-draft your PR if you think it's ready for review

Quentin Nambot (qnambot@gmail.com)
2023-02-15 08:03:56
*Thread Reply:* Ok nice thanks 👍

Quentin Nambot (qnambot@gmail.com)
2023-02-15 08:04:49
*Thread Reply:* I think it's ready, however should I update the version somewhere?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-02-15 08:42:39
*Thread Reply:* @Quentin Nambot I don't think so - it's just that you opened the PR as Draft, so I'm not sure if you want to add something else to it.
👍 Quentin Nambot

Quentin Nambot (qnambot@gmail.com)
2023-02-15 08:43:36
*Thread Reply:* No I don't want to add anything so I opened it 👍

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-15 21:26:37
@here I have a question about extending the Spark integration. Is there a way to use a custom visitor factory? I am trying to see if I can add a visitor for a command that is not currently covered in this integration (AlterTableAddPartitionCommand). It seems that because it's not in the base visitor factory, I am unable to use the visitor I created.

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-15 21:32:19
*Thread Reply:* I have that set up already, like this:
public class LyftOpenLineageEventHandlerFactory implements OpenLineageEventHandlerFactory {
  @Override
  public Collection<PartialFunction<LogicalPlan, List<OutputDataset>>>
      createOutputDatasetQueryPlanVisitors(OpenLineageContext context) {
    Collection<PartialFunction<LogicalPlan, List<OutputDataset>>> visitors =
        new ArrayList<PartialFunction<LogicalPlan, List<OutputDataset>>>();
    visitors.add(new LyftInsertIntoHadoopFsRelationVisitor(context));
    visitors.add(new AlterTableAddPartitionVisitor(context));
    visitors.add(new AlterTableDropPartitionVisitor(context));
    return visitors;
  }
}

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-15 21:33:35
*Thread Reply:* do I just add a constructor? the visitorFactory is private so I wasn't sure if that's something that was intended to change

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-15 21:34:30
*Thread Reply:* .

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-15 21:34:49
*Thread Reply:* @Michael Collado

Michael Collado (collado.mike@gmail.com)
2023-02-15 21:35:14
*Thread Reply:* The VisitorFactory is only used by the internal EventHandlerFactory. It shouldn't be needed for your custom one

Michael Collado (collado.mike@gmail.com)
2023-02-15 21:35:48
*Thread Reply:* Have you added the file to the META-INF folder of your jar?

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-16 11:01:56
*Thread Reply:* yes, I am able to use my custom event handler factory with a list of visitors, but for some reason I can't access the visitors for some commands (AlterTableAddPartitionCommand is one)

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-16 11:02:29
*Thread Reply:* so even if I set up everything correctly I am unable to reach the code for that specific visitor

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-16 11:05:22
*Thread Reply:* and my assumption is I can reach other commands but not this one because the command is not defined in the BaseVisitorFactory, but maybe I'm wrong @Michael Collado

Michael Collado (collado.mike@gmail.com)
2023-02-16 15:05:19
*Thread Reply:* the VisitorFactory is loaded by the InternalEventHandlerFactory here. However, the createOutputDatasetQueryPlanVisitors should contain a union of everything defined by the VisitorFactory as well as your custom visitors: see this code.

Michael Collado (collado.mike@gmail.com)
2023-02-16 15:09:21
*Thread Reply:* there might be a conflict with another visitor that's being matched against that command. Can you turn on debug logging and look for this line to see what visitor is being applied to that command?

Allison Suarez (asuarezmiranda@lyft.com)
2023-02-16 16:54:46
*Thread Reply:* This was helpful, it works now, thank you so much Michael!

slackbot
2023-02-16 19:08:26
This message was deleted.
👋 Willy Lulciuc

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:09:49
*Thread Reply:* what is the curl cmd you are running? and what endpoint are you hitting? (assuming Marquez?)

thebruuu (bruno.c@inwind.it)
2023-02-16 19:18:28
*Thread Reply:* yep, I am running:
curl -X POST http://localhost:5000/api/v1/namespaces/test ^
  -H 'Content-Type: application/json' ^
  -d '{ownerName:"me", description:"no description"}'

The weird thing is the log, where I don't have a 0.0.0.0 IP (the log corresponds to the equivalent Postman command):

marquez-api | WARN [2023-02-17 00:14:32,695] marquez.logging.LoggingMdcFilter: status: 405
marquez-api | XXX.23.0.1 - - [17/Feb/2023:00:14:32 +0000] "POST /api/v1/namespaces/test HTTP/1.1" 405 52 "-" "PostmanRuntime/7.30.0" 2

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:23:08
*Thread Reply:* Marquez logs all supported endpoints (and methods) on start up. For example, here are all the supported methods on /api/v1/namespaces/{namespace}:
marquez-api | DELETE /api/v1/namespaces/{namespace} (marquez.api.NamespaceResource)
marquez-api | GET /api/v1/namespaces/{namespace} (marquez.api.NamespaceResource)
marquez-api | PUT /api/v1/namespaces/{namespace} (marquez.api.NamespaceResource)
To ADD a namespace, you'll want to use PUT (see API docs)
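The same PUT from Python, for anyone following along (a sketch using requests; the payload mirrors the curl attempt above):

import requests

resp = requests.put(
    "http://localhost:5000/api/v1/namespaces/test",
    headers={"Content-Type": "application/json"},
    json={"ownerName": "me", "description": "no description"},
)
print(resp.status_code, resp.json())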

thebruuu (bruno.c@inwind.it)
2023-02-16 19:26:23
*Thread Reply:* 3rd stupid question of the night
Sorry, kept on trying POST, who knows why

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:26:56
*Thread Reply:* no worries! keep the questions coming!

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:29:46
*Thread Reply:* well, maybe because it's so late on your end! get some rest!

thebruuu (bruno.c@inwind.it)
2023-02-16 19:36:25
*Thread Reply:* Yeah, but I want to see how it works.
Right now I have a response 200 for the creation of the namespace ... but it seems that nothing occurred:
nor on the Marquez front end (localhost:3000)
nor on the database

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:37:13
*Thread Reply:* can you curl the list namespaces endpoint?

thebruuu (bruno.c@inwind.it)
2023-02-16 19:38:14
*Thread Reply:* yep: nothing changed
only default and food_delivery

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:38:47
*Thread Reply:* can you post your server logs? you should see the request

thebruuu (bruno.c@inwind.it)
2023-02-16 19:40:41
*Thread Reply:*
marquez-api | XXX.23.0.4 - - [17/Feb/2023:00:30:38 +0000] "PUT /api/v1/namespaces/ciro HTTP/1.1" 500 110 "-" "-" 7
marquez-api | INFO [2023-02-17 00:32:07,072] marquez.logging.LoggingMdcFilter: status: 200

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:41:12
*Thread Reply:* the server is returning a 500 ?

Willy Lulciuc (willy@datakin.com)
2023-02-16 19:41:57
*Thread Reply:* odd that LoggingMdcFilter is logging 200

thebruuu (bruno.c@inwind.it)
2023-02-16 19:43:24
*Thread Reply:* Bit confused, because now I realize that Postman is returning a bad request

thebruuu (bruno.c@inwind.it)
2023-02-16 19:43:51
*Thread Reply:*

thebruuu (bruno.c@inwind.it)
2023-02-16 19:44:30
*Thread Reply:* You'll notice that I go to use 3000 in the URL.
If I use 5000 I get No host

Willy Lulciuc (willy@datakin.com)
2023-02-17 01:14:50
*Thread Reply:* odd, the API should be using port 5000, have you followed our quickstart for Marquez?

thebruuu (bruno.c@inwind.it)
2023-02-17 03:43:29
*Thread Reply:* Hello Willy
I am starting from scratch, following the instructions from https://openlineage.io/docs/getting-started/
I am on Windows. Instead of
git clone git@github.com:MarquezProject/marquez.git && cd marquez
I run
git clone https://github.com/MarquezProject/marquez.git
But before that I had to clear the auto carriage return in git:
git config --global core.autocrlf false
This avoids an error message on marquez-api when running wait-for-it.sh at line 1, where
#!/usr/bin/env bash
is otherwise read as
#!/usr/bin/env bash\r'

It turns out that when switching off the auto CR, this impacts some file containing the Marquez password ... and I get a failure on accessing the db. To overcome this I ran Notepad++ and replaced ALL the \r\n with \n. And in this way I managed to run
docker\up.sh and docker\down.sh
correctly (with or without seed ... with access to the db, via pgAdmin)
👍 Ernie Ostic

thebruuu (bruno.c@inwind.it)
2023-02-20 03:40:48
*Thread Reply:* The issue is related to PostMan

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-02-17 03:39:07
Hi, I'd like to capture column lineage from Spark, but also capture how the columns are transformed, and any column operations that are done too. May I ask if this feature is supported currently, or will be supported in future based on the current timeline? Thanks!

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-02-17 03:54:47
*Thread Reply:* Hi @Anirudh Shrinivason, this is a great question. We included extra fields in the OpenLineage spec to contain that information:
"transformationDescription": {
  "type": "string",
  "description": "a string representation of the transformation applied"
},
"transformationType": {
  "type": "string",
  "description": "IDENTITY|MASKED reflects a clearly defined behavior. IDENTITY: exact same as input; MASKED: no original data available (like a hash of PII for example)"
}
so the standard is ready to support it. We included two fields so that one can contain a human-readable description of what is happening. However, we don't have this implemented in the Spark integration.
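To make the shape concrete, here is a hedged sketch of a columnLineage facet fragment using those two fields (dataset and column names are illustrative; the placement follows the spec excerpt above):

column_lineage = {
    "fields": {
        "email_hash": {
            "inputFields": [
                {"namespace": "snowflake://acme", "name": "db.schema.users", "field": "email"}
            ],
            "transformationDescription": "sha256 hash of the email column",
            "transformationType": "MASKED",
        }
    }
}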

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-02-17 04:02:30
*Thread Reply:* Thanks a lot! That is great. Is there a potential plan in the roadmap to support this for Spark?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-02-17 04:08:16
*Thread Reply:* I think there will be a growing interest in that. In general, a dependency may be really difficult to express if many Spark operators are used on input columns to produce an output one. The simple version would be just to detect an identity operation or some kind of hashing.

To sum up, we don't yet have a proposal on that, but this seems to be a natural next step in enriching column lineage features.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-02-17 04:40:04
*Thread Reply:* Got it. Thanks! If this item potentially comes on the roadmap, then I'd be happy to work with other interested developers to help contribute! 🙂

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-02-17 04:43:00
*Thread Reply:* Great to hear that. What you could perhaps start with is coming to our monthly OpenLineage meetings and asking @Michael Robinson to put this item on the discussions' list. There are many strategies to address this issue, and hearing your story, usage scenario, and what you are trying to achieve would be super helpful in the design and implementation phase.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-02-17 04:44:18
*Thread Reply:* Got it! The monthly meeting might be a bit hard for me to attend live, because of the time zone. But I'll try my best to make it to the next one! thanks!

Michael Robinson (michael.robinson@astronomer.io)
2023-02-17 09:46:22
*Thread Reply:* Thank you for bringing this up, @Anirudh Shrinivason. I'll add it to the agenda of our next meeting because there might be interest from others in adding this to the roadmap.
👍 Anirudh Shrinivason
:gratitude_thank_you: Anirudh Shrinivason

thebruuu (bruno.c@inwind.it)
2023-02-17 15:12:57
Hello
How can I improve the verbosity of the marquez-api?
Regards

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-02-20 02:10:13
*Thread Reply:* Hi @thebruuu, pls take a look at the logging documentation of Dropwizard (https://www.dropwizard.io/en/latest/manual/core.html#logging) - the framework Marquez is implemented in. The logging configuration section is present in marquez.yml.

thebruuu (bruno.c@inwind.it)
2023-02-20 03:29:07
*Thread Reply:* Thank You Pavel

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-02-21 02:23:40
Hey, can we please schedule a release of OpenLineage? I would like to have the release that includes the feature to capture custom env variables from Spark clusters... Thanks!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-02-21 09:12:17
*Thread Reply:* We generally schedule a release every month, the next one will be next week - is that okay @Anirudh Shrinivason?

Michael Robinson (michael.robinson@astronomer.io)
2023-02-21 11:38:50
*Thread Reply:* Yes, there's one scheduled for next Wednesday, if that suits.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-02-21 21:45:58
*Thread Reply:* Okay yeah sure that works. Thanks
🙌 Michael Robinson

Michael Robinson (michael.robinson@astronomer.io)
2023-03-01 10:12:45
*Thread Reply:* @Anirudh Shrinivason we're expecting the release to happen today or tomorrow, FYI

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-01 21:22:40
*Thread Reply:* Awesome thanks

Jingyi Chen (jingyi@cloudshuttle.com.au)
2023-02-23 23:43:23
Hello team, we used OpenLineage and Great Expectations integrated. I want to use GE to verify the table in Snowflake. I found that the configuration I added OpenLineage into GE produced this error after running. Could someone please give me some answers? 👀
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/great_expectations/validation_operators/validation_operators.py", line 469, in _run_actions
    action_result = self.actions[action["name"]].run(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/great_expectations/checkpoint/actions.py", line 106, in run
    return self._run(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openlineage/common/provider/great_expectations/action.py", line 156, in _run
    datasets = self._fetch_datasets_from_sql_source(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openlineage/common/provider/great_expectations/action.py", line 362, in _fetch_datasets_from_sql_source
    self._get_sql_table(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openlineage/common/provider/great_expectations/action.py", line 395, in _get_sql_table
    if engine.connection_string:
AttributeError: 'Engine' object has no attribute 'connection_string'

Jingyi Chen (jingyi@cloudshuttle.com.au)
2023-02-23 23:44:03
*Thread Reply:* This is my checkpoint configuration in GE:
name: 'openlineage_checkpoint'
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y%m%d-%H%M%S-my_checkpoint'
expectation_suite_name: EMAIL_VALIDATION
batch_request:
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: []
  - name: openlineage
    action:
      class_name: OpenLineageValidationAction
      module_name: openlineage.common.provider.great_expectations
      openlineage_host: http://localhost:5000
      # openlineage_apiKey: 12345
      openlineage_namespace: ge_expectations # Replace with your job namespace; we recommend a meaningful namespace like dev or prod, etc.
      job_name: ge_validation
evaluation_parameters: {}
runtime_configuration: {}
validations:
  - batch_request:
      datasource_name: LANDING_DEV
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: 'snowpipe.pii'
      data_connector_query:
        index: -1
    expectation_suite_name: EMAIL_VALIDATION
profilers: []
ge_cloud_id:
expectation_suite_ge_cloud_id:

Benji Lampel (benjamin@astronomer.io)
2023-02-24 11:31:05
*Thread Reply:* What version of GX are you running? And is this being run directly through GX or through Airflow with the operator?

Jingyi Chen (jingyi@cloudshuttle.com.au)
2023-02-26 20:05:12
*Thread Reply:* I use the latest version of Great Expectations. This error occurs either directly through Great Expectations or through Airflow

Benji Lampel (benjamin@astronomer.io)
2023-02-27 09:10:00
*Thread Reply:* I noticed another issue in the latest version as well. Try dropping to GE version great-expectations==0.15.44 for now. That is the latest one that works for me.

Benji Lampel (benjamin@astronomer.io)
2023-02-27 09:11:34
*Thread Reply:* You should definitely open an issue here, and you can tag me @denimalpaca in the comment

Jingyi Chen (jingyi@cloudshuttle.com.au)
2023-02-27 18:07:29
*Thread Reply:* Thanks Benji, but I still have the same problem after I drop to great-expectations==0.15.44. This is my requirements file:
great_expectations==0.15.44
sqlalchemy
psycopg2-binary
numpy
pandas
snowflake-connector-python
snowflake-sqlalchem

Benji Lampel (benjamin@astronomer.io)
2023-02-28 13:34:03
*Thread Reply:* interesting... I do think this may be a GX issue so let's see if they say anything. I can also cross post this thread to their slack

Saravanan (saravanan@athivatech.com)
2023-03-01 00:27:30
Hello Team, I'm trying to use Open Lineage with AWS Glue and Marquez. Has anyone successfully integrated AWS Workflows / Glue ETL jobs with Open Lineage?

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 11:47:40
*Thread Reply:* I know I'm responding to an older post - I'm not sure if this would work in your environment? https://aws.amazon.com/blogs/big-data/build-data-lineage-for-data-lakes-using-aws-glue-amazon-neptune-and-spline/
Are you using AWS Glue with Spark jobs?

Saravanan (saravanan@athivatech.com)
2023-05-02 15:16:14
*Thread Reply:* This was proposed by our AWS Solution architect, but we are not seeing much improvement compared to OpenLineage. Have you deployed the above solution to prod?

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-25 11:30:44
*Thread Reply:* We are currently in the research phase, so we have not deployed to prod. We have customers with thousands of existing scripts that they don't want to rewrite to add OpenLineage libraries - I would imagine that if you are already integrating OpenLineage in your code, the Spark listener isn't an improvement. Our research is on magically getting lineage from existing scripts 😄

Michael Robinson (michael.robinson@astronomer.io)
2023-03-01 09:42:23
Hello everyone, I'm opening a vote to release OpenLineage 0.21.0, featuring:
• a new CustomEnvironmentFacetBuilder class and new output visitors AlterTableAddPartitionCommandVisitor and AlterTableSetLocationCommandVisitor in the Spark integration
• a Linux-ARM version of the SQL parser's native library
• DEBUG logging of events in transports
• bug fixes and more.
Three +1s from committers will authorize an immediate release.
➕ Maciej Obuchowski, Jakub Dardziński, Benji Lampel, Natalie Zeller, Paweł Leszczyński

Michael Robinson (michael.robinson@astronomer.io)
2023-03-01 10:26:22
*Thread Reply:* Thanks, all. The release is authorized and will be initiated as soon as possible.

Nigel Jones (nigel.l.jones@gmail.com)
2023-03-02 03:52:03
I've got some security-related questions/observations. The main site suggests opening an issue to report vulnerabilities etc. I wanted to check if there is a private mailing list/DM channel to just check a few things first? I'm happy to use GitHub issues otherwise. Thanks!

Moritz E. Beber (midnighter@posteo.net)
2023-03-02 05:15:55
*Thread Reply:* GitHub has a new issue template for reporting vulnerabilities, actually, if you use a config that enables this issue template.

Michael Robinson (michael.robinson@astronomer.io)
2023-03-02 10:21:16
Reminder: our first meetup is one week from today in Providence, RI! You can find the details in the meetup blog post. And if you're coming, it would be great if you could RSVP. Looking forward to seeing some of you there!
🎉 Kengo Seki
🚀 Kengo Seki
✅ Sheeri Cabral (Collibra)

Michael Robinson (michael.robinson@astronomer.io)
2023-03-02 16:52:50
@channel
We released OpenLineage 0.21.1, including:
Additions
• Clients: add DEBUG logging of events to transports #1633 by @mobuchowski
• Spark: add CustomEnvironmentFacetBuilder class #1545 by new contributor @Anirudh181001
• Spark: introduce the new output visitors AlterTableAddPartitionCommandVisitor and AlterTableSetLocationCommandVisitor #1629 by new contributor @nataliezeller1
• Spark: add column lineage for JDBC relations #1636 by @tnazarew
• SQL: add linux-aarch64 native library to Java SQL parser #1664 by @mobuchowski
Changes
• Airflow: get table database in Athena extractor #1631 by new contributor @rinzool
Removals
• Airflow: remove JobIdMapping and update macros to better support Airflow version 2+ #1645 by @JDarDagran
Thanks to all our contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.21.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.20.6...0.21.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🎉 Kengo Seki, Harel Shein, Maciej Obuchowski
🚀 Kengo Seki, Harel Shein, Maciej Obuchowski

Paul Lee (paullee@lyft.com)
2023-03-02 19:01:23
how do you turn off the openlineage listener in airflow 2? for some reason we're seeing a Thread-2 and seeing it fire twice in tasks

Harel Shein (harel.shein@gmail.com)
2023-03-02 20:04:19
*Thread Reply:* Hey @Paul Lee, are you seeing this happen for Async operators?

Harel Shein (harel.shein@gmail.com)
2023-03-02 20:06:00
*Thread Reply:* might be related to this issue https://github.com/OpenLineage/OpenLineage/pull/1601 that was fixed in 0.20.6

Paul Lee (paullee@lyft.com)
2023-03-03 16:15:44
*Thread Reply:* hmm perhaps.

Paul Lee (paullee@lyft.com)
2023-03-03 16:15:55
*Thread Reply:* @Harel Shein if i want to turn off openlineage listener how do i do that? do i just remove the package?

Harel Shein (harel.shein@gmail.com)
2023-03-03 16:24:07
*Thread Reply:* meaning, you don't want openlineage to collect any information from your Airflow deployment?
👍 Paul Lee

Harel Shein (harel.shein@gmail.com)
2023-03-03 16:24:50
*Thread Reply:* in that case, you could either remove it from your requirements file, or set OPENLINEAGE_DISABLED=True in your Airflow env vars
👍 Paul Lee

Paul Lee (paullee@lyft.com)
2023-03-06 14:43:56
*Thread Reply:* removed it from requirements and also the backend key in airflow config. needed both

Michael Robinson (michael.robinson@astronomer.io)
2023-03-02 20:29:42
@channel
This month's OpenLineage TSC meeting is next Thursday, March 9th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. Recent release overview
  2. A new consumer
  3. Custom env variable support in Spark
  4. Async operator support in Airflow
  5. JDBC relations support in Spark
  6. Discussion topics:
     • New feature idea: column transformations/operations in the Spark integration
     • Using namespaces
  7. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
🙌 Willy Lulciuc, Paweł Leszczyński, Maciej Obuchowski, alexandre bergere

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-02 21:48:29
Hi everyone, I noticed that OpenLineage is sending each of the events twice for Spark. Is this expected? Is there some way to disable this behaviour?

Will Johnson (will@willj.co)
2023-03-02 23:46:08
*Thread Reply:* Are you seeing duplicate START events, or do you see two events, one that is a START and one that is a COMPLETE?

OpenLineage's events may send partial information. You should expect to collect all events for a given RunId and merge them together to get the complete events.

In addition, some data sources are really chatty, like Delta tables. That may cause you to see many events that look very similar.
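A consumer-side sketch of that merge-by-run-id idea (it assumes events already parsed into dicts; the overlay strategy is illustrative, not prescribed by the spec):

from collections import defaultdict

def merge_events_by_run(events):
    # Group raw OpenLineage events (dicts) by run id and overlay run facets,
    # letting later events fill in or refresh earlier partial information.
    merged = defaultdict(lambda: {"facets": {}, "eventType": None})
    for event in sorted(events, key=lambda e: e["eventTime"]):
        run_id = event["run"]["runId"]
        merged[run_id]["facets"].update(event["run"].get("facets", {}))
        merged[run_id]["eventType"] = event["eventType"]
    return merged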

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-03 00:45:19
*Thread Reply:* Hmm...I'm seeing 2 start events for the same runnable command

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-03 00:45:27
*Thread Reply:* And 2 complete

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-03 00:46:08
*Thread Reply:* I am currently only testing on parquet tables...

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-03-03 02:31:28
*Thread Reply:* One of OpenLineage's assumptions is the ability to merge lineage events in the backend to make client integrations stateless. So, it is possible that Spark can emit multiple events for the same job. However, sometimes it does not make any sense to send or collect some events, which happened to us some time ago with Delta. In that case we decided to filter them and created a filtering mechanism (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/filters) that can be extended in case of other unwanted events being generated and sent.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-05 22:59:06
*Thread Reply:* Ahh I see...okay thanks!

Daniel Joanes (djoanes@gmail.com)
2023-03-07 00:05:48
*Thread Reply:* in general, you should build any event consumer system with at-least-once semantics. Even if this issue is fixed, there is a possibility of duplicates for other valid scenarios
➕ Maciej Obuchowski, Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-09 14:10:47
*Thread Reply:* Hi..I compared some duplicate 'START' events just now, and noticed that they are exactly the same, with the only exception of one of them having an 'environment-properties' field... Could I just quickly check if this is a bug or a feature haha?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-03-10 01:18:18
*Thread Reply:* CC: @Paweł Leszczyński ^

Michael Robinson (michael.robinson@astronomer.io)
2023-03-08 11:15:48
@channel
Reminder: this month's OpenLineage TSC meeting is tomorrow at 10am PT. All are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1677806982084969

Susmitha Anandarao (susmitha.anandarao@gmail.com)
2023-03-08 15:51:07
Hi, if we have the OpenLineage listener configured as a default Spark conf, is there an easy way to disable OL for a specific notebook?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-03-08 17:30:44
*Thread Reply:* if you can set up env variables for particular notebooks, you can set OPENLINEAGE_DISABLED=true
:gratitude_thank_you: Susmitha Anandarao

Benji Lampel (benjamin@astronomer.io)
2023-03-10 13:15:41
Hey all,

I opened a PR (and corresponding issue) to change how naming works in OpenLineage. The idea generally is to move from Naming.md as the end-all-be-all of names for integrations, and towards JSON schemas per integration, with each schema defining very precisely what fields a name and namespace should contain, how they're connected, and how they're validated. Would really appreciate some feedback as this is a pretty big change!
Labels: documentation, proposal · Comments: 1

Sunil Patil (spatil@twilio.com)
2023-03-13 17:05:56
What do I need to do to enable DAG-level metric capturing for Airflow? I followed the instructions to install openlineage 0.21.1 on Airflow 2.3.3. When I run a DAG I see metrics related to task start and success/failure. But I don't see any metrics for DAG success/failure. Do I have to do something to enable DAG execution capturing?

Sunil Patil (spatil@twilio.com)
2023-03-13 17:08:53
*Thread Reply:* is DAG run capturing enabled starting airflow 2.5.1 ? https://github.com/apache/airflow/pull/27113
Labels: area:scheduler/executor, type:new-feature · Comments: 6

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-03-13 17:11:47
*Thread Reply:* you're right, the change was only included in 2.5.0
🙏 Sunil Patil

Sunil Patil (spatil@twilio.com)
2023-03-13 17:43:15
*Thread Reply:* Thanks Jakub

Michael Robinson (michael.robinson@astronomer.io)
2023-03-14 15:37:34
Fresh on the heels of our first-ever in-person event, we're meeting up again soon at Data Council Austin! Join us on March 30th (the same day as @Julien Le Dem's talk) at 12:15 pm to discuss the project's goals and design, meet other members of the data ecosystem, and help shape the future of the spec. For more info, check out the OpenLineage blog. If you haven't registered for the conference yet, click and use promo code OpenLineage20 for a special rate. Hope to see you there!

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-03-15 15:11:18
If someone is using airflow and DAG-docs for lineage, can they export the lineage in, say, OL format?

Harel Shein (harel.shein@gmail.com)
2023-03-15 15:18:22
*Thread Reply:* I don't see it currently on the AirflowRunFacet, but probably not a big deal to add it? @Benji Lampel wdyt?
❤️ Sheeri Cabral (Collibra)

Benji Lampel (benjamin@astronomer.io)
2023-03-15 15:22:00
*Thread Reply:* Definitely could be a good thing to have--is there not some info facet that could hold this data already? I don't see an issue with adding to the AirflowRunFacet tho (full disclosure, I'm not super familiar with this facet)
❤️ Sheeri Cabral (Collibra)

Ross Turk (ross@datakin.com)
2023-03-15 15:58:40
*Thread Reply:* Perhaps DocumentationJobFacet or DocumentationDatasetFacet?

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-03-15 15:13:55
(is it https://docs.astronomer.io/learn/airflow-openlineage ?)

Susmitha Anandarao (susmitha.anandarao@gmail.com)
2023-03-17 12:31:02
Happy Friday 👋 I am looking for some help setting the parent information for a dbt run. I have set the namespace variable in the openlineage.yml but it doesn't seem to take effect and ends up using the default value of dbt. Also using openlineage.yml to set the transport properties for emitting to kafka. Is there a way to set parent namespace, name and run id in the yml file? Thank you!

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-03-18 12:09:23
*Thread Reply:* dbt-ol does not read from openlineage.yml so you need to pass this information in the OPENLINEAGE_NAMESPACE environment variable
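In practice that looks something like the sketch below (the namespace value and config path are illustrative):

import os
import subprocess

env = {
    **os.environ,
    "OPENLINEAGE_NAMESPACE": "my_team_namespace",      # read by dbt-ol
    "OPENLINEAGE_CONFIG": "/path/to/openlineage.yml",  # transport settings still come from the yml
}
subprocess.run(["dbt-ol", "run"], env=env, check=True)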

Ross Turk (ross@datakin.com)
2023-03-20 15:17:03
*Thread Reply:* Hmmm. Interesting! I thought that it used client = OpenLineageClient.from_environment(), I'll do some testing with Kafka backends.

Susmitha Anandarao (susmitha.anandarao@gmail.com)
2023-03-20 15:22:07
*Thread Reply:* Thank you for the hint. I was able to make it work by specifying the env OPENLINEAGE_CONFIG to point at the yml file holding transport info, and OPENLINEAGE_NAMESPACE
👍 Ross Turk

- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-03-20 15:24:05
-
-

*Thread Reply:* Awesome! That’s exactly what I was going to test.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-03-20 15:25:04
-
-

*Thread Reply:* I think it also works if you put it in $HOME/.openlineage/openlineage.yml.

- - - -
- :gratitude_thank_you: Susmitha Anandarao -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-03-21 08:32:17
*Thread Reply:* @Susmitha Anandarao I might have provided misleading information. I meant that dbt-ol does not read the OL namespace from openlineage.yml but from the OPENLINEAGE_NAMESPACE env var instead

Michael Robinson (michael.robinson@astronomer.io)
2023-03-21 13:48:28
Data Council Austin, the host of our next meetup, is one week away: https://openlineage.slack.com/archives/C01CK9T7HKR/p1678822654288379

Michael Robinson (michael.robinson@astronomer.io)
2023-03-21 13:52:52
In addition to Data Council Austin next week, the hybrid Big Data Technology Warsaw Summit will be taking place on March 28th-30th, featuring three of our committers: @Maciej Obuchowski, @Paweł Leszczyński and @Ross Turk! There's more info here: https://bigdatatechwarsaw.eu/
🙌 Howard Yoo, Maciej Obuchowski, Jakub Dardziński, Ross Turk, Perttu Salonen
👍 thebruuu

Brad Paskewitz (bradford.paskewitz@fivetran.com)
2023-03-22 22:38:26
hey folks, is anyone capturing dataset metadata for multi-table schemas? I'm looking at the schema dataset facet: https://openlineage.io/docs/spec/facets/dataset-facets/schema but it looks like this only represents a single table, so I'm wondering if I'll need to write a custom facet

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-03-23 04:25:19
*Thread Reply:* It should be represented by multiple datasets, unless I misunderstood what you mean by multi-table

Brad Paskewitz (bradford.paskewitz@fivetran.com)
2023-03-23 10:55:58
*Thread Reply:* here at Fivetran when we sync data it is generally 1 schema with multiple tables (sometimes many) so we would want to represent all of that

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-03-23 11:11:25
*Thread Reply:* So what I understand:
  1. your single job represents synchronization of multiple tables
  2. you want to have precise input-output dataset lineage?
Am I right?

I would model that as multiple OL jobs that describe each dataset mapping. Additionally, I'd have one "wrapping" job that represents your definition of a job. The rest of those jobs would refer to it in ParentRunFacet; see the sketch below.

This is a pattern we use for Airflow and dbt dags.
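A sketch of that parent/child wiring with the Python client (job names, namespace, and producer URI are illustrative; ParentRunFacet.create is assumed from openlineage.client.facet):

import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.facet import ParentRunFacet
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
parent_run_id = str(uuid.uuid4())  # run of the "wrapping" sync job

# One child job per table mapping, each pointing back at the wrapping job.
child_event = RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(
        runId=str(uuid.uuid4()),
        facets={"parent": ParentRunFacet.create(parent_run_id, "fivetran", "sync_schema")},
    ),
    job=Job(namespace="fivetran", name="sync_schema.table_a"),
    producer="https://example.com/my-connector",
)
client.emit(child_event)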

Brad Paskewitz (bradford.paskewitz@fivetran.com)
2023-03-23 12:57:15
*Thread Reply:* Yes your statements are correct. Thanks for sharing that model, that makes sense to me

Brad Paskewitz (bradford.paskewitz@fivetran.com)
2023-03-24 15:56:27
has anyone had success creating custom facets using Java? I'm following this guide: https://openlineage.io/docs/spec/facets/custom-facets and I'm wondering if it makes sense to manually create POJOs, or if others are creating the JSON schema for the object and then automatically generating the Java code?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-03-27 05:26:06
*Thread Reply:* I think it's better to just create a POJO. This is what we do in the Spark integration, for example.

For now, the JSON Schema generator isn't flexible enough to generate custom facets from whatever schema we give it, so it would be unnecessary complexity.
:gratitude_thank_you: Brad Paskewitz

Julien Le Dem (julien@apache.org)
2023-03-27 12:29:57
*Thread Reply:* Agreed, just a POJO would work. This is using Jackson, so you would use annotations as needed. You can also use a Jackson JSONNode or even Map.
:gratitude_thank_you: Brad Paskewitz

Brad Paskewitz - (bradford.paskewitz@fivetran.com) -
-
2023-03-27 14:01:07
-
-

One other question: I'm in the process of adding different types of facets to our base payloads and I'm wondering if we have any related guidelines / best practices / standards / conventions. For example if I add a full source schema as a schema dataset facet to every start event it seems like that could be inefficient compared to a 1-time full-source-schema followed by incremental diffs for each following sync. Curious how others are thinking about + solving these types of problems in practice

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-03-27 17:59:28

*Thread Reply:* That depends on the OL consumer, but for something like SchemaDatasetFacet it seems to be okay to assume the schema stays the same if it isn't sent.

For others, like OutputStatisticsOutputDatasetFacet, you definitely can't assume that, as the data is unique to each run.

Brad Paskewitz - (bradford.paskewitz@fivetran.com)
2023-03-27 19:05:14

*Thread Reply:* ok great thanks, that makes sense to me

Saravanan - (saravanan@athivatech.com)
2023-03-27 21:42:20

Hi team, I'm seeing the create data source and create dataset APIs marked as deprecated. Can anyone point me to how to create datasets via API calls?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-03-28 04:47:31

*Thread Reply:* OpenLineage API: https://openlineage.io/docs/getting-started/
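In practice that means emitting a run event whose outputs declare the dataset, rather than calling the deprecated Marquez endpoints. A rough sketch with the Python client (the namespace, names, and schema fields are illustrative):

```
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.facet import SchemaDatasetFacet, SchemaField
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # assumed Marquez URL

# The dataset gets registered as a side effect of being a run's output.
orders = Dataset(
    namespace="my-namespace",
    name="public.orders",
    facets={"schema": SchemaDatasetFacet(fields=[
        SchemaField(name="id", type="INTEGER"),
        SchemaField(name="amount", type="DECIMAL"),
    ])},
)

client.emit(RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="my-namespace", name="register_orders"),
    inputs=[],
    outputs=[orders],
    producer="https://example.com/producer",
))
```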

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-03-28 06:08:18

Hi everyone, I recently encountered this error saying V2SessionCatalog is not supported by OpenLineage. May I ask if support for this will be added in the near future? Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-03-28 08:05:30

*Thread Reply:* I think it would be great to support V2SessionCatalog, and it would very much help if you created a GitHub issue with more explanation and examples of its use.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-03-29 02:53:37

*Thread Reply:* Sure thanks!

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-03-29 05:34:37

*Thread Reply:* I have opened an issue here: https://github.com/OpenLineage/OpenLineage/issues/1747. Thanks! 🙂

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-17 11:53:52

*Thread Reply:* Hi @Maciej Obuchowski, just curious: is this issue on the potential roadmap for the next OpenLineage release?

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-02 19:37:27

Hi all! Can anyone provide me some advice on how to solve this error:
ValueError: `emit` only accepts RunEvent class
[2023-04-02, 23:22:00 UTC] {taskinstance.py:1326} INFO - Marking task as FAILED. dag_id=etl_openlineage, task_id=send_ol_events, execution_date=20230402T232112, start_date=20230402T232114, end_date=20230402T232200
[2023-04-02, 23:22:00 UTC] {standard_task_runner.py:105} ERROR - Failed to execute job 400 for task send_ol_events (`emit` only accepts RunEvent class; 28020)
[2023-04-02, 23:22:00 UTC] {local_task_job.py:212} INFO - Task exited with return code 1
[2023-04-02, 23:22:00 UTC] {taskinstance.py:2585} INFO - 0 downstream tasks scheduled from follow-on schedule check
I'm trying to follow this tutorial (https://openlineage.io/blog/openlineage-snowflake/) on connecting Snowflake to OpenLineage through Apache Airflow; however, the last step (sending the OpenLineage events) returns an error.

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-03 09:32:46

*Thread Reply:* The blog post is a bit old, and in the meantime changes were introduced in the OpenLineage Python client. May I ask if you just want to test the flow, or are you looking for a viable Snowflake data lineage solution?

Ross Turk - (ross@datakin.com)
2023-04-03 10:47:57

*Thread Reply:* I believe that this will work if you change the line to client.transport.emit()
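That is, assuming the tutorial's script builds the event as a plain dict (the variable names here are guesses at that script's code):

```
# Newer OpenLineageClient.emit() validates that it receives a RunEvent,
# so raw event dicts must go through the transport directly:
client.transport.emit(ol_event)  # instead of client.emit(ol_event)
```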

Ross Turk - (ross@datakin.com)
2023-04-03 10:49:05

*Thread Reply:* (this would be in the dags/lineage folder, if memory serves)

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-03 10:57:23

*Thread Reply:* Ross is right, that should work

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-04 12:23:13

*Thread Reply:* This works! Thank you so much!

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-04 12:24:40

*Thread Reply:* @Jakub Dardziński I want to use a viable Snowflake data lineage solution alongside an Amazon DataZone catalog 🙂

Ross Turk - (ross@datakin.com)
2023-04-04 13:03:58

*Thread Reply:* I have been meaning to revisit that tutorial 👍

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-03 10:52:42

Hello all,
I'd like to open a vote to release OpenLineage 0.22.0, including:
• a new properties facet in the Spark integration
• a new field in HttpConfig for passing custom headers in the Spark integration
• improved namespace generation for JDBC connections in the Spark integration
• removal of unnecessary warnings about column lineage in the Spark integration
• support for alter, truncate, and drop statements in the SQL parser
• typing hints in the SQL integration
• a new from_dict class method in the Python client to support creating it from a dictionary
• a case-insensitive env variable for disabling OpenLineage in the Python client and Airflow integration
• bug fixes, docs changes, and more.
Three +1s from committers will authorize an immediate release. For more details about the release process, see GOVERNANCE.md.

➕ Maciej Obuchowski, Perttu Salonen, Jakub Dardziński, Ross Turk

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-03 15:39:46

*Thread Reply:* Thanks, all. The release is authorized and will be initiated within 48 hours.

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-03 16:55:44

@channel
We released OpenLineage 0.22.0, including:
Additions:
• Spark: add properties facet #1717 by @tnazarew
• SQL: SQLParser supports alter, truncate and drop statements #1695 by @pawel-big-lebowski
• Common/SQL: provide public interface for openlineage_sql package #1727 by @JDarDagran
• Java client: add configurable headers to HTTP transport #1718 by @tnazarew
• Python client: create client from dictionary #1745 by @JDarDagran
Changes:
• Spark: remove URL parameters for JDBC namespaces #1708 by @tnazarew
• Make OPENLINEAGE_DISABLED case-insensitive #1705 by @jedcunningham
Removals:
• Spark: remove unnecessary warnings for column lineage #1700 by @pawel-big-lebowski
• Spark: remove deprecated configs #1711 by @tnazarew
Thanks to all the contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.22.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.21.1...0.22.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Jakub Dardziński, Francis McGregor-Macdonald, Howard Yoo, 김형은, Kengo Seki, Anirudh Shrinivason, Perttu Salonen, Paweł Leszczyński, Maciej Obuchowski, Harel Shein
🎉 Ross Turk, 김형은, Kengo Seki, Anirudh Shrinivason, Perttu Salonen

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-04 01:49:37

Hi everyone, if I set executors to 0 and bind the driver address to localhost, and then try to use OpenLineage to capture metadata, I seem to run into an error where the executor tries to fetch the Spark jar from the driver, even though no executors are set. Then it fails because a connection cannot be established. This is some of the error stack trace:
INFO Executor: Fetching spark://<DRIVER_IP>:44541/jars/io.openlineage_openlineage-spark-0.21.1.jar with timestamp 1680506544239
ERROR Utils: Aborting task
java.io.IOException: Failed to connect to /<DRIVER_IP>:44541
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:287)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
    at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:755)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:541)
    at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:953)
    at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:945)
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
    at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
    at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
    at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:945)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /<DRIVER_IP>:44541
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
Just curious if anyone here has run into a similar problem before, and what the recommended way to resolve this would be...

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-04 13:39:19

*Thread Reply:* Do you have a small configuration and job to replicate this?

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-04 22:21:35

*Thread Reply:* Yeah. For configs:
spark.driver.bindAddress: "localhost"
spark.master: "local[*]"
spark.sql.catalogImplementation: "hive"
spark.openlineage.transport.endpoint: "<endpoint>"
spark.openlineage.transport.type: "http"
spark.sql.catalog.spark_catalog: "org.apache.spark.sql.delta.catalog.DeltaCatalog"
spark.openlineage.transport.url: "<url>"
spark.extraListeners: "io.openlineage.spark.agent.OpenLineageSparkListener"
and the job is submitted via spark-submit in client mode with the number of executors set to 0.
The Spark job by itself could be anything... I think the job fails before initializing the Spark session itself.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-04 22:23:19

*Thread Reply:* The issue is because of the spark.jars.packages config... the spark.jars config also runs into the same issue, because the executor tries to fetch the jar from the driver for some reason even though no executors are set...

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-05 05:38:55

*Thread Reply:* TBH I'm not sure if we can do anything about it. Seems like having any SparkListener that is not in Spark's jars would hit the same problem, right?

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-10 06:07:11

*Thread Reply:* Yeah... Actually, this was because of binding the driver IP to localhost. In that case, the executor was not able to get the jar from the driver. But yeah, I don't think we could have done anything from the OpenLineage end anyway. It was just an interesting error to encounter lol

Lq Dodo - (tryopenmetadata@gmail.com)
2023-04-04 12:07:21

Hi, I am new to OpenLineage. I was able to follow https://openlineage.io/getting-started/ to create a lineage "my-input-->my-job-->my-output". I want to use "my-output" as an input dataset and connect it to the next job, something like this: "my-input-->my-job-->my-output-->my-job2-->my-final-output". How do I do it? I have trouble setting eventType and runId, etc. Once the new lineages get messed up, the Marquez UI becomes blank (which is a separate issue).

Ross Turk - (ross@datakin.com)
2023-04-04 13:02:21

*Thread Reply:* In this case you would have four run events:
  1. a START event on my-job where my-input is the input and my-output is the output, with a runId you generate on the client
  2. a COMPLETE event on my-job with the same runId from #1
  3. a START event on my-job2 where the input is my-output and the output is my-final-output, with a separate runId you generate
  4. a COMPLETE event on my-job2 with the same runId from #3
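A minimal sketch of those four events with the OpenLineage Python client (the URL and namespace are illustrative):

```
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
NS = "my-namespace"

my_input = Dataset(namespace=NS, name="my-input")
my_output = Dataset(namespace=NS, name="my-output")
my_final_output = Dataset(namespace=NS, name="my-final-output")

def emit_run(job_name, inputs, outputs):
    # One runId per job run, shared by its START and COMPLETE events.
    run = Run(runId=str(uuid4()))
    job = Job(namespace=NS, name=job_name)
    for state in (RunState.START, RunState.COMPLETE):
        client.emit(RunEvent(
            eventType=state,
            eventTime=datetime.now(timezone.utc).isoformat(),
            run=run, job=job,
            inputs=inputs, outputs=outputs,
            producer="https://example.com/producer",
        ))

emit_run("my-job", [my_input], [my_output])
emit_run("my-job2", [my_output], [my_final_output])
```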
Lq Dodo - (tryopenmetadata@gmail.com)
2023-04-04 14:53:14

*Thread Reply:* thanks for the response. I tried it, but now the UI only shows for about one second and then turns blank. I had a similar issue before. It seems to me that every time I add a bad lineage, the UI stops working and I have to delete the docker image :-( Not sure whether it is a macOS M1 related issue.

Ross Turk - (ross@datakin.com)
2023-04-04 16:07:06

*Thread Reply:* Hmmm, that's interesting. Not sure I've seen that before. If you happen to catch it in that state again, perhaps capture the contents of the lineage_events table so it can be replicated.

Lq Dodo - (tryopenmetadata@gmail.com)
2023-04-04 16:24:28

*Thread Reply:* I can fairly easily reproduce this blank UI issue. Apparently I used the same runId for two different jobs. If I use a different runId (which I should), the lineage displays correctly. Thanks again!

👍 Ross Turk

Lq Dodo - (tryopenmetadata@gmail.com)
2023-04-04 16:41:54

Is it possible to add column-level lineage via the API? Let's say I have fields A, B, C in my-input, A, B in my-output, and B, C in my-output-s3. I want to see, filter, or query by the column name.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-05 05:35:02

*Thread Reply:* You can add https://openlineage.io/docs/spec/facets/dataset-facets/column_lineage_facet/ to your datasets.

However, I don't think you can currently do any filtering over it
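For instance, a columnLineage facet on my-output saying that its field B comes from my-input's B could look like this (a sketch following the facet spec; the namespace is invented, and the _producer/_schemaURL keys are omitted):

```
{
  "columnLineage": {
    "fields": {
      "B": {
        "inputFields": [
          {"namespace": "my-namespace", "name": "my-input", "field": "B"}
        ]
      }
    }
  }
}
```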

Ross Turk - (ross@datakin.com)
2023-04-05 13:20:20

*Thread Reply:* you can see a good example here, @Lq Dodo: https://github.com/MarquezProject/marquez/blob/289fa3eef967c8f7915b074325bb6f8f55480030/docker/metadata.json#L430

Lq Dodo - (tryopenmetadata@gmail.com)
2023-04-06 11:48:48

*Thread Reply:* those examples really help. I can at least build the lineage with column-level info using the APIs. Thanks a lot! Ideally I'd like to select one column from the UI and then see the column-level graph. Seems not possible.

Ross Turk - (ross@datakin.com)
2023-04-06 12:46:54

*Thread Reply:* correct, right now there isn't column-level metadata on the lineage graph 😞

Pavani - (ylpavani@gmail.com)
2023-04-05 22:01:33

Is Airflow mandatory when integrating Snowflake with OpenLineage?

I am currently looking for a solution that can capture lineage details from Snowflake execution

Harel Shein - (harel.shein@gmail.com)
2023-04-06 10:22:17

*Thread Reply:* something needs to trigger lineage collection, are you using some sort of scheduler / execution engine?

Pavani - (ylpavani@gmail.com)
2023-04-06 11:26:13

*Thread Reply:* Nope... We currently don't have a scheduling tool. Isn't it possible to use the OpenLineage API and collect the details?

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-06 13:12:44

@channel
This month's OpenLineage TSC meeting is on Thursday, April 20th, at 10 am PT. Meeting info: https://openlineage.io/meetings/. All are welcome!
On the tentative agenda:
  1. Announcements
  2. Updates (new!)
     a. OpenLineage in Airflow AIP
     b. Static lineage support
     c. Reworking namespaces
  3. Recent release overview
  4. A new consumer
  5. Caching support for column lineage
  6. Discussion items
     a. Snowflake tagging
  7. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
🚀 alexandre bergere, Paweł Leszczyński

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-06 15:27:41

Hi!

I have a specific question about how OpenLineage fits in between Amazon MWAA and Marquez on AWS EKS. I guess I need to change, for example, the etl_openlineage DAG in this Snowflake integration tutorial and the OPENLINEAGE_URL here. However, I'm wondering how to reproduce the Docker containers airflow, airflow_scheduler, and airflow_worker here.

I heard from @Ross Turk that @Willy Lulciuc and @Michael Collado are experts on the K8s integration for OpenLineage and Marquez. Could you provide me some recommendations on how to approach this integration? Or can anyone else help me?

Kind regards,

Tom

John Lukenoff - (john@jlukenoff.com)
2023-04-07 12:47:18

[RESOLVED] 👋 Hi there, I'm doing a POC of OpenLineage for our Airflow deployment. We have a ton of custom operators, and I'm trying to test out extracting lineage using the get_openlineage_facets_on_start method. Currently when I'm testing, I can see that the OpenLineage plugin is running via airflow plugins, but I'm not able to see that the method is ever getting called. Do I need to do anything else to tell the default extractor to use get_openlineage_facets_on_start? This is the documentation I'm referencing: https://openlineage.io/docs/integrations/airflow/extractors/default-extractors

John Lukenoff - (john@jlukenoff.com)
2023-04-07 12:50:14

*Thread Reply:* E.g. do I need to update my custom operators to inherit from DefaultExtractor?

John Lukenoff - (john@jlukenoff.com)
2023-04-07 13:18:05

*Thread Reply:* FWIW, I can tell some level of connectivity to my Marquez deployment is working since I can see it created the default namespace I defined in my OPENLINEAGE_NAMESPACE env var.

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-07 18:37:44

*Thread Reply:* hey John, it is enough to add the method to your custom operator. Perhaps something breaks inside the method. Did anything show up in the logs?

John Lukenoff - (john@jlukenoff.com)
2023-04-07 19:03:01

*Thread Reply:* That's the strange part. I'm not seeing anything to suggest that the method is ever getting called. I'm also expecting that the listener created by the plugin should at least be calling this log line when the task runs. However, I'm not seeing that either. I'm able to verify the plugin is registered using airflow plugins and have debug-level logging enabled via AIRFLOW__LOGGING__LOGGING_LEVEL='DEBUG'. This is the output of airflow plugins:

name              | macros                                        | listeners                    | source
==================+===============================================+==============================+=================================================
OpenLineagePlugin | openlineage.airflow.macros.lineage_run_id,    | openlineage.airflow.listener | openlineage-airflow==0.22.0:
                  | openlineage.airflow.macros.lineage_parent_id  |                              | EntryPoint(name='OpenLineagePlugin',
                  |                                               |                              | value='openlineage.airflow.plugin:OpenLineagePlugin',
                  |                                               |                              | group='airflow.plugins')

Appreciate any ideas you might have!

John Lukenoff - (john@jlukenoff.com)
2023-04-11 13:09:05

*Thread Reply:* Figured this out. Just needed to run the airflow scheduler and trigger tasks through the DAGs vs. airflow tasks test …

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-04-07 16:29:03

I have a question that I believe will be very easy to answer, and I think I know the answer already, but I want to confirm my understanding of extracting OpenLineage with Airflow Python scripts.

Extractors extract lineage from operators, so they have to be using operators, right? If someone asks if I can get lineage from their Airflow-orchestrated Python scripts, and they show me their scripts but they're not importing anything starting with airflow.operators, then I can't use extractors and therefore can't get lineage. Is that accurate?

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-04-07 16:30:00

*Thread Reply:* (they are importing dagkit sdk stuff like Job, JobContext, ExecutionContext, and NodeContext.)

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-07 18:40:39

*Thread Reply:* Do they run those scripts in PythonOperator? If so, they should receive some events but with no datasets extracted

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-04-07 21:28:25

*Thread Reply:* How can I know that? Would it be in the scripts or the airflow configuration or...

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-04-08 07:13:56

*Thread Reply:* And "with no datasets extracted" means I wouldn't have the schema of the input and output datasets? (I need the db/schema/table/column names for my purposes.)

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-11 02:49:07

*Thread Reply:* That really depends on the current code, but in general custom code in Airflow does not extract any extra information, especially datasets. One can write their own extractors (more in the docs)
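For reference, the skeleton of such a custom extractor looks roughly like this (the class, operator, and dataset names are invented; registration happens via the OPENLINEAGE_EXTRACTORS environment variable):

```
from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.run import Dataset

class MyScriptExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls):
        # operator class names this extractor should handle
        return ["PythonOperator"]

    def extract(self) -> TaskMetadata:
        # self.operator is the task's operator; pull whatever table names
        # your scripts expose and map them to input/output datasets
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[Dataset(namespace="postgres://db", name="schema.input_table")],
            outputs=[Dataset(namespace="postgres://db", name="schema.output_table")],
        )

# Registered e.g. with:
# OPENLINEAGE_EXTRACTORS=my_package.extractors.MyScriptExtractor
```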

✅ Sheeri Cabral (Collibra)

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-04-12 16:52:04

*Thread Reply:* Thanks! This is very helpful. Exactly what I needed.

👍 Jakub Dardziński

Tushar Jain - (tujain@ivp.in)
2023-04-09 12:48:04

Hi. I was exploring OpenLineage and I want to know: does OpenLineage integrate with MS-SQL (Microsoft SQL Server)? If yes, how do I generate OpenLineage events for MS-SQL views/tables/queries?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-12 02:30:19

*Thread Reply:* Currently there's no extractor implemented for MS-SQL. We try to keep the list of supported databases up to date here: https://openlineage.io/docs/integrations/about/

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-10 12:00:03

@channel
Save the date: the next OpenLineage meetup will be in New York on April 26th! More info is coming soon…

✅ Sheeri Cabral (Collibra), Ross Turk, Minkyu Park

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-10 19:00:38

@channel
Due to many TSC members being on vacation this week, this month's TSC meeting will be moved to next Thursday, April 20th. All are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1680801164289949

✅ Sheeri Cabral (Collibra)

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-11 13:42:03

Hi everyone!

I'm so sorry for all the messages, but I've been trying to get Snowflake, OpenLineage, and Marquez working for days now. Hopefully, this is my last question.
The snowflake.connector import connect package used here in extract_openlineage.py seems to be outdated and is not working in Airflow. Does anyone know how to rewrite this code (e.g., with SnowflakeOperator) and extract the OpenLineage access history? You'd be my absolute hero!!!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-11 17:05:35

*Thread Reply:* > The snowflake.connector import connect package seems to be outdated here in extract_openlineage.py and is not working for airflow.
What's the error?

> Does anyone know how to rewrite this code (e.g., with SnowflakeOperator)
The current extractor for SnowflakeOperator extracts lineage for the SQL executed in the task, in contrast to the method above with the OPENLINEAGE_ACCESS_HISTORY view
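If the goal is just to avoid the raw snowflake.connector import in the DAG, one option is to go through the Snowflake provider's hook, which manages the same connector connection via an Airflow connection. A sketch, not from the tutorial (the connection id is assumed):

```
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

def fetch_access_history_rows():
    # "snowflake_default" is an assumed Airflow connection id;
    # OPENLINEAGE_ACCESS_HISTORY is the view from the tutorial.
    hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
    return hook.get_records("SELECT * FROM OPENLINEAGE_ACCESS_HISTORY")
```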

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-11 18:13:49

*Thread Reply:* Hi Maciej! Thank you so much for the reply! I managed to generate a working combination on Windows between the airflow example in the marquez git and the snowflake openlineage git. The only error I still get is:
*** Log file does not exist: /opt/bitnami/airflow/logs/dag_id=etl_openlineage/run_id=manual__2023-04-10T14:12:53.764783+00:00/task_id=send_ol_events/attempt=1.log
*** Fetching from: http://1c8bb4a78f14:8793/log/dag_id=etl_openlineage/run_id=manual__2023-04-10T14:12:53.764783+00:00/task_id=send_ol_events/attempt=1.log
*** !!!! Please make sure that all your Airflow components (e.g. schedulers, webservers and workers) have the same 'secret_key' configured in 'webserver' section and time is synchronized on all your machines (for example with ntpd) !!!!!
*** See more at https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#secret-key
*** Failed to fetch log file from worker. Client error '403 FORBIDDEN' for url 'http://1c8bb4a78f14:8793/log/dag_id=etl_openlineage/run_id=manual__2023-04-10T14:12:53.764783+00:00/task_id=send_ol_events/attempt=1.log'
For more information check: https://httpstatuses.com/403
This one doesn't make sense to me. I found a workaround for the ETL examples in the OpenLineage git by manually creating a Snowflake connector in Airflow; however, the error is still present for the extract_openlineage.py file. I noticed this file is the only one that uses snowflake.connector import connect and not airflow.providers.snowflake.operators.snowflake import SnowflakeOperator like the other ETL DAGs.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-12 05:35:41

*Thread Reply:* I think it's an Airflow error related to fetching logs from the worker

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-12 05:36:07

*Thread Reply:* snowflake.connector is a Snowflake connector library that SnowflakeOperator uses underneath to connect to Snowflake

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-12 10:15:21

*Thread Reply:* Ah alright! Thanks for pointing that out! 🙂 Do you know how to solve it? Or do you have any recommendations on how to look for the solution?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-12 10:19:53

*Thread Reply:* I have no experience with Windows, and I think this is the issue: https://github.com/apache/airflow/issues/10388

I would try running it in Docker TBH

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-12 11:47:41

*Thread Reply:* Yeah, I was running Airflow in Docker, but this didn't work. I'll try to use my MacBook for now because I don't think there is a solution for this in the short term. Thank you so much for the support though!!

Peter Hanssens - (peter@cloudshuttle.com.au)
2023-04-13 04:55:41

Hi all,
My team and I have been building a status page based on OpenLineage, and I did a talk about it… keen for feedback and thoughts:
https://youtu.be/nGh5_j3hXrE

👀 Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-13 11:19:57

*Thread Reply:* Very interesting!

Julien Le Dem - (julien@apache.org)
2023-04-13 13:28:53

*Thread Reply:* that’s awesome 🙂

Ernie Ostic - (ernie.ostic@getmanta.com)
2023-04-13 08:22:50

Hi Peter. Looks good. I like the way you introduced the premise of, and benefits of, using OpenLineage for your project. Have you also explored other integrations in addition to dbt?

Peter Hanssens - (peter@cloudshuttle.com.au)
2023-04-13 08:36:01

*Thread Reply:* Thanks Ernie, I'm looking at Airflow as well as GE and would like to contribute back to the project as well… we're close to getting a public preview release of our product done, and then we want to help build out OpenLineage

❤️ Julien Le Dem, Harel Shein

John Lukenoff - (john@jlukenoff.com)
2023-04-13 14:08:38

[Resolved] Has anyone seen this error before where the openlineage-airflow plugin / listener fails to deepcopy the task instance? I’m using the native airflow DAG / BashOperator objects to do a basic test of static lineage tagging. More details in 🧵

John Lukenoff - (john@jlukenoff.com)
2023-04-13 14:10:08

*Thread Reply:* The dag is basically just:
```
dag = DAG(
    dag_id="asana_example_dag",
    default_args=default_args,
    schedule_interval=None,
)

sample_lineage_task = BashOperator(
    task_id="sample_lineage_task",
    bash_command='echo $OPENLINEAGE_URL',
    dag=dag,
    inlets=[Table(database="redshift", cluster="some_schema", name="some_input_table")],
    outlets=[Table(database="redshift", cluster="some_other_schema", name="some_output_table")],
)
```

John Lukenoff - (john@jlukenoff.com)
2023-04-13 14:11:02
*Thread Reply:* This is the error I'm getting, seems to be coming from this line:
[2023-04-13, 17:45:33 UTC] {logging_mixin.py:115} WARNING - Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.7/site-packages/openlineage/airflow/listener.py", line 89, in on_running
    task_instance_copy = copy.deepcopy(task_instance)
  File "/opt/conda/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/opt/conda/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1156, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/dag.py", line 1941, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1156, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/opt/conda/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1156, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1000, in __setattr__
    self.set_xcomargs_dependencies()
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1107, in set_xcomargs_dependencies
    XComArg.apply_upstream_relationship(self, arg)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/xcom_arg.py", line 186, in apply_upstream_relationship
    op.set_upstream(ref.operator)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskmixin.py", line 241, in set_upstream
    self._set_relatives(task_or_task_list, upstream=True, edge_modifier=edge_modifier)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskmixin.py", line 185, in _set_relatives
    dags: Set["DAG"] = {task.dag for task in [*self.roots, *task_list] if task.has_dag() and task.dag}
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskmixin.py", line 185, in <setcomp>
    dags: Set["DAG"] = {task.dag for task in [*self.roots, *task_list] if task.has_dag() and task.dag}
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/dag.py", line 508, in __hash__
    val = tuple(self.task_dict.keys())
AttributeError: 'DAG' object has no attribute 'task_dict'

👀 Maciej Obuchowski

John Lukenoff - (john@jlukenoff.com)
2023-04-13 14:12:11

*Thread Reply:* This is with Airflow 2.3.2 and openlineage-airflow 0.22.0

John Lukenoff - (john@jlukenoff.com)
2023-04-13 14:13:34

*Thread Reply:* Seems like it might be some issue like this with a circular structure? https://stackoverflow.com/questions/46283738/attributeerror-when-using-python-deepcopy

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-14 08:44:36

*Thread Reply:* Just by quick look at it, it will definitely be fixed with Airflow 2.6, as it won't need to deepcopy anything.

👍 John Lukenoff

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-14 08:47:16

*Thread Reply:* I can't seem to reproduce the issue. I ran the following example DAG with the same Airflow and OL versions as yours:
```
import datetime

from airflow.lineage.entities import Table
from airflow.models import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "start_date": datetime.datetime.now()
}

dag = DAG(
    dag_id="asana_example_dag",
    default_args=default_args,
    schedule_interval=None,
)

sample_lineage_task = BashOperator(
    task_id="sample_lineage_task",
    bash_command='echo $OPENLINEAGE_URL',
    dag=dag,
    inlets=[Table(database="redshift", cluster="some_schema", name="some_input_table")],
    outlets=[Table(database="redshift", cluster="some_other_schema", name="some_output_table")],
)
```

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-14 08:53:48

*Thread Reply:* is there possibly any extra configuration you made?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-14 13:02:40

*Thread Reply:* @John Lukenoff, I was finally able to reproduce this when passing an XCom as task.output. Looks like this was reported here and solved by this PR (not sure if this was released in 2.3.3 or later)

John Lukenoff - (john@jlukenoff.com)
2023-04-14 13:06:59

*Thread Reply:* Ah interesting. Let me see if bumping my Airflow version resolves this. Haven’t had a chance to tinker with it much since yesterday.

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-14 13:13:21

*Thread Reply:* I ran it against 2.4 and same dag works

John Lukenoff - (john@jlukenoff.com)
2023-04-14 13:15:35

*Thread Reply:* 👍 Looks like a fix for that issue was rolled out in 2.3.3. I’m gonna try that for now (my company has a notoriously difficult time with airflow major version updates 😅)

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-04-14 13:17:06

*Thread Reply:* 👍

John Lukenoff - (john@jlukenoff.com)
2023-04-17 12:29:09

*Thread Reply:* Got this working! We just monkey patched the __deepcopy__ method of the BaseOperator for now until we can get bandwidth for an airflow upgrade. Thanks for the help here!
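John didn't share the patch itself; purely as an illustration of the shape such a workaround might take (not his actual code, and upgrading to Airflow >= 2.3.3 is the proper fix):

```
# Illustrative only. During deepcopy, __setattr__ on the half-built copy
# triggers set_xcomargs_dependencies before the copy's DAG has task_dict,
# so make it a no-op for the duration of the copy.
from airflow.models.baseoperator import BaseOperator

_orig_deepcopy = BaseOperator.__deepcopy__
_orig_set_deps = BaseOperator.set_xcomargs_dependencies

def _patched_deepcopy(self, memo):
    BaseOperator.set_xcomargs_dependencies = lambda self: None
    try:
        return _orig_deepcopy(self, memo)
    finally:
        BaseOperator.set_xcomargs_dependencies = _orig_set_deps

BaseOperator.__deepcopy__ = _patched_deepcopy
```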

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-17 03:45:47

Hi everyone, I am facing this null pointer error:
ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
    java.base/java.util.concurrent.ConcurrentHashMap.putVal(Unknown Source)
    java.base/java.util.concurrent.ConcurrentHashMap.put(Unknown Source)
    io.openlineage.spark.agent.JobMetricsHolder.addMetrics(JobMetricsHolder.java:40)
    io.openlineage.spark.agent.OpenLineageSparkListener.onTaskEnd(OpenLineageSparkListener.java:179)
    org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
    org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
    org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Could I get some help on this pls 🙇

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-17 03:56:30

*Thread Reply:* This is the spark-submit command:
spark-submit --py-files /usr/local/lib/common_utils.zip,/usr/local/lib/team_utils.zip,/usr/local/lib/project_utils.zip
  --conf spark.executor.cores=16
  --conf spark.hadoop.fs.s3a.connection.maximum=100
  --conf spark.sql.shuffle.partitions=1000
  --conf spark.speculation=true
  --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=256MB
  --conf spark.hadoop.fs.s3a.multiobjectdelete.enable=false
  --conf spark.memory.fraction=0.7
  --conf spark.kubernetes.executor.label.experiment=some_label
  --conf spark.kubernetes.executor.label.team=team_name
  --conf spark.driver.memory=26112m
  --conf spark.kubernetes.executor.label.app.kubernetes.io/managed-by=pipeline_name
  --conf spark.kubernetes.executor.label.instance-type=4xlarge
  --conf spark.executor.instances=10
  --conf spark.kubernetes.executor.label.env=prd
  --conf spark.kubernetes.executor.label.job-name=job_name
  --conf spark.kubernetes.executor.label.owner=owner
  --conf spark.kubernetes.executor.label.pipeline=pipeline
  --conf spark.kubernetes.executor.label.platform-name=platform_name
  --conf spark.speculation.multiplier=10
  --conf spark.memory.storageFraction=0.4
  --conf spark.driver.maxResultSize=26112m
  --conf spark.kubernetes.executor.request.cores=15000m
  --conf spark.speculation.interval=1s
  --conf spark.executor.memory=104g
  --conf spark.sql.catalogImplementation=hive
  --conf spark.eventLog.dir=file:///logs/spark-events
  --conf spark.hadoop.fs.s3a.threads.max=100
  --conf spark.speculation.quantile=0.75
  job.py

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-17 04:09:57

*Thread Reply:* @Anirudh Shrinivason pls create an issue for this and I will look at it. Although it may be difficult to find the root cause, a null pointer exception should always be avoided, and this seems to be a bug.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-17 04:14:41

*Thread Reply:* Hmm yeah sure. I'll create an issue on github for this issue. Thanks!

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-04-17 05:13:54

*Thread Reply:* I have opened an issue here: https://github.com/OpenLineage/OpenLineage/issues/1784

Allison Suarez - (asuarezmiranda@lyft.com)
2023-04-17 19:32:23

Hey! Question about Spark column lineage. What is the intended way to write custom code for getting column lineage? I am trying to implement CustomColumnLineageVisitor, but when I try to do so I get:
io.openlineage.spark3.agent.lifecycle.plan.column.CustomColumnLineageVisitor is not public in io.openlineage.spark3.agent.lifecycle.plan.column; cannot be accessed from outside package

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-18 02:25:04

*Thread Reply:* Hi @Allison Suarez, CustomColumnLineageVisitor should definitely be public. I'll prepare a fix PR for that. We do have a test for custom column lineage visitors (CustomColumnLineageVisitorTestImpl), but they're in the same package. Thanks for bringing this up.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-18 03:07:11

*Thread Reply:* This PR should resolve the problem:
https://github.com/OpenLineage/OpenLineage/pull/1788

Allison Suarez - (asuarezmiranda@lyft.com)
2023-04-18 13:34:43

*Thread Reply:* Thank you so much @Paweł Leszczyński 🙂

Allison Suarez - (asuarezmiranda@lyft.com)
2023-04-18 13:35:46

*Thread Reply:* How does the release process work for OL? Do we have to wait a certain amount of time to get this change into a new release?

Allison Suarez - (asuarezmiranda@lyft.com)
2023-04-18 17:34:29

*Thread Reply:* @Maciej Obuchowski ^

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-19 01:49:33

*Thread Reply:* 0.22.0 was released two weeks ago, so the next scheduled release should be in about two weeks. We can ask @Michael Robinson for his opinion on releasing 0.22.1 before that.

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-19 09:08:58

*Thread Reply:* Hi Allison 👋,
Anyone can request a release in the #general channel. I encourage you to go this route. You'll need three +1s (there's more info about the process here: https://github.com/OpenLineage/OpenLineage/blob/main/GOVERNANCE.md), but I don't know of any reasons why we can't do a mid-cycle release. 🙂

🙏 Allison Suarez

Allison Suarez - (asuarezmiranda@lyft.com)
2023-04-19 16:23:20

*Thread Reply:* seems like we got enough +1s

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-19 16:24:33

*Thread Reply:* We need three committers to give a +1. I’ll reach out again to see if I can recruit a third

🙌 Allison Suarez

Allison Suarez - (asuarezmiranda@lyft.com)
2023-04-19 16:24:55

*Thread Reply:* oooh

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-19 16:32:47

*Thread Reply:* Yeah, sorry I forgot to mention that!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-04-20 05:02:46

*Thread Reply:* we have it now

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-19 09:52:02

@channel
This month's TSC meeting is tomorrow, 4/20, at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1681167638153879

Allison Suarez - (asuarezmiranda@lyft.com)
2023-04-19 13:40:31

I would like to get a 0.22.1 patch release out, to get the fix for the issue described in this thread in before the next scheduled release.

➕ Michael Robinson, Paweł Leszczyński, Rohit Menon, Maciej Obuchowski, Julien Le Dem, Jakub Dardziński

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-20 09:46:06

*Thread Reply:* The release is authorized and will be initiated within 2 business days (not including tomorrow).

Michael Robinson - (michael.robinson@astronomer.io)
2023-04-19 15:19:38

Here are the details about next week’s OpenLineage Meetup at Astronomer’s NY offices: https://openlineage.io/blog/nyc-meetup. Hope to see you there if you can make it!

👍 Ernie Ostic

Sai - (saivenkatesh161@gmail.com)
2023-04-20 07:38:55

Hi team, I tried integrating OpenLineage with Spark on Databricks and followed the steps as per the documentation. The installation looks good and the listener is enabled, but no event is getting passed to Marquez. I can see the message below in the log4j logs. Am I missing any configuration?

Running a few Spark commands in a Databricks notebook to create events:

23/04/20 11:10:34 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
23/04/20 11:10:34 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 08:57:45

*Thread Reply:* Hi Sai,

Perhaps you could try printing the OpenLineage events into the logs. This can be achieved by setting the Spark config parameter spark.openlineage.transport.type to console.

This can help you determine whether the problem is in generating the OpenLineage events themselves or in emitting them to Marquez.
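Concretely, that debugging step only needs the listener plus the console transport in the cluster's Spark config (host-specific settings aside; both keys appear later in this thread):

```
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.transport.type console
```

Switching spark.openlineage.transport.type back to http (with spark.openlineage.transport.url set) re-enables emission to Marquez.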

Sai - (saivenkatesh161@gmail.com)
2023-04-20 09:18:53

*Thread Reply:* Hi @Paweł Leszczyński I passed this config as below, but could not see any changes in the logs. The events are getting generated sometimes, like below:

23/04/20 10:00:15 INFO ConsoleTransport: {"eventType":"START","eventTime":"2023-04-20T10:00:15.085Z","run":{"runId":"ef4f46d1-d13a-420a-87c3-19fbf6ffa231","facets":{"spark.logicalPlan":{"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.22.0/integration/spark","schemaURL":"https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect","num-children":2,"name":0,"partitioning":[],"query":1,"tableSpec":null,"writeOptions":null,"ignoreIfExists":false},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedTableName","num-children":0,"catalog":null,"ident":null},{"class":"org.apache.spark.sql.catalyst.plans.logical.Project","num-children":1,"projectList":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num_children":0,"name":"workorderid","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-cl

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 09:19:37

*Thread Reply:* Ok, great. This means the issue is related to Spark <-> Marquez connection

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 09:20:33

*Thread Reply:* Some time ago the Spark config changed, and here is the up-to-date documentation: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 09:21:10

*Thread Reply:* please note that spark.openlineage.transport.url has to be used, which is different from what you have in the screenshot attached

Sai - (saivenkatesh161@gmail.com)
2023-04-20 09:22:40

*Thread Reply:* You mean instead of "spark.openlineage.host" I need to use "spark.openlineage.transport.url"?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 09:23:04

*Thread Reply:* yes, please give it a try

Sai - (saivenkatesh161@gmail.com)
2023-04-20 09:23:40

*Thread Reply:* sure will give a try and let you know the outcome

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 09:23:48

*Thread Reply:* and set spark.openlineage.transport.type to http

Sai - (saivenkatesh161@gmail.com)
2023-04-20 09:24:04

*Thread Reply:* okay

Sai - (saivenkatesh161@gmail.com)
2023-04-20 09:26:42

*Thread Reply:* do these configs suffice or do I need to add anything else?

spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.consoleTransport true
spark.openlineage.version v1
spark.openlineage.transport.type http
spark.openlineage.transport.url http://<host>:5000/api/v1/namespaces/sparkintegrationpoc/

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 09:27:07

*Thread Reply:* spark.openlineage.consoleTransport true this one can be removed

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-04-20 09:27:33

*Thread Reply:* otherwise it shall be OK

Sai - (saivenkatesh161@gmail.com)
2023-04-20 10:01:30

*Thread Reply:* I added these configs and ran it, but still the same issue. Now I am not able to see the events in the log file either.

Sai - (saivenkatesh161@gmail.com)
2023-04-20 10:04:27

*Thread Reply:* 23/04/20 13:51:22 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart -23/04/20 13:51:22 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd

- -

Does this need any changes in the config side?

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-04-20 13:02:23
If you are trying to get into the OpenLineage Technical Steering Committee meeting, you have to RSVP to the specific event at https://www.addevent.com/calendar/pP575215 to get the password (in the invitation to add to your calendar).
🙌 Michael Robinson

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-20 13:53:31
Here is a nice article I found online that briefly explains Spark catalogs, just for some context: https://www.waitingforcode.com/apache-spark-sql/pluggable-catalog-api/read
This is in reference to the V2SessionCatalog use case brought up in the meeting just now.
🙌 Michael Robinson, Maciej Obuchowski, Paweł Leszczyński
Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 06:49:43
*Thread Reply:* @Anirudh Shrinivason Thanks for linking this, as it contains a clear explanation of Spark catalogs. However, I am still unable to write a failing integration test that reproduces the scenario. Could you provide an example of Spark code that is failing on V2SessionCatalog, and give more details on how you are trying to read/write data?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-24 07:14:04
*Thread Reply:* Hi @Paweł Leszczyński, I noticed this issue on one of our pipelines before, actually. Unfortunately, I didn't note down which pipeline the issue was occurring in. I'll keep checking from my end to identify the Spark job that ran into this error. In the meantime, I'll also try to see in which cases deltaCatalog makes use of the V2SessionCatalog, to understand this better. Thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-26 03:44:15
*Thread Reply:* Hi @Paweł Leszczyński
```
CREATE TABLE IF NOT EXISTS TABLE_NAME (
    SOME COLUMNS
) USING delta
PARTITIONED BY (col)
LOCATION 's3 location'
```
A Spark SQL statement like this actually triggers the V2SessionCatalog.
❤️ Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-26 03:44:48
*Thread Reply:* Thanks @Anirudh Shrinivason, will look into that.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-26 05:06:05
*Thread Reply:* Which Spark & Delta versions are you using?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-27 02:35:50
*Thread Reply:* I am not 100% sure if this is the thing you described, but this was an error I was able to replicate and fix. Please look at the exception stacktrace and let me know if it is the same on your side.
https://github.com/OpenLineage/OpenLineage/pull/1798
Labels: documentation, integration/spark
:gratitude_thank_you: Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:36:20
*Thread Reply:* Hi

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:36:45
*Thread Reply:* Hmm, actually I am noticing this error on my local.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:37:01
*Thread Reply:* But on the prod job, I am seeing no such error in the logs...

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:37:28
*Thread Reply:* Also, I was using Spark 3.1.2.
👀 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-27 02:37:39
*Thread Reply:* Then perhaps it's something different :face_palm: will try to replicate on Spark 3.1.2.
:gratitude_thank_you: Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:37:42
*Thread Reply:* Not too sure which Delta version the prod job was using...

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 03:30:49
*Thread Reply:* I was running the following command on Spark 3.1.2:
```
spark.sql(
    "CREATE TABLE t_partitioned (a int, b int) USING delta "
        + "PARTITIONED BY (a) LOCATION '/tmp/delta/tbl'"
);
```
and I got an OpenLineage event emitted with a t_partitioned output dataset.
:gratitude_thank_you: Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 03:31:47
*Thread Reply:* Oh... hmm... that is strange. Let me check more from my end too.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 03:33:01
*Thread Reply:* For Spark 3.1, we're using Delta 1.0.0.
👀 Anirudh Shrinivason
Cory Visi (cvisi@amazon.com)
2023-04-20 14:41:23
Hi team! I have two Spark jobs chained together to process incoming data files, and I'm using openlineage-spark-0.22.0 with Marquez to visualize.
I'm struggling to figure out the best way to use spark.openlineage.parentRunId and spark.openlineage.parentJobName. Should these values be unique for each Spark job? Should they be unique for each execution of the chain of both Spark jobs? Or should they be the same for all runs?
I'm setting them to be unique to the execution of the chain, and I'm getting strange results (jobs are not showing as completed, or not showing at all).

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 05:38:09
*Thread Reply:* Hi Cory, I think the definition of ParentRunFacet (https://openlineage.io/docs/spec/facets/run-facets/parent_run) contains the answer to that:
Commonly, scheduler systems like Apache Airflow will trigger processes on remote systems, such as Apache Spark or Apache Beam jobs. Those systems might have their own OpenLineage integration and report their own job runs and dataset inputs/outputs. The ParentRunFacet allows those downstream jobs to report which jobs spawned them, to preserve job hierarchy. To do that, the scheduler system should have a way to pass its own job and run id to the child job.
For example, when Airflow is used to run a Spark job, we want the Spark events to contain some information on what triggered the Spark job, and the parameters you ask about are used to pass that information from the Airflow operator to the Spark job.
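In practice, passing the parent identifiers down to Spark looks roughly like this minimal sketch; the job name is an illustrative placeholder, and the run id would normally be generated once per parent run by the upstream scheduler rather than inside the job:

```
import uuid
from pyspark.sql import SparkSession

# The scheduler (Airflow, Step Functions, ...) generates one run id per run of the
# parent job and passes the same value to every Spark job it spawns in that run.
parent_run_id = str(uuid.uuid4())  # illustrative; normally supplied by the scheduler

spark = (
    SparkSession.builder.appName("child-spark-job")
    .config("spark.openlineage.parentJobName", "my_pipeline.process_files")  # placeholder
    .config("spark.openlineage.parentRunId", parent_run_id)
    .getOrCreate()
)
```

As noted below in the thread, the same parentRunId should then appear on both the START and COMPLETE events of the child run.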
Cory Visi (cvisi@amazon.com)
2023-04-26 17:28:39
*Thread Reply:* Thank you for pointing me at this documentation; I did not see it previously. In my setup, the calling system is AWS Step Functions, which has no integration with OpenLineage.

So I've essentially been passing non-existent parent job information to OpenLineage. It has been useful as a data point for searches and reporting, though.

Is there any harm in doing what I am doing? Is it causing the jobs that I see never completing?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-27 04:59:39
*Thread Reply:* I think parentRunId should be the same for the OpenLineage START and COMPLETE events. Is it like this in your case?

Cory Visi (cvisi@amazon.com)
2023-05-03 11:13:58
*Thread Reply:* That makes sense, and based on my configuration, I would think that it would be. However, given that I am seeing incomplete jobs in Marquez, I'm wondering if somehow the parentRunId is changing. I need to investigate.

Michael Robinson (michael.robinson@astronomer.io)
2023-04-20 15:44:39
@channel
We released OpenLineage 0.23.0, including:
Additions:
• SQL: parser improvements to support: copy into, create stage, pivot #1742 @pawel-big-lebowski
• dbt: add support for snapshots #1787 @JDarDagran
Changes:
• Spark: change custom column lineage visitors #1788 @pawel-big-lebowski
Plus bug fixes, doc changes and more.
Thanks to all the contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.23.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.22.0...0.23.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🎉 Harel Shein, Maciej Obuchowski, Anirudh Shrinivason, Kengo Seki, Paweł Leszczyński, Perttu Salonen
👍 Cory Visi, Maciej Obuchowski, Anirudh Shrinivason, Kengo Seki

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-21 05:07:30
Just curious: how long before we can see 0.23.0 over here: https://mvnrepository.com/artifact/io.openlineage/openlineage-spark

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-21 09:06:06
*Thread Reply:* I think @Michael Robinson has to manually promote artifacts.

Michael Robinson (michael.robinson@astronomer.io)
2023-04-21 09:08:06
*Thread Reply:* I promoted the artifacts, but there is a delay before they appear in Maven. A couple of releases ago, the delay was about 24 hours long.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-21 09:26:09
*Thread Reply:* Ahh I see... Thanks!

Michael Robinson (michael.robinson@astronomer.io)
2023-04-21 10:10:38
*Thread Reply:* @Anirudh Shrinivason are you using search.maven.org by chance? Version 0.23.0 is not appearing there yet, but I do see it on central.sonatype.com.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-21 10:15:00
*Thread Reply:* Hmm, I can see it now on search.maven.org actually. But I still cannot see it on https://mvnrepository.com/artifact/io.openlineage/openlineage-spark ...

Michael Robinson (michael.robinson@astronomer.io)
2023-04-21 10:19:38
*Thread Reply:* Understood. I believe you can download the 0.23.0 jars from central.sonatype.com. For Spark, try going here: https://central.sonatype.com/artifact/io.openlineage/openlineage-spark/0.23.0/versions

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-22 06:11:10
*Thread Reply:* Yup. I can see it on all Maven repos now, haha. I think it's just the delay.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-22 06:11:18
*Thread Reply:* ~24 hours, I guess.

Michael Robinson (michael.robinson@astronomer.io)
2023-04-24 16:49:15
*Thread Reply:* 👍

John Doe (adarsh.pansari@tigeranalytics.com)
2023-04-21 08:49:54
Hello everyone, I am facing an issue while trying to integrate OpenLineage with Jupyter Notebook. I am following the docs. My containers are running and I am getting the URL for the Jupyter notebook, but when I try the token in the terminal, I get an invalid credentials error. Can someone please help resolve this? Am I doing something wrong?

John Doe (adarsh.pansari@tigeranalytics.com)
2023-04-21 09:28:18
*Thread Reply:* Good news, everyone! The login worked on the second attempt after starting the Docker containers, although it's unclear why it failed the first time.
👍 Maciej Obuchowski, Anirudh Shrinivason, Michael Robinson, Paweł Leszczyński
Natalie Zeller (natalie.zeller@naturalint.com)
2023-04-23 23:52:34
Hi team,
I have a question regarding the customization of transport types in OpenLineage.
At my company, we are using OpenLineage to report lineage from our Spark jobs to OpenMetadata. We have created a custom OpenMetadataTransport to send lineage to the OpenMetadata APIs, conforming to the OpenMetadata format.
Currently, we are using a fork of OpenLineage, as we needed to make some changes in the core to identify the new TransportConfig.
We believe it would be more optimal for OpenLineage to support custom transport types, which would allow us to use the OpenLineage JAR alongside our own JAR containing the custom transport.
I noticed some comments in the code suggesting that customizations are possible. However, I couldn't make it work without modifying the TransportFactory and the TransportConfig interface, as the transport types are hardcoded. Am I missing something? 🤔
If custom transport types are not currently supported, we would be more than happy to contribute a PR that enables custom transports.
What are your thoughts on this?
❤️ Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 02:32:51
*Thread Reply:* Hi Natalie, it's wonderful to hear you're planning to contribute. Yes, you're right about TransportFactory. What other transport type did you have in mind? If it is something generic, then it is surely OK to include it within TransportFactory. If it is a custom feature, we could follow the ServiceLoader pattern that we're using to allow including custom plan visitors and dataset builders.

Natalie Zeller (natalie.zeller@naturalint.com)
2023-04-24 02:54:40
*Thread Reply:* Hi @Paweł Leszczyński
Yes, I was planning to change TransportFactory to support custom/generic transport types using the ServiceLoader pattern. After this change is done, I will be able to use our custom OpenMetadataTransport without changing anything in the OpenLineage core. For now I don't have other types in mind, but after we add the customization support, anyone will be able to create their own transport type and report the lineage to different backends.
👍 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 03:28:30
*Thread Reply:* Perhaps it's not strictly related to this particular use case, but you may also find our recent PoC of the Fluentd & OpenLineage integration interesting. This will bring some cool backend features, like copying an event and sending it to multiple backends, sending it to backends supported by Fluentd output plugins, etc. https://github.com/OpenLineage/OpenLineage/pull/1757/files?short_path=4fc5534#diff-4fc55343748f353fa1def0e00c553caa735f9adcb0da18baad50a989c0f2e935

Natalie Zeller (natalie.zeller@naturalint.com)
2023-04-24 05:36:24
*Thread Reply:* Sounds interesting. Thanks, I will look into it.

Michael Robinson (michael.robinson@astronomer.io)
2023-04-24 16:37:33
Are you planning to come to the first New York OpenLineage Meetup this Wednesday at Astronomer's offices in the Flatiron District? Don't forget to RSVP so we know how much food and drink to order!
Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 03:20:57
Hi, I'm new to open data lineage. I'm trying to connect a Snowflake database with Marquez using Airflow, and I'm getting an error in etl_openlineage while running the Airflow DAG on a local Ubuntu environment. I'm also unable to see the Marquez UI once etl_openlineage has completed successfully.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 08:07:36
*Thread Reply:* What's the extract_openlineage.py file? Looks like your code?

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 08:43:04
*Thread Reply:*
```
import json
import os
from pendulum import datetime

from airflow import DAG
from airflow.decorators import task
from openlineage.client import OpenLineageClient
from snowflake.connector import connect

SNOWFLAKE_USER = os.getenv('SNOWFLAKE_USER')
SNOWFLAKE_PASSWORD = os.getenv('SNOWFLAKE_PASSWORD')
SNOWFLAKE_ACCOUNT = os.getenv('SNOWFLAKE_ACCOUNT')
SNOWFLAKE_WAREHOUSE = os.getenv('SNOWFLAKE_WAREHOUSE')

@task
def send_ol_events():
    client = OpenLineageClient.from_environment()

    with connect(
        user=SNOWFLAKE_USER,
        password=SNOWFLAKE_PASSWORD,
        account=SNOWFLAKE_ACCOUNT,
        database='OPENLINEAGE',
        schema='PUBLIC',
    ) as conn:
        with conn.cursor() as cursor:
            ol_view = 'OPENLINEAGE_ACCESS_HISTORY'
            ol_event_time_tag = 'OL_LATEST_EVENT_TIME'

            var_query = f'''
                use warehouse {SNOWFLAKE_WAREHOUSE};
            '''
            cursor.execute(var_query)

            var_query = f'''
                set current_organization='{SNOWFLAKE_ACCOUNT}';
            '''
            cursor.execute(var_query)

            ol_query = f'''
                SELECT * FROM {ol_view}
                WHERE EVENT:eventTime > system$get_tag('{ol_event_time_tag}', '{ol_view}', 'table')
                ORDER BY EVENT:eventTime ASC;
            '''
            cursor.execute(ol_query)
            ol_events = [json.loads(ol_event[0]) for ol_event in cursor.fetchall()]

            for ol_event in ol_events:
                client.emit(ol_event)

            if len(ol_events) > 0:
                latest_event_time = ol_events[-1]['eventTime']
                cursor.execute(f'''
                    ALTER VIEW {ol_view} SET TAG {ol_event_time_tag} = '{latest_event_time}';
                ''')

with DAG(
    'etl_openlineage',
    start_date=datetime(2022, 4, 12),
    schedule_interval='@hourly',
    catchup=False,
    default_args={
        'owner': 'openlineage',
        'depends_on_past': False,
        'email_on_failure': False,
        'email_on_retry': False,
        'email': ['demo@openlineage.io'],
        'snowflake_conn_id': 'openlineage_snowflake'
    },
    description='Send OL events every minute.',
    tags=["extract"],
) as dag:
    send_ol_events()
```

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 09:52:33
*Thread Reply:* OpenLineageClient expects RunEvent classes, and you're sending it raw JSON. I think at this point your options are either sending the events by constructing your own HTTP client, using something like requests, or using something like https://github.com/python-attrs/cattrs to structure the JSON into RunEvent.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 10:05:57
*Thread Reply:* @Jakub Dardziński suggested that you can change client.emit(ol_event) to client.transport.emit(ol_event) and it should work.
👍 Ross Turk, Sudhar Balaji
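A minimal sketch of that suggestion, applied to the loop in the DAG above (ol_events being the plain dicts pulled from the Snowflake view):

```
# client.emit() validates that it receives a RunEvent, so raw dicts fail there.
# Going through the underlying transport skips that validation and posts the
# JSON payload as-is - a workaround rather than the officially typed path.
for ol_event in ol_events:
    client.transport.emit(ol_event)
```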
Ross Turk (ross@rossturk.com)
2023-04-25 12:24:08
*Thread Reply:* @Maciej Obuchowski I believe this is from https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup/blob/main/examples/airflow/dags/lineage/extract_openlineage.py

Ross Turk (ross@rossturk.com)
2023-04-25 12:25:26
*Thread Reply:* I believe this example no longer works. Perhaps a new access history pull/push example could be created that is simpler and doesn't use Airflow.
👀 Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 08:34:02
*Thread Reply:* I think separating the actual getting of data from the view and the Airflow DAG would make sense.

Ross Turk (ross@rossturk.com)
2023-04-26 13:57:34
*Thread Reply:* Yeah. I also think that Airflow confuses the issue. You don't need Airflow to get lineage from Snowflake Access History; the only reasons Airflow is in the example are a) to simulate a pipeline that can be viewed in Marquez, and b) to establish a mechanism that regularly pulls and emits lineage...

but most people will already have A, and the simplest example doesn't need to accomplish B.

Ross Turk (ross@rossturk.com)
2023-04-26 13:58:59
*Thread Reply:* Just a few weeks ago 🙂 I was working on a script that you could run like SNOWFLAKE_USER=foo ./process_snowflake_lineage.py --from-date=xxxx-xx-xx --to-date=xxxx-xx-xx

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-27 11:13:58
*Thread Reply:* Hi @Ross Turk! Do you have a link to this script? Perhaps this script can fix the connection issue 🙂

Ross Turk (ross@rossturk.com)
2023-04-27 11:47:20
*Thread Reply:* No, it never became functional before I stopped to take on another task 😕

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 07:47:57
Hi,
Currently, in the .env file, we are using OPENLINEAGE_URL as <http://marquez-api:5000> and got the error
requests.exceptions.HTTPError: 422 Client Error: for url: <http://marquez-api:5000/api/v1/lineage>
We have also tried using OPENLINEAGE_URL as <http://localhost:5000> and get the error
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/v1/lineage (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc71edb9590>: Failed to establish a new connection: [Errno 111] Connection refused'))
I'm not sure which value to use for OPENLINEAGE_URL, so please advise on the correct one.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 09:54:07
*Thread Reply:* Looks like the first URL is the proper one, but there's something wrong with the entity - Marquez logs would help here.

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 09:57:36
*Thread Reply:* This is my log in Airflow. Can you please provide more info about it?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 10:13:37
*Thread Reply:* The Airflow log does not tell us why Marquez rejected the event. Marquez logs would be more helpful.

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-26 05:48:08
*Thread Reply:* We investigated the Marquez container logs and were unable to locate the error. Could you please specify which log file belongs to Marquez when connecting Airflow or Snowflake?

Is it correct that the marquez-web log points to <http://api:5000/>?
[HPM] Proxy created: /api/v1 -> <http://api:5000/>
App listening on port 3000!
👀 Maciej Obuchowski
Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-26 11:26:36
*Thread Reply:* I have the same error at the moment but can provide some additional screenshots. The event data in Snowflake seems fine, and the data is being retrieved correctly by the Airflow DAG. However, there seems to be a warning in the Marquez API logs. Hopefully we can troubleshoot this together!

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-26 11:33:35
*Thread Reply:*

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 13:06:30
*Thread Reply:* Possibly the Python part in between does some weird things, like double-JSONing the data? I can imagine it being wrapped in a second, unnecessary JSON object.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 13:08:18
*Thread Reply:* I guess the only way to check is to print one of those events - in the form they are sent in the Python part, not in Snowflake - and see what they look like. For example, using ConsoleTransport or setting the DEBUG log level in Airflow.

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-26 14:37:32
*Thread Reply:* Here is a log snippet from running with DEBUG logging on the Snowflake Python connector:

```
[2023-04-26T17:16:55.166+0000] {cursor.py:593} DEBUG - binding: [set current_organization='[PRIVATE]';] with input=[None], processed=[{}]
[2023-04-26T17:16:55.166+0000] {cursor.py:800} INFO - query: [set current_organization='[PRIVATE]';]
[2023-04-26T17:16:55.166+0000] {connection.py:1363} DEBUG - sequence counter: 2
[2023-04-26T17:16:55.167+0000] {cursor.py:467} DEBUG - Request id: f7bca188-dda0-4fe6-8d5c-a92dc5f9c7ac
[2023-04-26T17:16:55.167+0000] {cursor.py:469} DEBUG - running query [set current_organization='[PRIVATE]';]
[2023-04-26T17:16:55.168+0000] {cursor.py:476} DEBUG - is_file_transfer: True
[2023-04-26T17:16:55.168+0000] {connection.py:1035} DEBUG - _cmd_query
[2023-04-26T17:16:55.168+0000] {connection.py:1062} DEBUG - sql=[set current_organization='[PRIVATE]';], sequence_id=[2], is_file_transfer=[False]
[2023-04-26T17:16:55.168+0000] {network.py:1162} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 1/1 active sessions
[2023-04-26T17:16:55.169+0000] {network.py:850} DEBUG - remaining request timeout: None, retry cnt: 1
[2023-04-26T17:16:55.169+0000] {network.py:828} DEBUG - Request guid: 4acea1c3-6a68-4691-9af4-22f184e0f660
[2023-04-26T17:16:55.169+0000] {network.py:1021} DEBUG - socket timeout: 60
[2023-04-26T17:16:55.259+0000] {connectionpool.py:465} DEBUG - [PRIVATE] "POST /queries/v1/query-request?requestId=f7bca188-dda0-4fe6-8d5c-a92dc5f9c7ac&request_guid=4acea1c3-6a68-4691-9af4-22f184e0f660 HTTP/1.1" 200 1118
[2023-04-26T17:16:55.261+0000] {network.py:1047} DEBUG - SUCCESS
[2023-04-26T17:16:55.261+0000] {network.py:1168} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 0/1 active sessions
[2023-04-26T17:16:55.261+0000] {network.py:729} DEBUG - ret[code] = None, after post request
[2023-04-26T17:16:55.261+0000] {network.py:751} DEBUG - Query id: 01abe3ac-0603-4df4-0042-c78307975eb2
[2023-04-26T17:16:55.262+0000] {cursor.py:807} DEBUG - sfqid: 01abe3ac-0603-4df4-0042-c78307975eb2
[2023-04-26T17:16:55.262+0000] {cursor.py:813} INFO - query execution done
[2023-04-26T17:16:55.262+0000] {cursor.py:827} DEBUG - SUCCESS
[2023-04-26T17:16:55.262+0000] {cursor.py:846} DEBUG - PUT OR GET: False
[2023-04-26T17:16:55.263+0000] {cursor.py:941} DEBUG - Query result format: json
[2023-04-26T17:16:55.263+0000] {result_batch.py:433} DEBUG - parsing for result batch id: 1
[2023-04-26T17:16:55.263+0000] {cursor.py:956} INFO - Number of results in first chunk: 1
[2023-04-26T17:16:55.263+0000] {cursor.py:735} DEBUG - executing SQL/command
[2023-04-26T17:16:55.263+0000] {cursor.py:593} DEBUG - binding: [SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...] with input=[None], processed=[{}]
[2023-04-26T17:16:55.264+0000] {cursor.py:800} INFO - query: [SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...]
[2023-04-26T17:16:55.264+0000] {connection.py:1363} DEBUG - sequence counter: 3
[2023-04-26T17:16:55.264+0000] {cursor.py:467} DEBUG - Request id: 21e2ab85-4995-4010-865d-df06cf5ee5b5
[2023-04-26T17:16:55.265+0000] {cursor.py:469} DEBUG - running query [SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...]
[2023-04-26T17:16:55.265+0000] {cursor.py:476} DEBUG - is_file_transfer: True
[2023-04-26T17:16:55.265+0000] {connection.py:1035} DEBUG - _cmd_query
[2023-04-26T17:16:55.265+0000] {connection.py:1062} DEBUG - sql=[SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...], sequence_id=[3], is_file_transfer=[False]
[2023-04-26T17:16:55.266+0000] {network.py:1162} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 1/1 active sessions
[2023-04-26T17:16:55.267+0000] {network.py:850} DEBUG - remaining request timeout: None, retry cnt: 1
[2023-04-26T17:16:55.268+0000] {network.py:828} DEBUG - Request guid: aba82952-a5c2-4c6b-9c70-a10545b8772c
[2023-04-26T17:16:55.268+0000] {network.py:1021} DEBUG - socket timeout: 60
[2023-04-26T17:17:21.844+0000] {connectionpool.py:465} DEBUG - [PRIVATE] "POST /queries/v1/query-request?requestId=21e2ab85-4995-4010-865d-df06cf5ee5b5&request_guid=aba82952-a5c2-4c6b-9c70-a10545b8772c HTTP/1.1" 200 None
[2023-04-26T17:17:21.879+0000] {network.py:1047} DEBUG - SUCCESS
[2023-04-26T17:17:21.881+0000] {network.py:1168} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 0/1 active sessions
[2023-04-26T17:17:21.882+0000] {network.py:729} DEBUG - ret[code] = None, after post request
[2023-04-26T17:17:21.882+0000] {network.py:751} DEBUG - Query id: 01abe3ac-0603-4df4-0042-c78307975eb6
[2023-04-26T17:17:21.882+0000] {cursor.py:807} DEBUG - sfqid: 01abe3ac-0603-4df4-0042-c78307975eb6
[2023-04-26T17:17:21.882+0000] {cursor.py:813} INFO - query execution done
[2023-04-26T17:17:21.883+0000] {cursor.py:827} DEBUG - SUCCESS
[2023-04-26T17:17:21.883+0000] {cursor.py:846} DEBUG - PUT OR GET: False
[2023-04-26T17:17:21.883+0000] {cursor.py:941} DEBUG - Query result format: arrow
[2023-04-26T17:17:21.903+0000] {result_batch.py:102} DEBUG - chunk size=256
[2023-04-26T17:17:21.920+0000] {cursor.py:956} INFO - Number of results in first chunk: 112
[2023-04-26T17:17:21.949+0000] {arrow_iterator.cpython-37m-x86_64-linux-gnu.so:0} DEBUG - Batches read: 1
[2023-04-26T17:17:21.950+0000] {CArrowIterator.cpp:16} DEBUG - Arrow BatchSize: 1
[2023-04-26T17:17:21.950+0000] {CArrowChunkIterator.cpp:50} DEBUG - Arrow chunk info: batchCount 1, columnCount 1, use_numpy: 0
[2023-04-26T17:17:21.950+0000] {result_set.py:232} DEBUG - result batch 1 has id: data_0_0_1
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 2 has id: data_0_0_2
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 3 has id: data_0_0_3
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 4 has id: data_0_1_0
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 5 has id: data_0_1_1
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 6 has id: data_0_1_2
[2023-04-26T17:17:21.952+0000] {result_set.py:232} DEBUG - result batch 7 has id: data_0_1_3
[2023-04-26T17:17:21.952+0000] {result_set.py:232} DEBUG - result batch 8 has id: data_0_2_0
[2023-04-26T17:17:21.952+0000] {result_set.py:232} DEBUG - result batch 9 has id: data_0_2_1
```

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 14:45:26
*Thread Reply:* I don't see any standard Airflow logs here, but anyway, I looked at it, and debugging would not work if you're bypassing OpenLineageClient.emit and going directly to the transport - the logging is done at the Client level: https://github.com/OpenLineage/OpenLineage/blob/acc207d63e976db7c48384f04bc578409f08cc8a/client/python/openlineage/client/client.py#L73

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-27 11:16:20
*Thread Reply:* I'm sorry, do you have a code snippet showing how to get these logs from https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup/blob/main/examples/airflow/dags/lineage/extract_openlineage.py? I still get the ValueError for OpenLineageClient.emit.

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-05-04 10:56:34
*Thread Reply:* Hey, does anyone have an idea on this? I'm still stuck on this issue 😞

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-05-05 08:58:49
*Thread Reply:* I've found the root cause. It's because the facets don't have _producer and _schemaURL set. I'll provide a fix soon.
♥️ Tom van Eijk, Michael Robinson
Michael Robinson (michael.robinson@astronomer.io)
2023-04-26 11:36:23
The first New York OpenLineage Meetup is happening today at 5:30 pm ET at Astronomer's offices in the Flatiron District! https://openlineage.slack.com/archives/C01CK9T7HKR/p1681931978353159

Julien Le Dem (julien@apache.org)
2023-04-26 11:36:57
*Thread Reply:* I'll be there! I'm looking forward to seeing you all.

Julien Le Dem (julien@apache.org)
2023-04-26 11:37:23
*Thread Reply:* We'll talk about the evolution of the spec.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:55:00
```
delta_table = DeltaTable.forPath(spark, path)
delta_table.alias("source").merge(df.alias("update"), lookup_statement).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
```
If I write based on df operations like this, I notice that OL does not emit any event. May I know whether these or similar cases can be supported too? 🙇
👀 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 04:23:24
*Thread Reply:* I've created an integration test based on your example. The OpenLineage event gets sent; however, it does not contain the output dataset. I will look deeper into that.
:gratitude_thank_you: Anirudh Shrinivason
Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:55:43
*Thread Reply:* Hey, sorry, do you mean the input dataset is empty? Or the output dataset?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:55:51
*Thread Reply:* I am seeing that the input dataset is empty.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 08:56:05
*Thread Reply:* Ooh, I see input datasets.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:56:11
*Thread Reply:* Hmm

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:56:12
*Thread Reply:* I see

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 08:57:07
*Thread Reply:* I created a test method testDeltaMergeInto in the SparkDeltaIntegrationTest class:
```
@Test
void testDeltaMergeInto() {
    Dataset<Row> dataset =
        spark
            .createDataFrame(
                ImmutableList.of(
                    RowFactory.create(1L, "bat"),
                    RowFactory.create(2L, "mouse"),
                    RowFactory.create(3L, "horse")),
                new StructType(
                    new StructField[] {
                      new StructField("a", LongType$.MODULE$, false, Metadata.empty()),
                      new StructField("b", StringType$.MODULE$, false, Metadata.empty())
                    }))
            .repartition(1);
    dataset.createOrReplaceTempView("temp");

    spark.sql("CREATE TABLE t1 USING delta LOCATION '/tmp/delta/t1' AS SELECT * FROM temp");
    spark.sql("CREATE TABLE t2 USING delta LOCATION '/tmp/delta/t2' AS SELECT * FROM temp");

    DeltaTable.forName("t1")
        .merge(spark.read().table("t2"), "t1.a = t2.a")
        .whenMatched().updateAll()
        .whenNotMatched().insertAll()
        .execute();

    verifyEvents(mockServer, "pysparkDeltaMergeIntoCompleteEvent.json");
}
```
👍 Anirudh Shrinivason
Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:59:14
*Thread Reply:* Oh yeah, my bad. I am seeing that the output dataset is empty.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:59:21
*Thread Reply:* Checks out with your observation.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-03 23:23:36
*Thread Reply:* Hi @Paweł Leszczyński, just curious: has a fix for this been implemented already?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 02:40:11
*Thread Reply:* Hi @Anirudh Shrinivason, I had some days OOO. I will look into this soon.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-04 07:37:52
*Thread Reply:* Ahh okie! Thanks so much! Hope you had a good rest!

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 07:38:38
*Thread Reply:* Yeah, this was an amazing extended weekend 😉
🎉 Anirudh Shrinivason

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-05 02:09:10
*Thread Reply:* This should be it: https://github.com/OpenLineage/OpenLineage/pull/1823
Labels: integration/spark

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-05 02:43:24
*Thread Reply:* Hi @Anirudh Shrinivason, please let me know if there is still something to be done within #1747 [PROPOSAL] Support for V2SessionCatalog. I could not reproduce exactly what you described, but I fixed some issue nearby.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-05 02:49:38
*Thread Reply:* Hmm, yeah, sure, let me find out the exact cause of the issue. The pipeline that was causing the issue is now inactive, haha, so I'm trying to backtrace from the limited logs I captured last time. Let me get back to you by next week, thanks! 🙇

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-05 09:35:00
*Thread Reply:* Hi @Paweł Leszczyński, I was trying to replicate the issue from my end but couldn't do so. I think we can close the issue for now and revisit later on if the issue resurfaces. Does that sound okay?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-05 09:40:33
*Thread Reply:* Sounds cool. We can surely create a new issue later on.
👍 Anirudh Shrinivason

Harshini Devathi (harshini.devathi@tigeranalytics.com)
2023-05-09 23:34:04
*Thread Reply:* @Paweł Leszczyński - I was trying to implement these new changes in Databricks. I was wondering which Java file I should use for building the jar file? Could you please help me?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 00:46:34
*Thread Reply:* .

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 02:37:49
*Thread Reply:* Hi, I found that these merge operations have no input datasets/column lineage:
```
df.write.format(file_format).mode(mode).option("mergeSchema", merge_schema).option("overwriteSchema", overwrite_schema).save(path)

df.write.format(file_format).mode(mode).option("mergeSchema", merge_schema).option("overwriteSchema", overwrite_schema) \
    .partitionBy(*partitions).save(path)

df.write.format(file_format).mode(mode).option("mergeSchema", merge_schema).option("overwriteSchema", overwrite_schema) \
    .partitionBy(*partitions).option("replaceWhere", where_clause).save(path)
```
I also noticed the same issue when using the MERGE INTO command from Spark SQL.
Would it be possible to extend the support to these df operations too, please? Thanks!
CC: @Paweł Leszczyński
👀 Paweł Leszczyński
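For context, the Spark SQL form mentioned above is Delta's MERGE INTO statement; a minimal sketch of its shape (table and column names here are hypothetical, not taken from the reported job):

```
# Hypothetical tables/columns; mirrors the MERGE INTO shape reported to lack lineage.
spark.sql("""
    MERGE INTO target t
    USING updates u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```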
Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-09 02:41:24
*Thread Reply:* Hi @Anirudh Shrinivason, great to hear from you. Could you create an issue out of this? I am working at the moment on Spark 3.4. Once this is ready, I will look at the Spark issues. And this one seems to be nicely reproducible. Thanks for that.
👍 Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 02:49:56
*Thread Reply:* Sure, let me create an issue! Thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 02:55:21
*Thread Reply:* Created an issue here! https://github.com/OpenLineage/OpenLineage/issues/1919
Thanks! 🙇
Labels: proposal

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-15 10:39:50
*Thread Reply:* Hi @Paweł Leszczyński, I just realised: https://github.com/OpenLineage/OpenLineage/pull/1823/files
This PR doesn't actually capture column lineage for the MergeIntoCommand? It looks like there is no column lineage field in the events JSON.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-17 04:21:24
*Thread Reply:* Hi @Paweł Leszczyński, is there a potential timeline in mind for supporting column lineage for the MergeIntoCommand? We're really excited for this feature, and it would be a huge help in overcoming a current blocker. Thanks!

Michael Robinson (michael.robinson@astronomer.io)
2023-04-28 14:11:34
Thanks to everyone who came out to Wednesday night's meetup in New York! In addition to great pizza from Grimaldi's (thanks for the tip, @Harel Shein), we enjoyed a spirited discussion of:
• the state of observability tooling in the data space today
• the history and high-level architecture of the project, courtesy of @Julien Le Dem
• exciting news of an OpenLineage Scanner being planned at MANTA, courtesy of @Ernie Ostic
• updates on the project roadmap and some exciting proposals from @Julien Le Dem, @Harel Shein and @Willy Lulciuc
• an introduction to and demo of Marquez from project lead @Willy Lulciuc
• and more.
Be on the lookout for an announcement about the next meetup!
❤️ Harel Shein, Maciej Obuchowski, Peter Hicks, Jakub Dardziński, Atif Tahir

Michael Robinson (michael.robinson@astronomer.io)
2023-04-28 16:02:22
As discussed during the April TSC meeting, comments are sought from the community on a proposal to support RunEvent-less (AKA static) lineage metadata emission. This is currently a WIP. For details and to comment, please see:
• https://docs.google.com/document/d/1366bAPkk0OqKkNA4mFFt-41X0cFUQ6sOvhSWmh4Iydo/edit?usp=sharing
• https://docs.google.com/document/d/1gKJw3ITJHArTlE-Iinb4PLkm88moORR0xW7I7hKZIQA/edit?usp=sharing

Ernie Ostic (ernie.ostic@getmanta.com)
2023-04-30 21:35:47
Hi all. Probably I just need to study the spec further, but what is the significance of _producer vs producer in the context of where they are used? (Same question also for _schemaURL vs schemaURL.) Thx!

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 12:02:13
*Thread Reply:* "producer" is an element of the event run itself - e.g. what produced the JSON packet you're studying. There is only one of these per event run. You can think of it as a top-level property.

"_producer" (and "_schemaURL") are elements of a facet. They are the 2 required elements for any customized facet (though I don't agree they should be required, or at least I believe they should be able to be compatible with a blank value and a null value).

A packet sent to an API should only have one "producer" element, but can have many _producer elements in sub-objects (though only one _producer per facet).

Ernie Ostic (ernie.ostic@getmanta.com)
2023-05-01 12:06:52
*Thread Reply:* Just curious: is/was there any specific reason for the underscore prefix? If they are in a facet, they would already be qualified...

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 13:13:28
*Thread Reply:* The facet "BaseFacet" that's used for customization has 2 required elements - _producer and _schemaURL - so I don't believe it's related to qualification.
👍 Ernie Ostic
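Concretely, the distinction looks like this in a trimmed run event; the URLs and names here are illustrative placeholders, with the single top-level producer for the event and a per-facet _producer inside the custom facet:

```
{
  "eventType": "START",
  "eventTime": "2023-05-01T12:00:00Z",
  "producer": "https://example.com/my-scheduler",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent",
  "job": {"namespace": "example", "name": "example_job"},
  "run": {
    "runId": "ef4f46d1-d13a-420a-87c3-19fbf6ffa231",
    "facets": {
      "myCustomFacet": {
        "_producer": "https://example.com/my-facet-emitter",
        "_schemaURL": "https://example.com/schemas/MyCustomFacet.json",
        "someValue": 42
      }
    }
  },
  "inputs": [],
  "outputs": []
}
```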
Michael Robinson (michael.robinson@astronomer.io)
2023-05-01 11:33:02
I'm opening a vote to release OpenLineage 0.24.0, including:
• a new OpenLineage extractor for dbt Cloud
• a new interface - TransportBuilder - for creating custom transport types without modifying core components of OpenLineage
• a fix to the LogicalPlanSerializer in the Spark integration to make it operational again
• a new configuration parameter in the Spark integration for making dataset paths less verbose
• a fix to the Flink integration CI
• and more.
Three +1s from committers will authorize an immediate release.
➕ Jakub Dardziński, Willy Lulciuc, Julien Le Dem
✅ Sheeri Cabral (Collibra)

Michael Robinson (michael.robinson@astronomer.io)
2023-05-02 19:43:12
*Thread Reply:* Thanks for voting. The release will commence within 2 days.

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 12:03:19
Does the Spark integration for OpenLineage also support ETL that uses the Apache Spark Structured Streaming framework?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 02:33:32
*Thread Reply:* Although it is not documented, we do have an integration test for that: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/test/resources/spark_scripts/spark_kafka.py

The test reads and writes data to Kafka and verifies that input/output datasets are collected.
✅ Sheeri Cabral (Collibra)
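For orientation, a job of the kind that test exercises looks roughly like the sketch below. The broker address and topic names are placeholders, the spark-sql-kafka package and the OpenLineage listener config from earlier in the channel are assumed to be present:

```
from pyspark.sql import SparkSession

# Placeholders: a reachable Kafka broker and pre-created topics are assumed.
BOOTSTRAP = "localhost:9092"

spark = SparkSession.builder.appName("kafka-lineage-demo").getOrCreate()

# Read one topic; with the OpenLineage listener attached, the topic
# should show up as an input dataset on the emitted run event.
df = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", BOOTSTRAP)
    .option("subscribe", "events_in")
    .load()
)

# Write the key/value pairs to another topic, reported as an output dataset.
(
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .write.format("kafka")
    .option("kafka.bootstrap.servers", BOOTSTRAP)
    .option("topic", "events_out")
    .save()
)
```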
Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 13:14:14
Also, does it work for PySpark jobs? (Forgive me if Spark job = PySpark; I don't have a lot of depth on how Spark works.)

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-01 22:37:25
*Thread Reply:* From my experience, yeah, it works for PySpark.
🙌 Paweł Leszczyński, Sheeri Cabral (Collibra)

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 13:35:41
(And in a less generic question: would it work on top of the Spline agent/lineage harvester, or is it a replacement for it?)

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-01 22:39:18
*Thread Reply:* Also from my experience, I think we can only use one of them, as we can only configure one Spark listener... correct me if I'm wrong. But it seems like the latest releases of Spline are already using OpenLineage to some capacity?
✅ Sheeri Cabral (Collibra)

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-08 09:46:15
*Thread Reply:* In spark.extraListeners you can configure multiple listeners by comma-separating them - I think you can use multiple ones with OpenLineage without obvious problems. I think we do pretty similar things to Spline, though.
👍 Anirudh Shrinivason
✅ Sheeri Cabral (Collibra)
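A sketch of that comma-separated property with two listeners (the second class name is a hypothetical stand-in for another tool's listener, not Spline's actual class):

```
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener,com.example.OtherQueryListener
```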
Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-25 11:28:41
*Thread Reply:* (I never said thank you for this, so, thank you!)

Sai (saivenkatesh161@gmail.com)
2023-05-02 04:03:40
Hi Team,

I have configured OpenLineage with Databricks, and it is sending events to Marquez as expected. I have a notebook which joins 3 tables and writes the resulting data frame to an Azure ADLS location. Each time I run the notebook manually, it creates two START events and two COMPLETE events for one run, as shown in the screenshot. Is this expected, or am I missing something?

Michael Robinson (michael.robinson@astronomer.io)
2023-05-02 10:45:37
*Thread Reply:* Hello Sai, thanks for your question! A number of folks who could help with this are OOO, but someone will reply as soon as possible.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 02:44:46
*Thread Reply:* That is interesting, @Sai. Are you able to reproduce this with a simple code snippet? Which OpenLineage version are you using?

Sai (saivenkatesh161@gmail.com)
2023-05-05 01:16:20
*Thread Reply:* Yes @Paweł Leszczyński. Each join query I run on top of Delta tables has two START and two COMPLETE events. We are using the below jar for OpenLineage:

openlineage-spark-0.22.0.jar
👀 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-05 02:41:26
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/1828
Assignees: @pawel-big-lebowski

Sai (saivenkatesh161@gmail.com)
2023-05-08 04:05:26
*Thread Reply:* Hi @Paweł Leszczyński, any updates on this issue?

Also, OL is not giving column-level lineage for GROUP BY operations on tables. Is this expected?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-08 04:07:04
*Thread Reply:* Hi @Sai, https://github.com/OpenLineage/OpenLineage/pull/1830 should fix the duplication issue.
Labels: documentation, integration/spark
Assignees: @pawel-big-lebowski

Sai (saivenkatesh161@gmail.com)
2023-05-08 04:08:06
*Thread Reply:* This would be part of the next release?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-08 04:08:30
*Thread Reply:* Regarding the column lineage & GROUP BY issue, I think it's something on the Databricks side -> we do have an open issue for that: #1821

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-08 04:09:24
*Thread Reply:* Once #1830 is reviewed and merged, it will be part of the next release.

Sai (saivenkatesh161@gmail.com)
2023-05-08 04:11:01
*Thread Reply:* Sure.. thanks @Paweł Leszczyński

Sai (saivenkatesh161@gmail.com)
2023-05-16 03:27:01
*Thread Reply:* @Paweł Leszczyński I have used the latest jar (0.25.0) and this issue still persists. I see two events for the same input/output lineage.

Thomas (xsist10@gmail.com)
2023-05-03 03:55:44
Has anyone used OpenLineage for application lineage? I'm particularly interested in if/how you handled service boundaries like APIs and Kafka topics, and what dataset naming (URI) you used.

Thomas (xsist10@gmail.com)
2023-05-03 04:06:37
*Thread Reply:* For example, MySQL is stored as producer + host + port + database + table, as something like <mysql://db.foo.com:6543/metrics.orders>
For an API (especially one following REST conventions), I was thinking something like method + host + port + path, or GET <https://api.service.com:433/v1/users>

Michael Robinson (michael.robinson@astronomer.io)
2023-05-03 10:13:25
*Thread Reply:* Hi Thomas, thanks for asking about this - it sounds cool! I don't know of others working on this kind of thing, but I've been developing a SQLAlchemy integration and have been experimenting with job naming - which I realize isn't exactly what you're working on. Hopefully others will chime in here, but in the meantime, would you be willing to create an issue about this? It seems worth discussing how we could expand the spec for this kind of use case.

Thomas (xsist10@gmail.com)
2023-05-03 10:58:32
*Thread Reply:* I suspect this will definitely be a bigger discussion. Let me ponder the problem a bit more and come back with something a bit more concrete.

Michael Robinson (michael.robinson@astronomer.io)
2023-05-03 10:59:21
*Thread Reply:* Looking forward to hearing more!

Thomas (xsist10@gmail.com)
2023-05-03 11:05:47
*Thread Reply:* On a tangential note, does OpenLineage's column-level lineage have support for the following (I see it can be extended, but want to know if someone has had to map this before):
• Properties as a path in a structure (like a JSON structure, Avro schema, protobuf, etc.), maybe using something like JSON Path or XPath notation
• Fragments (when a column is a JSON blob, there is an entire sub-structure that needs to be described)
• Transformation description (how an input affects an output: is it a direct copy of the value, or is it part of a formula?)

Michael Robinson (michael.robinson@astronomer.io)
2023-05-03 11:22:21
*Thread Reply:* I don't know, but I'll ping some folks who might.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 03:24:01
*Thread Reply:* Hi @Thomas. Column-lineage support currently does not include JSON fields. We have included in the specification fields like transformationDescription and transformationType to store a string representation of the transformation applied and its type, like IDENTITY|MASKED. However, those fields aren't filled within the Spark integration at the moment.
🙌 Thomas, Michael Robinson
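For reference, those fields live inside the column lineage dataset facet; a trimmed sketch of its shape, with dataset/field names reusing Thomas's MySQL example and the URLs and transformation values being illustrative placeholders:

```
{
  "columnLineage": {
    "_producer": "https://example.com/my-producer",
    "_schemaURL": "https://openlineage.io/spec/facets/ColumnLineageDatasetFacet.json",
    "fields": {
      "order_total": {
        "inputFields": [
          {"namespace": "<mysql://db.foo.com:6543>", "name": "metrics.orders", "field": "amount"}
        ],
        "transformationDescription": "sum of order line amounts",
        "transformationType": "IDENTITY"
      }
    }
  }
}
```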
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-03 09:54:57
-
-

@channel
We released OpenLineage 0.24.0, including:
Additions:
• Support custom transport types #1795 @nataliezeller1
• Airflow: dbt Cloud integration #1418 @howardyoo
• Spark: support dataset name modification using regex #1796 @pawel-big-lebowski
Plus bug fixes and more.
Thanks to all the contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.24.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.23.0...0.24.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🎉 Harel Shein, tati
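On the custom transport types item (#1795), a rough sketch of what a custom transport for the Python client can look like; treat the class and method names as assumptions that may differ by release, and the webhook target as invented:

```python
# Sketch of a custom transport for the Python client (WebhookTransport and
# its URL are invented; base-class details may vary by version).
import requests

from openlineage.client.serde import Serde
from openlineage.client.transport import Transport


class WebhookTransport(Transport):
    """Hypothetical transport that POSTs every event to a webhook."""

    kind = "webhook"  # identifier used when selecting the transport

    def __init__(self, url: str) -> None:
        self.url = url

    def emit(self, event) -> None:
        # Serialize the run event to JSON and ship it.
        requests.post(
            self.url,
            data=Serde.to_json(event),
            headers={"Content-Type": "application/json"},
            timeout=5,
        )
```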

GreetBot
2023-05-03 10:45:32
@GreetBot has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-04 11:25:23
-
-

@channel
This month’s TSC meeting is next Thursday, May 11th, at 10:00 am PT. The tentative agenda will be on the wiki. More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-05-05 12:11:37
-
-

Hello all, noticed that OpenLineage is not able to give column-level lineage if there is a groupBy operation on a Spark dataframe. Has anyone else faced this issue and found any fixes or workarounds? Apache Spark 3.0.1 and OpenLineage version 1 are being used. Also tried on Spark version 3.3.0.

Log4j error details follow:

23/05/05 18:09:11 ERROR ColumnLevelLineageUtils: Error when invoking static method 'buildColumnLineageDatasetFacet' for Spark3
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at io.openlineage.spark.agent.lifecycle.plan.column.ColumnLevelLineageUtils.buildColumnLineageDatasetFacet(ColumnLevelLineageUtils.java:35)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildOutputDatasets$21(OpenLineageRunEventBuilder.java:424)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildOutputDatasets(OpenLineageRunEventBuilder.java:437)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:296)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:70)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)
    at java.util.Optional.ifPresent(Optional.java:159)
    at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:91)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:82)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:102)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:39)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:39)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:118)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:102)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:107)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:107)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:98)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1639)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:98)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.resultId()Lorg/apache/spark/sql/catalyst/expressions/ExprId;
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.traverseExpression(ExpressionDependencyCollector.java:79)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.lambda$traverseExpression$4(ExpressionDependencyCollector.java:74)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at scala.collection.convert.Wrappers$IteratorWrapper.forEachRemaining(Wrappers.scala:31)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.traverseExpression(ExpressionDependencyCollector.java:74)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.lambda$null$2(ExpressionDependencyCollector.java:60)
    at java.util.LinkedList$LLSpliterator.forEachRemaining(LinkedList.java:1235)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.lambda$collect$3(ExpressionDependencyCollector.java:60)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:285)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:286)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:286)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.collect(ExpressionDependencyCollector.java:38)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ColumnLevelLineageUtils.collectInputsAndExpressionDependencies(ColumnLevelLineageUtils.java:70)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ColumnLevelLineageUtils.buildColumnLineageDatasetFacet(ColumnLevelLineageUtils.java:40)
    ... 36 more
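A minimal sketch of the pattern being described, with invented table and column names, for anyone trying to reproduce it with the OpenLineage Spark listener attached:

```python
# Minimal repro sketch (table and column names invented); run with the
# OpenLineage Spark listener configured.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("groupby-lineage-repro").getOrCreate()

orders = spark.read.table("sales.orders")

# Plain projection: column-level lineage is emitted as expected.
orders.select("product_id", "qty") \
      .write.mode("overwrite").saveAsTable("sales.orders_copy")

# Aggregation: the case reported above to lose column-level lineage.
orders.groupBy("product_id") \
      .agg(F.sum("qty").alias("total_qty")) \
      .write.mode("overwrite").saveAsTable("sales.order_totals")
```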

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-08 07:38:19
-
-

*Thread Reply:* Hi @Harshini Devathi, I think this is the same as issue: https://github.com/OpenLineage/OpenLineage/issues/1821

-
- - - - - - - -
-
Assignees
- <a href="https://github.com/pawel-big-lebowski">@pawel-big-lebowski</a> -
- -
-
Labels
- integration/spark, integration/databricks -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-05-08 19:44:26
-
-

*Thread Reply:* Thank you @Paweł Leszczyński. So, is this an issue with Databricks? The issue thread says that it was able to work on AWS Glue. If so, is there some kind of solution to make it work on Databricks?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-05-05 12:22:06
-
-

Hello all, is there a way to get lineage in Azure Synapse Analytics with OpenLineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-05-09 20:17:38
-
-

*Thread Reply:* maybe @Will Johnson knows?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-05-08 07:06:37
-
-

Hi Team,

I have a use case where we are connecting to an Azure SQL database from Databricks to extract, transform and load data to Delta tables. I can see the lineage getting built, but there is no column-level lineage, though it's a 1:1 mapping from the source. Could you please check and update on this?

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-05-09 10:06:02
-
-

*Thread Reply:* There are a few possible issues:
  1. The column-level lineage is not implemented for a particular part of the Spark LogicalPlan.
  2. Azure SQL or Databricks have their own implementations of some Spark class, which does not exactly match our extractor. We've seen that happen.
  3. You're using a SQL JDBC connection with SELECT * - in which case we can't do anything for now, since we don't know the input columns (see the sketch after this list).
  4. Possibly something else 🙂 @Paweł Leszczyński might have an idea.
To fully understand the issue, we'd have to see logs, the LogicalPlan of the Spark job, or the job code itself.
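A sketch of the workaround for item 3, with invented connection details: give the JDBC source an explicit column list instead of SELECT *, so the integration can see the input columns:

```python
# Sketch for item 3 (connection details invented): an explicit column list
# instead of SELECT * lets the integration know the input columns.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-explicit-columns").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=sales")
      .option("query", "SELECT order_id, customer_id, amount FROM dbo.orders")
      .option("user", "reader")
      .option("password", "...")
      .load())
```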

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-10 02:35:32
*Thread Reply:* @Sai, providing a short code snippet that is able to reproduce this would be super helpful in examining that.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-05-10 02:59:24
-
-

*Thread Reply:* Sure Pawel.
Will share the code I used in some time.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-05-10 03:37:54
-
-

*Thread Reply:* Here is the code we use.

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-05-16 03:23:13
-
-

*Thread Reply:* Hi Team, Any updates on this?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-05-16 03:23:37
-
-

*Thread Reply:* I tried putting a SQL query with the column names in it; still, the lineage didn't show up.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-09 10:00:39
-
-

2023-05-09T13:37:48.526698281Z java.lang.ClassCastException: class org.apache.spark.scheduler.ShuffleMapStage cannot be cast to class java.lang.Boolean (org.apache.spark.scheduler.ShuffleMapStage is in unnamed module of loader 'app'; java.lang.Boolean is in module java.base of loader 'bootstrap')
    at scala.runtime.BoxesRunTime.unboxToBoolean(BoxesRunTime.java:87)
    at scala.collection.LinearSeqOptimized.forall(LinearSeqOptimized.scala:85)
    at scala.collection.LinearSeqOptimized.forall$(LinearSeqOptimized.scala:82)
    at scala.collection.immutable.List.forall(List.scala:91)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.registerJob(OpenLineageRunEventBuilder.java:181)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.setActiveJob(SparkSQLExecutionContext.java:152)
    at java.base/java.util.Optional.ifPresent(Unknown Source)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$10(OpenLineageSparkListener.java:150)
    at java.base/java.util.Optional.ifPresent(Unknown Source)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:147)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Hi, noticing this error message from OL... anyone know why it's happening?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-05-09 10:02:25
-
-

*Thread Reply:* @Anirudh Shrinivason what's your OL and Spark version?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-05-09 10:03:29
-
-

*Thread Reply:* Some example job would also help, or logs/LogicalPlan 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-09 10:05:54
-
-

*Thread Reply:* OL version is 0.23.0 and spark version is 3.3.1

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-09 11:00:22
-
-

*Thread Reply:* Hmm, it seems like the error is intermittent, actually. I ran the same job again, but did not notice any errors this time...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-10 02:27:19
-
-

*Thread Reply:* This is interesting and it happens within a line:
job.finalStage().parents().forall(toScalaFn(stage -> stageMap.put(stage.id(), stage)));
The result of stageMap.put is Stage, and for some reason which I don't understand it tries doing unboxToBoolean. We could rewrite that to:
job.finalStage().parents().forall(toScalaFn(stage -> {
    stageMap.put(stage.id(), stage);
    return true;
}));
but it is so weird that it is intermittent, and I don't get why it is happening.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-11 02:22:25
-
-

*Thread Reply:* @Anirudh Shrinivason, please let us know if it is still a valid issue. If so, we can create an issue for that.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-11 03:11:13
-
-

*Thread Reply:* Hi @Paweł Leszczyński, sorry for the late reply. Yeah, I think if we are able to fix this, it'll be better. If this is the dedicated fix, then I can create an issue and raise an MR.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-11 04:12:46
-
-

*Thread Reply:* Opened an issue and PR. Please help check if it's okay, thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-11 04:29:33
-
-

*Thread Reply:* please run ./gradlew spotlessApply with Java 8

- - - -
- ✅ Anirudh Shrinivason -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Pietro Brunetti - (pietrobrunetti89@gmail.com) -
-
2023-05-10 05:49:00
-
-

Hi all,
I’m new to OpenLineage (and Marquez), so I’m trying to figure out if it could be the right option for a client use case in which:
• there is a legacy custom data catalog (Mongo backend + Java API backend for a frontend in Angular)
• AS-IS, component lineage relations are retrieved in a custom way from each component’s APIs
• the customer would like to bring in a basic data lineage feature based on already-published metadata that represents custom workload types (batch, streaming, interactive ones) + data access patterns (no direct relation with the datasources right now, but only an abstraction layer upon them)
I’d like to exploit Marquez directly as the metastore to publish metadata about the datasource and workload (the workload is the declaration + business logic code deployed into the customer platform) once the component is deployed (e.g. the service that exposes the specific access pattern, or the workload custom declaration), but I saw the OpenLineage spec is based on strict coupling between run, job and datasource; I mean I want to be able to publish one item at a time and then (maybe in a future release of the customer product) be able to exploit runtime lineage also.

Am I in the right place?
Thanks anyway :)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-05-10 07:36:33
-
-

*Thread Reply:* > I mean I want to be able to publish one item at a time and then (maybe in a future release of the customer product) be able to exploit runtime lineage also -This is not something that we support yet - there are definitely a lot of plans and preliminary work for that.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Pietro Brunetti - (pietrobrunetti89@gmail.com) -
-
2023-05-10 07:57:44
-
-

*Thread Reply:* Thanks for the response. Btw, I already took a look at the current capabilities provided by OpenLineage, so my “hidden” question is how to achieve what the customer wants in order to be integrated in some way with OpenLineage + Marquez.
Should I choose between make or buy (between already supported platforms) and then try to align “static” (aka declarative) lineage metadata with the OpenLineage conceptual model?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-10 11:04:20
-
-

@channel
This month’s TSC meeting is tomorrow at 10am PT. All are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1683213923529529

-
- - -
- - - } - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Lukenoff - (john@jlukenoff.com) -
-
2023-05-11 12:59:42
-
-

Does anyone here have experience with vendors in this space like Atlan or Manta? I’m advocating pretty heavily for OpenLineage at my company and have a strong suspicion that the LoE of enabling an equivalent solution from a vendor is equal or greater than that of OL/Marquez. Curious if anyone has first-hand experience with these tools they might be willing to share?

- - - -
- 👋 Eric Veleker -
- -
- 👀 Pietro Brunetti -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ernie Ostic - (ernie.ostic@getmanta.com) -
-
2023-05-11 13:58:28
-
-

*Thread Reply:* Hi John. Great question! [full disclosure, I am with Manta 🙂 ]. I'll let others answer as to their experience with ourselves or many other vendors that provide lineage, but want to mention that a variety of our customers are finding it beneficial to bring code based static lineage together with the event-based runtime lineage that OpenLineage provides. This gives them the best of both worlds, for analyzing the lineage of their existing systems, where rich parsers already exist (for everything from legacy ETL tools, reporting tools, rdbms, etc.), to newer or home-grown technologies where applying OpenLineage is a viable alternative.

- - - -
- 👍 John Lukenoff -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Brad Paskewitz - (bradford.paskewitz@fivetran.com) -
-
2023-05-11 14:12:04
-
-

*Thread Reply:* @Ernie Ostic do you see a single front-runner in the static lineage space? The static/event-based situation you describe is exactly the product roadmap I'm seeing here at Fivetran and I'm wondering if there's an opportunity to drive consensus towards a best-practice solution. If I'm not mistaken weren't there plans to start supporting non-run-based events in OL as well?

- - - -
- 👋 Eric Veleker -
- -
-
-
-
- - - - - -
-
- - - - -
- -
John Lukenoff - (john@jlukenoff.com) -
-
2023-05-11 14:16:34
-
-

*Thread Reply:* I definitely like the idea of a 3rd party solution being complementary to OSS tools we can maintain ourselves while allowing us to offload maintenance effort where possible. Currently I have strong opinions on both sides of the build vs. buy aisle and this seems like the best of both worlds.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-05-11 14:52:40
-
-

*Thread Reply:* @Brad Paskewitz that’s 100% our plan to extend the OL spec to support “run-less” events. We want to collect that static metadata for Datasets and Jobs outside of the context of a run through OpenLineage.
Happy to get your feedback here as well: https://github.com/OpenLineage/OpenLineage/pull/1839

- - - -
- :gratitude_thank_you: Brad Paskewitz -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Eric Veleker - (eric@atlan.com) -
-
2023-05-11 14:57:46
-
-

*Thread Reply:* Hi @John Lukenoff. Here at Atlan we've been working with the OpenLineage community for quite some time to unlock the use case you describe. These efforts are adjacent to our ongoing integration with Fivetran. Happy to connect and give you a demo of what we've built and dig into your use case specifics.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Lukenoff - (john@jlukenoff.com) -
-
2023-05-12 11:26:32
-
-

*Thread Reply:* Thanks all! These comments are really informative, it’s exciting to hear about vendors leaning into the project to let us continue to benefit from the tremendous progress being made by the community. Had a great discussion with Atlan yesterday and plan to connect with Manta next week to discuss our use-cases.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ernie Ostic - (ernie.ostic@getmanta.com) -
-
2023-05-12 12:34:32
-
-

*Thread Reply:* Reach out anytime, @John Lukenoff. Looking forward to engaging further with you on these topics!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-05-12 11:15:10
-
-

Hello all, I would like to have a new release of OpenLineage, as the new codebase seems to have fixes for some issues. I need these fixes for my project.

- - - -
- ➕ Michael Robinson, Maciej Obuchowski, Julien Le Dem, Jakub Dardziński, Anirudh Shrinivason, Harshini Devathi, Paweł Leszczyński, pankaj koti -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-12 11:19:02
-
-

*Thread Reply:* Thank you for requesting an OpenLineage release. As stated here, three +1s from committers will authorize an immediate release. Our policy is not to release on Fridays, so the earliest we could initiate would be Monday.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-05-12 13:12:43
-
-

*Thread Reply:* A release on Monday is totally fine @Michael Robinson.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-15 08:37:39
-
-

*Thread Reply:* The release will be initiated today. Thanks @Harshini Devathi

- - - -
- 👍 Anirudh Shrinivason, Harshini Devathi -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-05-16 20:16:07
-
-

*Thread Reply:* Appreciate it @Michael Robinson and thanks to all the committers for the prompt response

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-15 12:09:24
-
-

@channel
We released OpenLineage 0.25.0, including:
Additions:
• Spark: merge into query support #1823 @pawel-big-lebowski
Fixes:
• Spark: fix JDBC query handling #1808 @nataliezeller1
• Spark: filter Delta adaptive plan events #1830 @pawel-big-lebowski
• Spark: fix Java class cast exception #1844 @Anirudh181001
• Flink: include missing fields of Openlineage events #1840 @pawel-big-lebowski
Plus doc changes and more.
Thanks to all the contributors!
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.25.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.24.0...0.25.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Jakub Dardziński, Sai, pankaj koti, Paweł Leszczyński, Perttu Salonen, Maciej Obuchowski, Fraser Marlow, Ross Turk, Harshini Devathi, tati
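For context on the merge into query support item, a sketch of the kind of statement it covers; the table names are invented, and it assumes an existing Delta-enabled SparkSession named spark:

```python
# Sketch of a Delta MERGE INTO of the kind covered by 0.25.0's "merge into
# query support" (table names invented; assumes a Delta-enabled session).
spark.sql("""
    MERGE INTO sales.orders AS target
    USING sales.order_updates AS source
    ON target.order_id = source.order_id
    WHEN MATCHED THEN UPDATE SET target.amount = source.amount
    WHEN NOT MATCHED THEN INSERT *
""")
```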

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-16 14:03:01
@channel
If you’re planning on being in San Francisco at the end of June — perhaps for this year’s Data+AI Summit — please stop by Astronomer’s offices on California Street on 6/27 for the first SF OpenLineage Meetup. We’ll be discussing spec changes planned for OpenLineage v1.0.0, progress on Airflow AIP 53, and more. Plus, dinner will be provided! For more info and to sign up, check out the OL blog. Join us!

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- 🙌 alexandre bergere, Anirudh Shrinivason, Harel Shein, Willy Lulciuc, Jarek Potiuk, Ross Turk, John Lukenoff, tati -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-05-16 14:13:16
-
-

*Thread Reply:* Can’t wait! 💯

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-17 00:09:23
-
-

Hi, I've been noticing this error intermittently popping up in some of the Spark jobs:
AsyncEventQueue: Dropping event from queue appStatus. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
Increasing the spark.scheduler.listenerbus.eventqueue.size Spark config did not help either.
Any ideas on how to mitigate this issue? Seeing this in Spark 3.1.2, btw.
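For what it's worth, on Spark 3.x the current name of that setting is spark.scheduler.listenerBus.eventqueue.capacity (the *.size spelling is the old pre-2.3 name); a sketch of raising it, with the caveat that more capacity only buys headroom for transient slowness:

```python
# Sketch: raise the listener bus queue capacity (current property name on
# Spark 3.x; the *.size spelling is the old pre-2.3 name). This only buys
# headroom - a listener persistently slower than the event rate will still
# drop events eventually.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("listener-queue-capacity")
         .config("spark.scheduler.listenerBus.eventqueue.capacity", "30000")
         .getOrCreate())
```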

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-17 01:58:28
-
-

*Thread Reply:* Hi @Anirudh Shrinivason, are you able to send the OL events to the console? This would let us confirm whether the issue is related to event generation, or to emitting it and waiting for the backend to respond.
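A sketch of what that looks like, using the transport settings documented for recent versions of the Spark integration (property names may differ on older releases):

```python
# Sketch (property names as documented for recent Spark integration
# versions; older releases may differ): emit OpenLineage events to the
# driver logs instead of an HTTP backend, isolating event generation
# from delivery problems.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ol-console-debug")
         .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.25.0")
         .config("spark.extraListeners",
                 "io.openlineage.spark.agent.OpenLineageSparkListener")
         .config("spark.openlineage.transport.type", "console")
         .getOrCreate())
```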

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-17 01:59:03
-
-

*Thread Reply:* Ahh okay sure. Let me see if I can do that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-05-17 01:52:15
-
-

Hi Team,

We are seeing an issue with an OL-configured cluster where a Delta table merge is failing with the error below. It runs fine on other clusters where OL is not configured. I ran it multiple times assuming it was an intermittent memory issue, but it keeps failing with the same error. Attached the code for reference. We are using the latest release (0.25.0).

org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError

@Paweł Leszczyński @Michael Robinson

👀 Paweł Leszczyński

Sai - (saivenkatesh161@gmail.com)
2023-05-19 03:55:51
*Thread Reply:* Hi @Paweł Leszczyński

Thanks for fixing the issue; with the new release, merge is working. But I could not see any input and output datasets for it. Let me know if you need any further details to look into this.

},
"job": {
    "namespace": "openlineage_poc",
    "name": "spark_ol_integration_execute_merge_into_command_edge",
    "facets": {}
},
"inputs": [],
"outputs": [],

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-19 04:00:01
*Thread Reply:* Oh man, it's just that vanilla Spark differs from the one available on the Databricks platform. Our integration tests verify behaviour on vanilla Spark, which still leaves a possibility for inconsistency. Will need to get back to it at some time.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-06-02 02:11:28
-
-

*Thread Reply:* Hi @Paweł Leszczyński

Did you get a chance to look into this issue?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-02 02:13:18
-
-

*Thread Reply:* Hi Sai, I am going back to Spark. I am working on support for Spark 3.4, which is going to add some event filtering on internal Delta operations that unnecessarily trigger events.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-02 02:13:28
-
-

*Thread Reply:* this may be related to the issue you created

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-02 02:14:13
-
-

*Thread Reply:* I have planned creating an integration test for Databricks, which will be helpful for tackling the issues you raised.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-02 02:14:27
-
-

*Thread Reply:* so yes, I am looking at the Spark side

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-06-02 02:20:06
-
-

*Thread Reply:* thanks much Pawel.. I am looking more into the merge part as first priority, as we use it frequently.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-02 02:21:01
-
-

*Thread Reply:* I know, this is important.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-02 02:21:14
-
-

*Thread Reply:* It just still needs some time.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-02 02:21:46
-
-

*Thread Reply:* thank you for your patience and being so proactive on those issues.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sai - (saivenkatesh161@gmail.com) -
-
2023-06-02 02:22:12
-
-

*Thread Reply:* no problem.. Please do keep us posted with updates..

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-17 10:47:27
-
-

Our recent OpenLineage release (0.25.0) proved there are many users that use OpenLineage on Databricks, which is incredible. I am super happy to know that, although we realised it as a side effect of a bug. Sorry for that.

I would like to opt for a new release which contains PR #1858 and should unblock Databricks users.

- - - -
- ➕ Paweł Leszczyński, Maciej Obuchowski, Harshini Devathi, Jakub Dardziński, Sai, Anirudh Shrinivason, Anbarasi -
- -
- 👍 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-18 10:26:48
-
-

*Thread Reply:* The release request has been approved and will be initiated shortly.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-17 22:49:41
-
-

Actually, I noticed a few other stack overflow errors on 0.25.0. Let me raise an issue. Could we cut a release once this bug are fixed too please?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-18 02:29:55
-
-

*Thread Reply:* Hi Anirudh, I saw your issue and I think it is the same one as solved within #1858. Are you able to reproduce it on a version built on the top of main?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-05-18 06:21:05
-
-

*Thread Reply:* Hi I haven't managed to try with the main branch. But if its the same error then all's good! If the error resurfaces then we can look into it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lovenish Goyal - (lovenishgoyal@gmail.com) -
-
2023-05-18 02:21:13
-
-

Hi All,

We are in the POC phase of OpenLineage integration with our core dbt; can anyone help me with a document to start with?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-18 02:28:31
-
-

*Thread Reply:* I know this one: https://openlineage.io/docs/integrations/dbt

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lovenish Goyal - (lovenishgoyal@gmail.com) -
-
2023-05-18 02:41:39
-
-

*Thread Reply:* Hi @Paweł Leszczyński, thanks for the reply. I tried the same but am facing the issue below:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url:

Looks like I need to start the service.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-05-18 02:44:09
-
-

*Thread Reply:* @Lovenish Goyal, exactly. You need to start Marquez.
More about it: https://marquezproject.ai/quickstart
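Once Marquez is up, a quick sanity check that it is reachable from where dbt runs; this assumes the quickstart's default API port 5000, so adjust the host/port if mapped differently:

```python
# Quick connectivity check against a local Marquez (default API port 5000
# per the quickstart; adjust host/port if mapped differently).
import requests

resp = requests.get("http://localhost:5000/api/v1/namespaces", timeout=5)
resp.raise_for_status()
print(resp.json())  # expect at least the "default" namespace
```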

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-05-18 10:27:52
-
-

*Thread Reply:* @Lovenish Goyal how are you running dbt core currently?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lovenish Goyal - (lovenishgoyal@gmail.com) -
-
2023-05-19 01:55:20
-
-

*Thread Reply:* Trying, but facing an issue while running Marquez @Jakub Dardziński

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lovenish Goyal - (lovenishgoyal@gmail.com) -
-
2023-05-19 01:56:03
-
-

*Thread Reply:* @Harel Shein we have created a custom Docker image of dbt + Airflow and are running it on an EC2 instance.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-05-19 09:05:31
-
-

*Thread Reply:* for running dbt core on Airflow, we have a utility that helps develop dbt natively on Airflow. There’s also built-in support for collecting lineage if you have the airflow-openlineage provider installed.
https://astronomer.github.io/astronomer-cosmos/#quickstart

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-05-19 09:06:30
-
-

*Thread Reply:* RE issues running Marquez, can you share what those are? I’m guessing that since you are running both of them in individual docker images, the airflow deployment might not be able to communicate with the Marquez endpoints?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-05-19 09:06:53
-
-

*Thread Reply:* @Harel Shein I've already helped with running Marquez 🙂

- - - -
- :first_place_medal: Harel Shein, Paweł Leszczyński, Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anbarasi - (anbujothi@gmail.com) -
-
2023-05-18 02:29:53
-
-

@Paweł Leszczyński We are facing the following issue with Azure Databricks: when we use aggregate functions in Databricks notebooks, OpenLineage is not able to provide column-level lineage. I understand it's an existing issue. Can you please let me know in which release this issue will be fixed? It is one of the most needed features for us to implement OpenLineage in our current project. Kindly let me know.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-18 02:34:35
-
-

*Thread Reply:* I am not sure if this is the same. If you see OL events collected with column-lineage missing, then it's a different one.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-18 02:41:11
-
-

*Thread Reply:* Please also be aware that it is extremely helpful to investigate the issues on your own before creating them.

Our integration traverses Spark's logical plans and extracts lineage events from plan nodes that it understands. Some plan nodes are not supported yet and, from my experience, when working on an issue, 80% of the time is spent on reproducing the scenario.

So, if you are able to produce a minimal amount of Spark code that reproduces an issue, this can be extremely helpful and significantly speed up resolution time.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anbarasi - (anbujothi@gmail.com) -
-
2023-05-18 03:52:30
-
-

*Thread Reply:* @Paweł Leszczyński Thanks for the prompt response.

Provided sample code with and without using aggregate functions, and the respective lineage events, for reference.

1. Please find the code without using the aggregate function:

final_df = spark.sql("""
    select productid
        ,OrderQty as TotalOrderQty
        ,ReceivedQty as TotalReceivedQty
        ,StockedQty as TotalStockedQty
        ,RejectedQty as TotalRejectedQty
    from openlineage_poc.purchaseorder
    --group by productid
    order by productid""")

final_df.write.mode("overwrite").saveAsTable("openlineage_poc.productordertest1")

Please find the OpenLineage events for the input and output datasets. We could find the column lineage in this.

- -

"inputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "PurchaseOrderID", - "type": "integer" - }, - { - "name": "PurchaseOrderDetailID", - "type": "integer" - }, - { - "name": "DueDate", - "type": "timestamp" - }, - { - "name": "OrderQty", - "type": "short" - }, - { - "name": "ProductID", - "type": "integer" - }, - { - "name": "UnitPrice", - "type": "decimal(19,4)" - }, - { - "name": "LineTotal", - "type": "decimal(19,4)" - }, - { - "name": "ReceivedQty", - "type": "decimal(8,2)" - }, - { - "name": "RejectedQty", - "type": "decimal(8,2)" - }, - { - "name": "StockedQty", - "type": "decimal(9,2)" - }, - { - "name": "RevisionNumber", - "type": "integer" - }, - { - "name": "Status", - "type": "integer" - }, - { - "name": "EmployeeID", - "type": "integer" - }, - { - "name": "NationalIDNumber", - "type": "string" - }, - { - "name": "JobTitle", - "type": "string" - }, - { - "name": "Gender", - "type": "string" - }, - { - "name": "MaritalStatus", - "type": "string" - }, - { - "name": "VendorID", - "type": "integer" - }, - { - "name": "ShipMethodID", - "type": "integer" - }, - { - "name": "ShipMethodName", - "type": "string" - }, - { - "name": "ShipMethodrowguid", - "type": "string" - }, - { - "name": "OrderDate", - "type": "timestamp" - }, - { - "name": "ShipDate", - "type": "timestamp" - }, - { - "name": "SubTotal", - "type": "decimal(19,4)" - }, - { - "name": "TaxAmt", - "type": "decimal(19,4)" - }, - { - "name": "Freight", - "type": "decimal(19,4)" - }, - { - "name": "TotalDue", - "type": "decimal(19,4)" - } - ] - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc/gold", - "name": "openlineagepoc.purchaseorder", - "type": "TABLE" - } - ] - } - }, - "inputFacets": {} - } - ], - "outputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/productordertest1", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "productid", - "type": "integer" - }, - { - "name": "TotalOrderQty", - "type": "short" - }, - { - "name": "TotalReceivedQty", - "type": "decimal(8,2)" - }, - { - "name": "TotalStockedQty", - "type": "decimal(9,2)" - }, - { - "name": "TotalRejectedQty", - "type": "decimal(8,2)" - } - ] - }, - "storage": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - 
"schemaURL": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet", - "storageLayer": "unity", - "fileFormat": "parquet" - }, - "columnLineage": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet", - "fields": { - "productid": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "ProductID" - } - ] - }, - "TotalOrderQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "OrderQty" - } - ] - }, - "TotalReceivedQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "ReceivedQty" - } - ] - }, - "TotalStockedQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "StockedQty" - } - ] - }, - "TotalRejectedQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "RejectedQty" - } - ] - } - } - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc", - "name": "openlineagepoc.productordertest1", - "type": "TABLE" - } - ] - }, - "lifecycleStateChange": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", - "lifecycleStateChange": "OVERWRITE" - } - }, - "outputFacets": {} - } - ]

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anbarasi - (anbujothi@gmail.com) -
-
2023-05-18 03:55:04
-
-

*Thread Reply:* 2. Please find the code using aggregate function:

- -
    final_df=spark.sql("""
-    select productid
-    ,sum(OrderQty) as TotalOrderQty
-    ,sum(ReceivedQty) as TotalReceivedQty
-    ,sum(StockedQty) as TotalStockedQty
-    ,sum(RejectedQty) as TotalRejectedQty
-    from openlineage_poc.purchaseorder
-    group by productid
-    order by productid""")
-
-    final_df.write.mode("overwrite").saveAsTable("openlineage_poc.productordertest2")
-

Please find the OpenLineage events for the input and output datasets. We couldn't find the column lineage in the output section. Please find the sample:
"inputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "PurchaseOrderID", - "type": "integer" - }, - { - "name": "PurchaseOrderDetailID", - "type": "integer" - }, - { - "name": "DueDate", - "type": "timestamp" - }, - { - "name": "OrderQty", - "type": "short" - }, - { - "name": "ProductID", - "type": "integer" - }, - { - "name": "UnitPrice", - "type": "decimal(19,4)" - }, - { - "name": "LineTotal", - "type": "decimal(19,4)" - }, - { - "name": "ReceivedQty", - "type": "decimal(8,2)" - }, - { - "name": "RejectedQty", - "type": "decimal(8,2)" - }, - { - "name": "StockedQty", - "type": "decimal(9,2)" - }, - { - "name": "RevisionNumber", - "type": "integer" - }, - { - "name": "Status", - "type": "integer" - }, - { - "name": "EmployeeID", - "type": "integer" - }, - { - "name": "NationalIDNumber", - "type": "string" - }, - { - "name": "JobTitle", - "type": "string" - }, - { - "name": "Gender", - "type": "string" - }, - { - "name": "MaritalStatus", - "type": "string" - }, - { - "name": "VendorID", - "type": "integer" - }, - { - "name": "ShipMethodID", - "type": "integer" - }, - { - "name": "ShipMethodName", - "type": "string" - }, - { - "name": "ShipMethodrowguid", - "type": "string" - }, - { - "name": "OrderDate", - "type": "timestamp" - }, - { - "name": "ShipDate", - "type": "timestamp" - }, - { - "name": "SubTotal", - "type": "decimal(19,4)" - }, - { - "name": "TaxAmt", - "type": "decimal(19,4)" - }, - { - "name": "Freight", - "type": "decimal(19,4)" - }, - { - "name": "TotalDue", - "type": "decimal(19,4)" - } - ] - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc/gold", - "name": "openlineagepoc.purchaseorder", - "type": "TABLE" - } - ] - } - }, - "inputFacets": {} - } - ], - "outputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/productordertest2", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "productid", - "type": "integer" - }, - { - "name": "TotalOrderQty", - "type": "long" - }, - { - "name": "TotalReceivedQty", - "type": "decimal(18,2)" - }, - { - "name": "TotalStockedQty", - "type": "decimal(19,2)" - }, - { - "name": "TotalRejectedQty", - "type": "decimal(18,2)" - } - ] - }, - "storage": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - 
"schemaURL": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet", - "storageLayer": "unity", - "fileFormat": "parquet" - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc", - "name": "openlineagepoc.productordertest2", - "type": "TABLE" - } - ] - }, - "lifecycleStateChange": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", - "lifecycleStateChange": "OVERWRITE" - } - }, - "outputFacets": {} - } - ]

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-18 04:09:17
-
-

*Thread Reply:* amazing. https://github.com/OpenLineage/OpenLineage/issues/1861

-
- - - - - - - -
-
Assignees
- <a href="https://github.com/pawel-big-lebowski">@pawel-big-lebowski</a> -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anbarasi - (anbujothi@gmail.com) -
-
2023-05-18 04:11:56
-
-

*Thread Reply:* Thanks for considering the request and looking into it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-18 13:12:35
-
-

@channel
We released OpenLineage 0.26.0, including:
Additions:
• Proxy: Fluentd proxy support (experimental) #1757 @pawel-big-lebowski
Changes:
• Python client: use Hatchling over setuptools to orchestrate Python env setup #1856 @gaborbernat
Fixes:
• Spark: fix logicalPlan serialization issue on Databricks #1858 @pawel-big-lebowski
Plus an additional fix, doc changes and more.
Thanks to all the contributors, including new contributor @gaborbernat!
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.26.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.25.0...0.26.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

❤️ Paweł Leszczyński, Maciej Obuchowski, Anirudh Shrinivason, Peter Hicks, pankaj koti

Bramha Aelem - (bramhaaelem@gmail.com)
2023-05-18 14:42:49
Hi Team, can someone please address https://github.com/OpenLineage/OpenLineage/issues/1866?

-
- - - - - - - -
-
Labels
- proposal -
- -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-05-18 20:13:09
-
-

*Thread Reply:* Hi @Bramha Aelem I replied in the ticket. Thank you for opening it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-05-18 21:15:30
-
-

*Thread Reply:* Hi @Julien Le Dem - Thanks for quick response. I replied in the ticket. Please let me know if you need any more details.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-19 02:13:57
-
-

*Thread Reply:* Hi @Bramha Aelem - asked for more details in the ticket.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-05-22 11:08:58
-
-

*Thread Reply:* Hi @Paweł Leszczyński - I replied with necessary details in the ticket. Please let me know if you need any more details.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-05-25 15:22:42
-
-

*Thread Reply:* Hi @Paweł Leszczyński - any further updates on issue?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-26 01:56:47
-
-

*Thread Reply:* hi @Bramha Aelem, I was out of office for a few days. Will get back into this soon. Thanks for the update.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-05-27 18:46:27
-
-

*Thread Reply:* Hi @Paweł Leszczyński
Thanks for your reply. Will wait for your response to proceed further on the issue.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-06-02 19:29:08
-
-

*Thread Reply:* Hi @Paweł Leszczyński
Hope you are doing well. Did you get a chance to look into the samples provided in the ticket? Kindly let me know your observations/recommendations.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-06-09 12:43:54
-
-

*Thread Reply:* Hi @Paweł Leszczyński - Hope you are doing well. Did you get a chance to look into the samples which are provided in the ticket? Kindly let me know your observations/recommendations.

- - - -
- 👀 Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-07-06 10:29:01
-
-

*Thread Reply:* Hi @Paweł Leszczyński - Good day. Did you get a chance to look into the query which I posted? Can you please provide any thoughts on my observation/query?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Doe - (adarsh.pansari@tigeranalytics.com) -
-
2023-05-19 03:42:21
-
-

Hello Everyone, I was trying to integrate OpenLineage with Jupyter Notebooks. I followed the docs, but when I run the sample notebook I get an error:
23/05/19 07:39:08 ERROR EventEmitter: Could not emit lineage w/ exception
Can someone please help me understand why I am getting this error, and the resolution?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-19 03:49:27
-
-

*Thread Reply:* Hello @John Doe, this mostly means there's something wrong with your transport config for emitting OpenLineage events.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-19 03:49:41
-
-

*Thread Reply:* what do you want to do with the events?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Doe - (adarsh.pansari@tigeranalytics.com) -
-
2023-05-19 04:10:24
-
-

*Thread Reply:* Hi @Paweł Leszczyński, I am working on a PoC to understand the use cases of OL and how it builds lineage.

As for the transport config, I am using the code from the documentation to set up OL:
https://openlineage.io/docs/integrations/spark/quickstart_local

Apart from this, I don't have anything else in my notebook.

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-19 04:38:58
-
-

*Thread Reply:* ok, I am wondering if what you're experiencing isn't similar to issue #1860. Could you try OpenLineage 0.23.0 to see if you get the same error?

<https://github.com/OpenLineage/OpenLineage/issues/1860>

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Doe - (adarsh.pansari@tigeranalytics.com) -
-
2023-05-19 10:05:59
-
-

*Thread Reply:* I tried with 0.23.0 still getting the same error

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Doe - (adarsh.pansari@tigeranalytics.com) -
-
2023-05-23 02:34:52
-
-

*Thread Reply:* @Paweł Leszczyński is there any other way I can try to set this up? The issue still persists.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-05-29 03:53:04
-
-

*Thread Reply:* hmm, I've just redone the steps from https://openlineage.io/docs/integrations/spark/quickstart_local with 0.26.0 and could not reproduce the behaviour you encountered.

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-05-22 09:41:55
Hello Team!!! A part of my master thesis's case study was about data lineage in data mesh and how open-source initiatives such as OpenLineage and Marquez can realize this. Can you recommend some material that can support the writing part of my thesis? (More context: I tried to extract lineage events from Snowflake through Airflow and used Docker Compose on EC2 to connect Airflow and the Marquez webserver.) We will divide the thesis into a few academic papers to make the content more digestible and hopefully publish one of them soon!

- - - -
- 👍 Ernie Ostic, Maciej Obuchowski, Ross Turk, Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-22 16:34:00
-
-

*Thread Reply:* Tom, thanks for your question. This is really exciting! I assume you’ve already started checking out the docs, but there are many other resources on the website, as well (on the blog and resources pages in particular). And don’t skip the YouTube channel, where we’ve recently started to upload short, more digestible excerpts from the community meetings. Please keep us updated as you make progress!

- - - -
- 👀 Tom van Eijk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu) -
-
2023-05-22 16:48:06
-
-

*Thread Reply:* Hi Michael! Thank you so much for sending these resources! I've been working on this thesis for quite some time already and it's almost finished. I just needed some additional information to help in accurately describing some of the processes in OpenLineage and Marquez. Will send you the case study chapter later this week to get some feedback if possible. Keep you posted on things such as publication! Perhaps it can make OpenLineage even more popular than it already is 😉

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-22 16:52:18
*Thread Reply:* Yes, please share it! Looking forward to checking it out. Super cool!

Ernie Ostic - (ernie.ostic@getmanta.com)
2023-05-22 09:57:50
Hi Tom. Good luck. Sounds like a great case study. You might want to compare and contrast various kinds of lineage solutions, all of which complement each other while having their own pros and cons (code-based lineage via parsing, data-similarity lineage, run-time lineage reporting, etc.), and then focus on open source and OpenLineage with Marquez in particular.

🙏 Tom van Eijk

Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu)
2023-05-22 10:04:44
*Thread Reply:* Thank you so much Ernie! That sounds like a very interesting direction to keep in mind during research!

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-22 16:37:44
@channel
For an easily digestible recap of recent events, communications, and releases in the community, please sign up for our new monthly newsletter! Look for it in your inbox soon.

Bernat Gabor - (gaborjbernat@gmail.com)
2023-05-22 23:32:16
looking here https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json#L64 it shows that the schemaURL must be set, but then the examples in https://openlineage.io/getting-started#step-1-start-a-run do not contain it. Is this a bug or expected? 😄

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-23 07:24:09
*Thread Reply:* yeah, it's a bug
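For reference, a run event that satisfies the spec carries schemaURL alongside the other required fields. A minimal sketch in Python (the producer value and the exact spec version in the URLs are illustrative, not prescriptive):

# Minimal RunEvent payload including the schemaURL field the spec marks as
# required. All names and URLs here are example values, not a fixed API.
event = {
    "eventType": "START",
    "eventTime": "2023-05-23T10:00:00Z",
    "run": {"runId": "d46e465b-d358-4d32-83d4-df660ff614dd"},
    "job": {"namespace": "my-namespace", "name": "my-job"},
    "inputs": [],
    "outputs": [],
    "producer": "https://example.com/my-producer",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
}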

Bernat Gabor - (gaborjbernat@gmail.com)
2023-05-23 12:00:48
*Thread Reply:* so it's optional then? 😄 or bug in the example?

Bernat Gabor - (gaborjbernat@gmail.com)
2023-05-23 12:02:09
I noticed that DataQualityAssertionsDatasetFacet inherits from InputDatasetFacet, https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DataQualityAssertionsDatasetFacet.json though I think it should inherit from DatasetFacet like all the others 🤔

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-23 14:20:09
@channel
Two years ago last Saturday, we released the first version of OpenLineage, a test release of the Python client. So it seemed like an appropriate time to share our first annual ecosystem survey, which is both a milestone in the project's growth and an important effort to set our course. This survey has been designed to help us learn more about who is using OpenLineage, what your lineage needs are, and what new tools you hope the project will support. Thank you in advance for taking the time to share your opinions and vision for the project! (Please note: the survey might seem longer than it actually is due to the large number of optional questions. Not all questions apply to all use cases.)

🙌 Harel Shein, Maciej Obuchowski, Atif Tahir, Peter Hicks, Tamara Fingerlin, Paweł Leszczyński

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2023-05-23 18:59:46
OpenLineage Spark integration: our Spark workloads on Spark 2.4 are correctly setting .config("spark.sql.catalogImplementation", "hive"), however SQL queries for CREATE/INSERT INTO don't recognize the datasets as "Hive". As per https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/supported-commands.md USING HIVE is needed for appropriate parsing. Why is that the case? Why can't the HQL format for CREATE/INSERT be supported?

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2023-05-23 19:01:43
*Thread Reply:* @Michael Collado wondering if you could shed some light here

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-24 05:39:01
*Thread Reply:* can you show the logical plan of your Spark job? I think using hive is not the most important part; what matters is whether the job's LogicalPlan parses to CreateHiveTableAsSelectCommand or InsertIntoHiveTable
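For anyone following along, a quick way to see which LogicalPlan node a statement becomes in PySpark (a sketch; the table names are placeholders):

# explain(extended=True) prints the parsed/analyzed/optimized plans, where
# the top node (e.g. InsertIntoHiveTable) is visible.
spark.sql(
    "INSERT INTO default.output_table SELECT * FROM default.input_table"
).explain(extended=True)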

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2023-05-24 19:37:02
*Thread Reply:* It parses into InsertIntoHadoopFsRelationCommand. Example:
== Optimized Logical Plan ==
InsertIntoHadoopFsRelationCommand s3a://uchmsdev03/default/sharanyaOutputTable, false, [id#89], Parquet, [serialization.format=1, mergeSchema=false, partitionOverwriteMode=dynamic], Append, CatalogTable(
Database: default
Table: sharanyaoutputtable
Owner: 2700940971
Created Time: Thu Jun 09 11:13:35 PDT 2022
Last Access: UNKNOWN
Created By: Spark 3.2.0
Type: EXTERNAL
Provider: hive
Table Properties: [transient_lastDdlTime=1654798415]
Location: s3a://uchmsdev03/default/sharanyaOutputTable
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1]
Partition Provider: Catalog
Partition Columns: [`id`]
Schema: root
 |-- displayName: string (nullable = true)
 |-- serialnum: string (nullable = true)
 |-- osversion: string (nullable = true)
 |-- productfamily: string (nullable = true)
 |-- productmodel: string (nullable = true)
 |-- id: string (nullable = true)
), org.apache.spark.sql.execution.datasources.CatalogFileIndex@5fe23214, [displayName, serialnum, osversion, productfamily, productmodel, id]
+- Union false, false
 :- Relation default.tablea[displayName#84,serialnum#85,osversion#86,productfamily#87,productmodel#88,id#89] parquet
 +- Relation default.tableb[displayName#90,serialnum#91,osversion#92,productfamily#93,productmodel#94,id#95] parquet

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2023-05-24 19:39:54
*Thread Reply:* using Spark 3.2, and this is the query:
spark.sql(s"INSERT INTO default.sharanyaOutput select * from (SELECT * from default.tableA union all " +
  s"select * from default.tableB)")

Rakesh Jain - (rakeshj@us.ibm.com)
2023-05-24 01:09:58
Is there any example of how sourceCodeLocation / git info can be used from a spark job? What do we need to set to be able to see that as part of metadata?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-24 05:37:06
*Thread Reply:* I think we can't really get it from the Spark context, as Spark jobs are submitted in compiled jar form, instead of plain text like, for example, Airflow DAGs.

Rakesh Jain - (rakeshj@us.ibm.com)
2023-05-25 02:15:35
*Thread Reply:* How about Jupyter Notebook based spark job?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-25 08:44:18
*Thread Reply:* I don't think it changes much - but maybe @Paweł Leszczyński knows more

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-25 11:24:21
@channel
Deprecation notice: support for Airflow 2.1 will end in about two weeks, when it will be removed from testing. The exact date will be announced as we get closer to it; this is just a heads up. After that date, use 2.1 at your own risk! (Note: the next release, 0.27.0, will still support 2.1.)

👍 Maciej Obuchowski

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-05-25 11:27:39
For the OpenLineageSparkListener, is there a way to configure it to send packets locally, e.g. save to a file? (instead of pushing to a URL destination)

alexandre bergere - (alexandre.pro.bergere@gmail.com)
2023-05-25 12:00:04
*Thread Reply:* We developed a FileTransport class to save our metrics locally in a JSON file, if you're interested.

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-05-25 12:00:37
*Thread Reply:* Does it also save the OpenLineage information, e.g. inputs/outputs?

alexandre bergere - (alexandre.pro.bergere@gmail.com)
2023-05-25 12:02:07
*Thread Reply:* yes, it saves all the JSON information, inputs/outputs included

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-05-25 12:03:03
*Thread Reply:* Yes! then I am very interested. Is there guidance on how to use the FileTransport class?
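A rough sketch of what such a transport could look like against the Python client's Transport interface (an assumed API; the actual class contributed in the PR shared later in this thread may differ):

from openlineage.client.serde import Serde
from openlineage.client.transport import Transport

class FileTransport(Transport):
    kind = "file"  # hypothetical transport name

    def __init__(self, file_path: str):
        self.file_path = file_path

    def emit(self, event):
        # Append the full event, inputs/outputs included, as one JSON line.
        with open(self.file_path, "a") as f:
            f.write(Serde.to_json(event) + "\n")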

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-25 13:06:22
*Thread Reply:* @alexandre bergere it would be a pretty useful contribution if you can submit it 🙂

🙌 alexandre bergere

alexandre bergere - (alexandre.pro.bergere@gmail.com)
2023-05-25 13:08:28
*Thread Reply:* We are using it in a customized OpenLineage library we developed! I'm going to make a PR to share it with you :)

👍 Julien Le Dem, Anirudh Shrinivason

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-25 13:56:48
*Thread Reply:* Would be great to have. I had it in mind to implement as an enabler for Databricks integration tests. Great to hear that!

alexandre bergere - (alexandre.pro.bergere@gmail.com)
2023-05-29 08:19:46
*Thread Reply:* PR sent: https://github.com/OpenLineage/OpenLineage/pull/1891 🙂
@Maciej Obuchowski could you tell me how to update the documentation once approved, please?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-29 08:36:21
*Thread Reply:* @alexandre bergere we have a separate repo for the website + docs: https://github.com/OpenLineage/docs

🙏 alexandre bergere

Bramha Aelem - (bramhaaelem@gmail.com)
2023-05-25 16:40:26
Hi Team - when we run a Databricks job, a lot of dbfs path namespaces are getting created. Can someone please let us know how to override the symlink namespaces and link them with the Spark app name or the OpenLineage namespace in the Marquez UI?

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-05-26 09:09:09
Hello,


I am looking to connect the common data model in the Marquez Postgres database to the Azure Purview (which uses the Apache Atlas APIs) lineage endpoint. Does anyone have a how-to on this, or can you point me to some useful links?


Thanks in advance.

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-26 13:08:56
*Thread Reply:* I wonder if this blog post might help? https://openlineage.io/blog/openlineage-microsoft-purview

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-26 16:13:38
*Thread Reply:* This might not fully match your use case, either, but might help: https://learn.microsoft.com/en-us/samples/microsoft/purview-adb-lineage-solution-accelerator/azure-databricks-to-purview-lineage-connector/

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-06-01 23:23:49
*Thread Reply:* Thanks @Michael Robinson

Bernat Gabor - (gaborjbernat@gmail.com)
2023-05-26 12:44:09
Are there any constraints on facets? Such as, is it reasonable to expect that a single job will have a single parent? The schema hints at this by making the parent a single entry; but then one can send different parents for the START and COMPLETE events? 🤔

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-29 05:04:32
*Thread Reply:* I think, for now, such a thing is not defined other than by the implementation of consumers.

Bernat Gabor - (gaborjbernat@gmail.com)
2023-05-30 10:32:09
*Thread Reply:* Any reason for that?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-01 10:25:33
*Thread Reply:* The idea is that, for a particular run, facets can be attached to any event type.

This has advantages. For example, a job that modifies a dataset it's also reading from can get the particular version of the dataset it's reading from and attach it on start; this would not work if you tried to do it on complete, as the dataset would have changed by then.

Similarly, if the job is creating a dataset, we could not get additional metadata on it beforehand, so we can attach that information only on complete.

There are also cases where we want facets to be cumulative. The reason for this is streaming jobs. For example, with Apache Flink, we could emit metadata on each checkpoint (or every N checkpoints) that contains metadata showing how the job is progressing.

Generally consumers should be agnostic to that, but we don't want to overspecify what consumers should do - as people might want to use OL data in different ways, or even ignore some data we're sending.
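As a sketch of that cumulative model, the same run might emit something like the following (dataset names and facet keys are illustrative):

run_id = "d46e465b-d358-4d32-83d4-df660ff614dd"
# START: attach the version of the input dataset as it is being read.
start_event = {
    "eventType": "START",
    "run": {"runId": run_id},
    "inputs": [{"namespace": "s3://bucket", "name": "my_table",
                "facets": {"version": {"datasetVersion": "v41"}}}],
}
# COMPLETE: attach output metadata that only exists once the job finishes;
# consumers merge both events by runId.
complete_event = {
    "eventType": "COMPLETE",
    "run": {"runId": run_id},
    "outputs": [{"namespace": "s3://bucket", "name": "my_table",
                 "outputFacets": {"outputStatistics": {"rowCount": 1000}}}],
}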

Bernat Gabor - (gaborjbernat@gmail.com)
2023-05-30 17:49:54
Any reason why the lifecycle state change facet is not just on the output, but is also allowed on the inputs? 🤔 https://openlineage.io/docs/spec/facets/dataset-facets/lifecycle_state_change I can't see how it would be interpreted for an input 🤔

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-01 10:18:48
*Thread Reply:* I think it should be output-only, yes.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-01 10:19:14
*Thread Reply:* @Paweł Leszczyński what do you think?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 08:35:13
*Thread Reply:* yes, should be output only I think
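i.e., roughly this shape on an output dataset (a sketch; facet fields abbreviated, values per the facet spec):

output_dataset = {
    "namespace": "s3://bucket",
    "name": "path/to/table",
    "facets": {
        # lifecycleStateChange can be e.g. CREATE, OVERWRITE, DROP, TRUNCATE
        "lifecycleStateChange": {"lifecycleStateChange": "OVERWRITE"},
    },
}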

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-05 13:39:07
*Thread Reply:* should we move it over then? 😄

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-05 13:39:31
*Thread Reply:* under Output Dataset Facets that is

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-01 12:30:00
@channel
The first issue of OpenLineage News is now available. To get it directly in your inbox when it's published, become a subscriber.

🚀 Willy Lulciuc, Jakub Dardziński, Maciej Obuchowski, Bernat Gabor, Harel Shein, Laurent Paris, Tamara Fingerlin, Perttu Salonen
🔥 Willy Lulciuc, Natalie Zeller, Ernie Ostic, Laurent Paris
💯 Willy Lulciuc

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-01 14:23:17
*Thread Reply:* Correction: Julien and Willy’s talk at Data+AI Summit will take place on June 28

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-01 13:50:23
Hello all, I'm opening a vote to release 0.27.0, featuring:
• Spark: fixed column lineage from Databricks in the case of aggregate queries
• Python client: configurable job-name filtering
• Airflow: fixed urllib.parse.urlparse in case of [] values
Three +1s from committers will authorize an immediate release.

➕ Jakub Dardziński, Maciej Obuchowski, Willy Lulciuc, Paweł Leszczyński

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-02 10:30:39
*Thread Reply:* Thanks, all. The release is authorized and will be initiated on Monday in accordance with our policy here.

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-02 13:13:18
@channel
This month's TSC meeting is next Thursday, June 8th, at 10:00 am PT. On the tentative agenda: announcements, meetup updates, recent releases, static lineage progress, and open discussion. More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.

🙌 Sheeri Cabral (Collibra), Maciej Obuchowski, Harel Shein, alexandre bergere, Paweł Leszczyński, Willy Lulciuc

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-05 12:34:29
@channel
We released OpenLineage 0.27.1, including:
Additions:
• Python client: add emission filtering mechanism and exact, regex filters #1878 @mobuchowski
Fixes:
• Spark: fix column lineage for aggregate queries on databricks #1867 @pawel-big-lebowski
• Airflow: fix unquoted [ and ] in Snowflake URIs #1883 @JDarDagran
Plus a CI fix and a proposal.
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.27.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.26.0...0.27.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Sheeri Cabral (Collibra)

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-05 13:01:06
Looking for a reviewer under: https://github.com/OpenLineage/OpenLineage/pull/1892 🙂

🙌 Sheeri Cabral (Collibra), Paweł Leszczyński, Michael Robinson

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-06-05 15:47:08
*Thread Reply:* @Bernat Gabor thanks for the PR!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-06 08:17:47
Hey, I request release 0.27.2 to fix a potential breaking change in the Python client in 0.27.1: https://github.com/OpenLineage/OpenLineage/pull/1908

➕ Jakub Dardziński, Paweł Leszczyński, Michael Robinson, Willy Lulciuc

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-06 10:58:23
*Thread Reply:* Thanks @Maciej Obuchowski. The release is authorized and will be initiated as soon as possible.

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-06 12:33:55
@channel
We released OpenLineage 0.27.2, including:
Fixes:
• Python client: deprecate client.from_environment, do not skip loading config #1908 @Maciej Obuchowski
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.27.2
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.27.1...0.27.2
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

👍 Maciej Obuchowski

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-06 14:22:18
Found a major bug in the Python client - https://github.com/OpenLineage/OpenLineage/pull/1917, if someone can review

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-06 14:54:47
And also https://github.com/OpenLineage/OpenLineage/pull/1913 🙂 that fixes the type information not being packaged

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-07 09:48:58
@channel
This month's TSC meeting is tomorrow, and all are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1685725998982879

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 11:11:31
Hi team,

I wanted lineage of my data at the table and column level. I am using a Jupyter notebook and Spark code.

spark = (SparkSession.builder.master('local')
    .appName('sample_spark')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.12.0')
    .config('spark.openlineage.host', 'http://marquez-api:5000')
    .config('spark.openlineage.namespace', 'spark_integration')
    .getOrCreate())

I used this and then opened localhost:3000 for Marquez.

I can see my job there, but when I click on the job, when it's supposed to show lineage, it's just an empty screen.

John Lukenoff - (john@jlukenoff.com)
2023-06-08 12:39:20
*Thread Reply:* Do you get any output in your devtools? I just ran into this yesterday and it looks like it’s related to this issue: https://github.com/MarquezProject/marquez/issues/2410

John Lukenoff - (john@jlukenoff.com)
2023-06-08 12:40:01
*Thread Reply:* Seems like more of a Marquez client-side issue than something with OL

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 12:43:02
*Thread Reply:* Ohh, but if I try using the console output, it throws ClientProtocolError

John Lukenoff - (john@jlukenoff.com)
2023-06-08 12:43:41
*Thread Reply:* Sorry I mean in the dev console of your web browser

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 12:44:43
*Thread Reply:* this is the dev console in browser

John Lukenoff - (john@jlukenoff.com)
2023-06-08 12:47:59
*Thread Reply:* Seems like it’s coming from this line. Are there any job facets defined when you fetch from the API directly? That seems like kind of an old version of OL so maybe the schema is incompatible with the version Marquez is expecting

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 12:51:21
*Thread Reply:* from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
    .appName('sample_spark')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.12.0')
    .config('spark.openlineage.host', 'http://marquez-api:5000')
    .config('spark.openlineage.namespace', 'spark_integration')
    .getOrCreate())

spark.sparkContext.setLogLevel("INFO")

spark.createDataFrame([
    {'a': 1, 'b': 2},
    {'a': 3, 'b': 4}
]).write.mode("overwrite").saveAsTable("temp_table8")

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 12:51:49
*Thread Reply:* This is my only code, I haven't done anything apart from this

John Lukenoff - (john@jlukenoff.com)
2023-06-08 12:52:30
*Thread Reply:* I would try a more recent version of OL. Looks like you’re using 0.12.0 and I think the project is on 0.27.x currently
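The same quickstart config with a newer connector would look roughly like this (a sketch; 0.27.1 was the version current at the time of this thread):

spark = (SparkSession.builder.master('local')
    .appName('sample_spark')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.27.1')
    .config('spark.openlineage.host', 'http://marquez-api:5000')
    .config('spark.openlineage.namespace', 'spark_integration')
    .getOrCreate())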

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 12:55:07
*Thread Reply:* So I should change io.openlineage:openlineage-spark:0.12.0 to io.openlineage:openlineage-spark:0.27.1?

👍 John Lukenoff, Julien Le Dem

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:10:03
*Thread Reply:* It executed well, but I'm unable to see it in Marquez

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:18:16
*Thread Reply:* Marquez didn't get updated

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:20:44
*Thread Reply:* I am actually doing a POC on OpenLineage to find table- and column-level lineage for my team at Amazon.
If this goes through, the team could use OpenLineage to track data lineage on a larger scale.

John Lukenoff - (john@jlukenoff.com)
2023-06-08 13:24:49
*Thread Reply:* Maybe marquez is still pulling the data from the previous run using the old OL version. Do you still get the same error in the browser console? Do you get the same result if you rebuild and start with a clean marquez db?

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:25:10
*Thread Reply:* Yes, I did that as well

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:25:49
*Thread Reply:* The error was present only once you clicked on any of the jobs in Marquez; since my job isn't showing up, I can't check for the error itself

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:26:29
*Thread Reply:* docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=marquez-api -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquezproject/marquez-web:0.19.1

used this to rebuild Marquez

John Lukenoff - (john@jlukenoff.com)
2023-06-08 13:26:54
*Thread Reply:* That’s odd, sorry, that’s probably the most I can help, I’m kinda new to OL/Marquez as well 😅

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:27:41
*Thread Reply:* No problem. Can you refer me to someone who would know, so that I can ask them?

John Lukenoff - (john@jlukenoff.com)
2023-06-08 13:29:25
*Thread Reply:* Actually, looking at it now, I think you're using a slightly outdated version of marquez-web too. I would update that tag to at least 0.33.0; that's what I'm using

John Lukenoff - (john@jlukenoff.com)
2023-06-08 13:30:10
*Thread Reply:* Other than that I would ask in the Marquez Slack channel or raise an issue on GitHub on that project. Seems like more of an issue with Marquez, since at least some data is rendering in the UI initially

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:32:58
*Thread Reply:* nope, that version also didn't help

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:33:19
*Thread Reply:* can you share their slack link?

John Lukenoff - (john@jlukenoff.com)
2023-06-08 13:34:52
*Thread Reply:* http://bit.ly/MarquezSlack

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-08 13:35:08
*Thread Reply:* that link is no longer active

Julien Le Dem - (julien@apache.org)
2023-06-09 18:44:25
*Thread Reply:* Hello @Rachana Gandhi could you point to the doc where you found the example .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.12.0')? We should update it to have the latest version instead.

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-09 18:54:49
*Thread Reply:* https://openlineage.io/docs/integrations/spark/quickstart_local/

Rachana Gandhi - (rachana.gandhi410@gmail.com)
2023-06-09 18:59:17
*Thread Reply:* https://openlineage.io/docs/guides/spark

Also, the docker compose here has an earlier version of Marquez.

Harshit Soni - (harshit.soni@angelbroking.com)
2023-07-13 17:00:54
*Thread Reply:* Facing same issue with my initial POC. Did we get any solution for this?

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-08 14:36:38
Approve a new release 🙂

➕ Michael Robinson, Willy Lulciuc, Maciej Obuchowski, Jakub Dardziński

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-08 14:43:55
*Thread Reply:* Requesting a release? 3 +1s from committers will authorize. More info here: https://github.com/OpenLineage/OpenLineage/blob/main/GOVERNANCE.md

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-08 14:44:14
*Thread Reply:* Yeah, that one 😊

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-08 14:44:44
*Thread Reply:* Because the Python client is broken as-is today without a new release

👍 Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-08 18:45:04
*Thread Reply:* Thanks, all. The release is authorized and will be initiated by EOB next Tuesday, but in all likelihood well before then.

Bernat Gabor - (gaborjbernat@gmail.com)
2023-06-08 19:06:34
*Thread Reply:* cool

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-12 13:15:26
@channel
We released OpenLineage 0.28.0, including:
Added
• dbt: add Databricks compatibility #1829 @Ines70
Fixed
• Fix type-checked marker and packaging #1913 @gaborbernat
• Python client: add schemaURL to run event #1917 @gaborbernat
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.28.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.27.2...0.28.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🚀 Maciej Obuchowski, Willy Lulciuc, Francis McGregor-Macdonald
👍 Ines DAHOUMANE -COWORKING PARIS-

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-12 14:35:56
@channel
Meetup announcement: there's another meetup happening soon! This one will be an evening event on 6/22 in New York at Collibra's HQ. For details and to sign up, please join the meetup group: https://www.meetup.com/data-lineage-meetup/events/294065396/. Thanks to @Sheeri Cabral (Collibra) for cohosting and providing a space.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-12 23:27:16
Hi, just curious, does OpenLineage have a log4j integration?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-13 04:44:28
*Thread Reply:* Do you mean to just log events to logging backend?

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-13 04:54:30
*Thread Reply:* Hmm more like have a separate logging config for sending all the logs to a backend

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-13 04:54:38
*Thread Reply:* Not the events itself

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-13 05:01:10
*Thread Reply:* @Anirudh Shrinivason with Spark integration?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-13 05:01:59
*Thread Reply:* It uses slf4j so you should be able to set up your log4j logger

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-13 05:10:55
*Thread Reply:* Yeah with the spark integration. Ahh I see. Okay sure thanks!

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-21 23:21:14
*Thread Reply:* ~Hi @Maciej Obuchowski May I know what class path I should be using for setting up log4j if I want to set it up for OL-related logs? Is there some guide or runbook for setting up log4j with OL? Thanks!~
Nvm lol found it! 🙂

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-13 12:19:01
Hello all, we are just starting to use Marquez as part of our POC. We are following the getting started guide at https://openlineage.io/getting-started/ to set up the environment on an AWS EC2 instance. When we run ./docker/up.sh, it does not bring up the marquez-web container. Also, we are not able to access the Admin UI at ports 5000 and 5001.

Docker version: 24.0.2
Docker Compose version: 2.18.1
OS: Ubuntu 20.04

Can someone please let me know what I am missing?
Note: I had to modify the docker-compose command in up.sh as per Docker Compose V2.

Also, we are seeing the following log when our load balancer is checking for health:

WARN [2023-06-13 15:35:31,040] marquez.logging.LoggingMdcFilter: status: 404
172.30.1.206 - - [13/Jun/2023:15:35:42 +0000] "GET / HTTP/1.1" 200 535 "-" "ELB-HealthChecker/2.0" 1
172.30.1.206 - - [13/Jun/2023:15:35:42 +0000] "GET / HTTP/1.1" 404 43 "-" "ELB-HealthChecker/2.0" 2
WARN [2023-06-13 15:35:42,866] marquez.logging.LoggingMdcFilter: status: 404

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-06-14 10:42:41
*Thread Reply:* Hello, is anyone who has recently installed the latest version of marquez/openlineage-spark using the Docker image available to help Vamshi and me, or provide any pointers? Thank you

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-15 03:38:38
*Thread Reply:* if you're working on a Mac, you can have an issue related to port 5000. The instructions here https://github.com/MarquezProject/marquez#quickstart provide a workaround for that: ./docker/up.sh --api-port 9000

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-06-15 08:43:33
*Thread Reply:* @Paweł Leszczyński, thank you. We were using Ubuntu on an EC2 instance, and each time we run into different errors and are never able to access the application page, web server, or admin interface. We have run out of ideas of what else to try differently to get this setup up and running.

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-22 14:47:00
*Thread Reply:* @Michael Robinson Can you please help us here?

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-22 14:58:57
*Thread Reply:* @Vamshi krishna I’m sorry you’re still blocked. Thanks for the information about your system. Would you please share some of the errors you are getting? More details would help us reproduce and diagnose.

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-06-22 16:35:00
*Thread Reply:* @Michael Robinson, thank you, Vamshi and I will share the errors that we are running into shortly

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 09:48:16
*Thread Reply:* We are following the https://openlineage.io/getting-started/ guide and trying to set up Marquez on an Ubuntu EC2 instance. Following are the versions of Docker, Docker Compose, and Ubuntu:

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 09:49:51
*Thread Reply:* @Michael Robinson When we follow the documentation without changing anything and run sudo ./docker/up.sh, we see the following errors:

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:00:38
*Thread Reply:* So I edited the up.sh file, modified the docker compose command by removing the --log-level flag, ran sudo ./docker/up.sh, and found the following errors:

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:02:29
*Thread Reply:* Then I copied .env.example to .env, since Compose needs a .env file

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:05:04
*Thread Reply:* I got this error:

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:09:24
*Thread Reply:* Since I am getting timeouts, I thought it might be an issue with a proxy. So I followed this doc: https://stackoverflow.com/questions/58841014/set-proxy-on-docker, added my outbound proxy, and tried again

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:23:46
*Thread Reply:* @Michael Robinson Then it kind of worked, but I'm seeing the following errors:

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:24:31
*Thread Reply:*

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:25:29
*Thread Reply:* @Michael Robinson @Paweł Leszczyński Can you please see the above steps and let us know what we are missing/doing wrong? I appreciate your help and time.

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-23 10:45:39
*Thread Reply:* The latest errors look to me like they’re being caused by postgres and might reflect a port conflict. Are you using the default port for the API (5000)? You might try using a different port. More info about this in the Marquez readme: https://github.com/MarquezProject/marquez/blob/0.35.0/README.md.

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:46:55
*Thread Reply:* Yes, we are using the default ports:
API_PORT=5000
API_ADMIN_PORT=5001
WEB_PORT=3000
TAG=0.35.0

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:47:40
*Thread Reply:* We see these postgres permission issues only occasionally. Other times we only see db and api containers up but not the web

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-23 10:52:38
*Thread Reply:* I would try running ./docker/up.sh --api-port 9000 (see Pawel’s message above for more context.)

👍 Vamshi krishna

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:54:18
*Thread Reply:* Still no luck, seeing the same errors:

2023-06-23 14:53:23.971 GMT [1] LOG: could not open configuration file "/etc/postgresql/postgresql.conf": Permission denied
marquez-db | 2023-06-23 14:53:23.971 GMT [1] FATAL: configuration file "/etc/postgresql/postgresql.conf" contains errors

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 10:54:43
*Thread Reply:* ERROR [2023-06-23 14:53:42,269] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.
marquez-api | ! java.net.UnknownHostException: postgres
marquez-api | ! at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:567)
marquez-api | ! at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
marquez-api | ! at java.base/java.net.Socket.connect(Socket.java:633)
marquez-api | ! at org.postgresql.core.PGStream.createSocket(PGStream.java:243)
marquez-api | ! at org.postgresql.core.PGStream.<init>(PGStream.java:98)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:132)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
marquez-api | ! ... 26 common frames omitted
marquez-api | ! Causing: org.postgresql.util.PSQLException: The connection attempt failed.
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:354)
marquez-api | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
marquez-api | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:253)
marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:434)
marquez-api | ! at org.postgresql.Driver.connect(Driver.java:291)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)
marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)
marquez-api | ! at org.flywaydb.core.Flyway.info(Flyway.java:190)
marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:73)
marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:27)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:105)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:48)
marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)
marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)
marquez-api | ! at io.dropwizard.Application.run(Application.java:94)
marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:60)
marquez-api | INFO [2023-06-23 14:53:42,274] marquez.MarquezApp: Stopping app...

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-23 11:06:32
*Thread Reply:* Why do you run docker up with sudo? Some of your screenshots suggest docker is not able to access the docker registry. The last error, java.net.UnknownHostException: postgres, may just be a result of the container being down. Could you verify if all the containers are up and running, and if not, what's the error? Are you able to test this docker up on your laptop or in another environment?

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 11:08:34
*Thread Reply:* Docker commands require sudo and cannot run with another user.
The Postgres container is not coming up. It is failing with the following errors:

2023-06-23 14:53:23.971 GMT [1] LOG: could not open configuration file "/etc/postgresql/postgresql.conf": Permission denied
marquez-db | 2023-06-23 14:53:23.971 GMT [1] FATAL: configuration file "/etc/postgresql/postgresql.conf" contains errors

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-23 11:10:19
*Thread Reply:* and what does docker ps -a say about postgres container? why did it fail?

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 11:11:36
*Thread Reply:*

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-23 11:25:17
*Thread Reply:* hmyy, no changes on our side have been made to postgresql.conf since August 2022. Did you apply any changes, or do you have a clean clone of the repo?

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 11:29:46
*Thread Reply:* No we didn't make any changes

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-23 11:32:21
*Thread Reply:* you did write earlier: "Note: I had to modify docker-compose command in up.sh as per docker compose V2."

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 11:34:54
*Thread Reply:* Yes, all I did was modify this line:
docker-compose --log-level ERROR $compose_files up $ARGS
to
docker compose $compose_files up $ARGS
since Docker Compose V2 doesn't support the --log-level flag.

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 11:37:03
*Thread Reply:* Let me pull an older version and try

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 12:09:43
*Thread Reply:* Still no luck, same exact errors. Tried on a different Ubuntu instance. Still seeing the same errors with Postgres.

Vamshi krishna - (vnallamothu@cardinalcommerce.com)
2023-06-23 15:06:32
*Thread Reply:* @Jeremy W

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-15 10:40:47
Hi all, a general doubt. Would the column lineage associated with a job be present in both the start events and the complete events? Or could there be cases where the column lineage and any output information are only present in one of the events, but not the other?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-15 10:49:42
*Thread Reply:* > Or could there be cases where the column lineage and any output information are only present in one of the events, but not the other?
Yes. Generally, events regarding a single run are cumulative.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-15 11:07:03
*Thread Reply:* Ahh I see... Is it fair to assume that if I see column lineage in a start event, it's the full column lineage? Or could it be possible that half the lineage is in the start event, and half the lineage is in the complete event?

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-15 22:50:51
*Thread Reply:* Hi @Maciej Obuchowski just pinging in case you'd missed the above message. 🙇

👀 Paweł Leszczyński

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-16 04:48:57
*Thread Reply:* Actually, in this case this definitely should not happen. @Paweł Leszczyński am I right?

:gratitude_thank_you: Anirudh Shrinivason

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-16 04:50:16
*Thread Reply:* @Maciej Obuchowski yes, you're right

:gratitude_thank_you: Anirudh Shrinivason

nivethika R - (nivethikar8@gmail.com)
2023-06-15 11:14:33
Hi all. Is JDBC supported in OpenLineage and Marquez for column lineage? I did a POC using tables in a Postgres DB and I am able to see all events, but for columnLineage I am getting NULL. Not sure what I am missing.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-16 02:14:19
*Thread Reply:* ~No, we do have an open issue for that: https://github.com/OpenLineage/OpenLineage/issues/1758~

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-16 05:02:26
*Thread Reply:* @nivethika R, I am sorry for the misleading response; we've merged a PR for that: https://github.com/OpenLineage/OpenLineage/pull/1636. It does not support select * but, besides that, it should be operational.

Could you please try a query from our integration tests to verify whether this works for you: https://github.com/OpenLineage/OpenLineage/pull/1636/files#diff-137aa17091138b69681510e13e3b7d66aa9c9c7c81fe8fe13f09f0de76448dd5R46 ?
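Roughly the shape of a job that should produce JDBC column lineage (a sketch; the connection details and table names are placeholders, and note the explicit column list rather than select *):

# Read over JDBC with explicit columns, then write, so the integration can
# map output columns back to the JDBC source columns.
jdbc_df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://postgres:5432/mydb")
    .option("query", "select id, name from customers")  # explicit columns
    .option("user", "user")
    .option("password", "password")
    .load())
jdbc_df.write.mode("overwrite").saveAsTable("customers_copy")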

Nagendra Kolisetty - (nkolisetty@geico.com)
2023-06-16 12:12:00
Hi There,

Nagendra Kolisetty - (nkolisetty@geico.com)
2023-06-16 12:12:43
We are trying to install the image on a private AKS cluster and we ended up with the below error:

kubectl : pod marquez/pgsql-postgresql-client terminated (StartError)
At line:1 char:1
+ kubectl run pgsql-postgresql-client --rm --tty -i --restart='Never' `
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (pod marquez/pgs...ed (StartError):String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

failed to create containerd task: failed to create shim task: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "PGPASSWORD=macondo": executable file not found in $PATH: unknown

Nagendra Kolisetty - (nkolisetty@geico.com)
2023-06-16 12:13:13
We followed the below article to install Marquez in AKS (Azure). By the way, we pulled the images from Docker, pushed them to our ACR, and tried installing PostgreSQL via ACR, and it failed with the error above.

https://github.com/MarquezProject/marquez/blob/main/docs/running-on-aws.md

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-21 11:07:04
*Thread Reply:* Hi Nagendra, sorry you’re running into this error. We’re looking into it!

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-18 09:53:19
Hi, I found this error in a couple of the Spark jobs: https://github.com/OpenLineage/OpenLineage/issues/1930. Would appreciate your help to patch it, thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-06-19 09:37:20
*Thread Reply:* Hey @Anirudh Shrinivason, me and Paweł are at Berlin Buzzwords right now. Will definitely look at it later

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-06-19 10:47:06
*Thread Reply:* Oh nice! Thanks!

ayush mittal - (ayushmittal057@gmail.com)
2023-06-20 03:14:02
Hi Team, we are not able to generate lineage for aggregate functions while joining two tables. Below is the query:
df2 = spark.sql("select th.ProductID as Pid, pd.Name as N, sum(th.quantity) as TotalQuantity, sum(th.ActualCost) as TotalCost from silveradventureworks.transactionhistory as th join productdescription_dim as pd on th.ProductID = pd.ProductID group by th.ProductID, pd.Name ")

Rahul - (rahul812ry@gmail.com)
2023-06-20 03:47:50
*Thread Reply:* This is the event generated for the above query.

ayush mittal - (ayushmittal057@gmail.com)
2023-06-20 03:18:22
And one more issue: we are not able to generate OpenLineage events on top of a view being created by joining multiple tables. I have attached the log events for your reference.

ayush mittal - (ayushmittal057@gmail.com)
2023-06-20 03:31:11
This is the event for the view for which no lineage is being generated.

John Lukenoff - (john@jlukenoff.com)
2023-06-20 13:59:00
Has anyone here successfully implemented the Amundsen OpenLineage extractor? I'm a little confused on the best way to output my lineage events to ndjson files in a scalable way, as the docs seem to suggest. Currently I'm pushing all my lineage events to Marquez via the REST API. I suppose I could change my transports to Kinesis and write the events to S3, but that comes with the cost of having to build some new way of getting the events to Marquez.

In any case, this seems like a problem someone must have solved before?

Edit: looking at the source code for this Amundsen extractor, it seems like it should be pretty straightforward to just implement our own extractor that can pull these records from the Marquez backend. Will give that a shot and see about getting that merged into Amundsen later.

👍 Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-20 17:34:08
*Thread Reply:* Hi John, glad to hear you figured out a path forward on this! Please let us know what you learn 🙂

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-20 14:21:03
Our New York meetup with Collibra is happening in just two days! https://openlineage.slack.com/archives/C01CK9T7HKR/p1686594956030059

👍 Maciej Obuchowski

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-06-20 14:31:56
Hello all, do you know if we have the possibility of persisting column order while creating lineage, as it may be available in the table or dataset from which it originates? Or is there some way in which we can get the column order (an id or something)?

For example, if a dataset has columns xyz, abc, fgh, dec, I would like to know which column shows first in the dataset in the common data model. Please let me know.

Michael Robinson - (michael.robinson@astronomer.io)
2023-06-20 17:33:36
*Thread Reply:* Hi Harshini, I’ve alerted our resident Spark and column-lineage expert about this. Hope to have an answer for you soon.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-06-20 19:39:46
-
-

*Thread Reply:* Thank you Michael, looking forward to it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-21 02:58:41
-
-

*Thread Reply:* Hello @Harshini Devathi. An interesting topic which I have never thought about. The ordering of the fields we get for Spark apps comes from the Spark logical plans we extract information from, and we do not apply any sorting to them. So, if a Spark plan contains columns a, b, c, we trust that this is the order of columns for the dataset and don't want to check it on our own.

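For illustration only (field names and values made up): a sketch of how field order shows up in the schema facet the Spark integration emits; the fields list simply mirrors the plan's column order.
```python
# A sketch of the schema facet shape; the "fields" list preserves the
# column order (a, b, c) found in the Spark logical plan, unsorted.
schema_facet = {
    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark",
    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json",
    "fields": [
        {"name": "a", "type": "long"},
        {"name": "b", "type": "string"},
        {"name": "c", "type": "double"},
    ],
}
```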
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-21 02:59:45
-
-

*Thread Reply:* btw. please let us know how you obtain your lineage: within a Spark app or from some SQLs scheduled by Airflow?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshini Devathi - (harshini.devathi@tigeranalytics.com) -
-
2023-06-23 14:40:31
-
-

*Thread Reply:* Hello @Paweł Leszczyński, thank you for the response. We do not need you to check the ordering specifically, but I assume that the Spark logical plan maintains the column order based on the input datasets. Can we retain that order by adding a column id or some sequence number which helps represent the lineage in the same order?

- -

We are capturing the lineage using the Spark OpenLineage connector, by posting custom lineage to Marquez through API calls, and we are also in the process of leveraging the SQL connector feature using Airflow.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-26 04:35:43
-
-

*Thread Reply:* Hi @Harshini Devathi, are you asking about the schema facet within a dataset? This should have an order from the Spark logical plans. Or are you asking about the columnLineage facet? Or Marquez API responses? It's not clear to me why you need it. Each column is identified by a dataset (dataset namespace + dataset name) and a field name. You can, on your side, generate a column id based on that and order columns based on the id, but I still think I am missing some arguments behind doing so.

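A consumer-side sketch of what Paweł describes (the hashing scheme here is hypothetical, not part of the spec): derive a stable column id from the identifiers every event already carries.
```python
import hashlib

def column_id(namespace: str, dataset: str, field: str) -> str:
    # Stable id from (dataset namespace + dataset name + field name),
    # so any ordering applied downstream is reproducible.
    key = f"{namespace}/{dataset}/{field}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]

print(column_id("s3a://warehouse", "schema.table", "some_field"))
```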
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-21 17:41:48
-
-

Attention all Bay-area data friends and Data+AI Summit attendees: our first San Francisco meetup is next Tuesday! https://www.meetup.com/meetup-group-bnfqymxe/events/293448130/

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
- 🙌 alexandre bergere -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-23 16:41:29
-
-

Last night in New York we held a meetup with Collibra at their lovely HQ in the Financial District! Many thanks to @Sheeri Cabral (Collibra) for inviting us. -Over a bunch of tasty snacks (thanks for the idea @Harel Shein), we discussed: -• the history and evolution of the spec, and trends in adoption -• progress on the OpenLineage Provider in Airflow (AIP 53) -• progress on “static” AKA design lineage support (expected soon in OpenLineage 1.0.0) -• progress in the LFAI program -• a proposal to add “jobless run” support for auditing use cases and similar edge cases -• an idea to throw a hackathon for creating validation tests and example payloads (would you be interested in participating? let us know!) -• and more. -Many thanks to: -• @Julien Le Dem for making the trip -• Sheeri & Collibra for hosting -• everyone for coming, including second-timer @Ernie Ostic and new member @Shirley Lu -It was great meeting/catching up with everyone. Hope to see you and more new faces at the next one!

- -
- - - - - - - -
- - -
- 🎉 Harel Shein, Peter Hanssens, Ernie Ostic, Paweł Leszczyński, Maciej Obuchowski, Shirley Lu -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-26 10:59:08
-
-

Our first San Francisco meetup is tomorrow at 5:30 PM at Astronomer’s offices in the Financial District. https://openlineage.slack.com/archives/C01CK9T7HKR/p1687383708927189

-
- - -
- - - } - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
- 🚀 alexandre bergere -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-27 03:43:10
-
-

I can’t seem to get OL logging working with Spark. Any guidance please?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-27 03:45:31
-
-

*Thread Reply:* Is it because the logLevel is set to WARN or ERROR?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-27 12:07:12
-
-

*Thread Reply:* No, I set it to INFO; maybe I need to add some jars?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-27 12:30:02
-
-

*Thread Reply:* Hmm have you set the relevant spark configs?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-27 12:32:50
-
-

*Thread Reply:* yep, I have http working. But not the console -spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener -spark.openlineage.transport.type=console

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-27 12:35:27
-
-

*Thread Reply:* Oh wait http works but not console...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-27 12:37:02
-
-

*Thread Reply:* If you want to see the console events which are emitted, then you need to set the logLevel to DEBUG

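A minimal sketch of the setup being discussed, assuming the openlineage-spark jar is already on the classpath; the console transport writes events through the integration's logger, so the driver log level has to be permissive enough for them to show up.
```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("ol-console-demo")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .getOrCreate())
spark.sparkContext.setLogLevel("DEBUG")  # surface ConsoleTransport output in the driver logs
```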
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-27 12:37:44
-
-

*Thread Reply:* tried that too, still nothing

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-27 12:38:54
-
-

*Thread Reply:* Is the openlineage jar installed and added to the config?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-27 12:39:09
-
-

*Thread Reply:* yep, that’s why http works

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-27 12:39:26
-
-

*Thread Reply:* the only thing I see in the logs is this: -23/06/27 07:39:11 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobEnd

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-27 12:40:59
-
-

*Thread Reply:* Hmm, if an event is still emitted in this case but the logs aren't showing up, then I'm not sure... Maybe someone with more knowledge on this can help

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-27 12:42:37
-
-

*Thread Reply:* sure, thanks for trying @Anirudh Shrinivason

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-06-28 05:23:36
-
-

*Thread Reply:* What job are you trying this on? If there's this message, then logging is working afaik

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-28 12:16:52
-
-

*Thread Reply:* Hi @Maciej Obuchowski Actually I also noticed a similar issue... For some spark pipelines, the log level is set to debug, but I'm not seeing any events being logged. I am, however, receiving these events in the backend. Has any of the logging been removed from some places?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rakesh Jain - (rakeshj@us.ibm.com) -
-
2023-06-28 20:57:45
-
-

*Thread Reply:* yep, exactly the same thing here @Maciej Obuchowski: I can get the events over http, but changing to console gets me nothing from ConsoleTransport.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Lukenoff - (john@jlukenoff.com) -
-
2023-06-27 20:45:15
-
-

@here A bunch of us are downstairs in the lobby at 8 California but no one is down here to let us up. Anyone here to help?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-29 03:36:36
-
-

Hi guys, I noticed a few of the jobs getting OOMed while running with openlineage. Even increasing the number of executors and doubling the memory does not seem to fix it. This is observed especially when using the graphx libs. Is this a known issue? Just curious as to what the cause might be... The same jobs run fine once openlineage is disabled. Are there some rogue threads from the listener, or any connections we aren't closing properly?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-29 05:57:59
-
-

*Thread Reply:* Hi @Anirudh Shrinivason, could you disable serializing spark.logicalPlan to see if the behaviour is the same?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-29 05:58:28
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark -> spark.openlineage.facets.disabled -> [spark_unknown;spark.logicalPlan]

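A sketch of the suggested workaround in PySpark (the http transport URL below is a placeholder):
```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("ol-no-logicalplan")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")  # placeholder backend
    # skip serializing the logical plan (and the spark_unknown facet)
    .config("spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]")
    .getOrCreate())
```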
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-06-29 05:59:55
-
-

*Thread Reply:* We do serialize logicalPlan because this is useful in many cases, but sometimes can lead to serializing things that shouldn't be serialized

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-29 15:49:35
-
-

*Thread Reply:* Ahh I see. Yeah okay let me try that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-30 08:01:34
-
-

Hello all, I’m opening a vote to release OpenLineage 0.29.0, including: -• support for Spark 3.4 -• support for Flink 1.17.1 -• a fix in the Flink integration to enable dataset schema extraction for a KafkaSource when GenericRecord is used -• removal of the unused Golang proxy client (made redundant by the fluentd proxy) -• security vulnerability fixes, doc changes, test improvements, and more. -Three +1s from committers will authorize an immediate release.

- - - -
- ➕ Jakub Dardziński, Paweł Leszczyński, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-30 08:05:53
-
-

*Thread Reply:* Thanks, all. The release is authorized.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-30 13:27:35
-
-

@channel -We released OpenLineage 0.29.2, including: -Added -• Flink: support Flink version 1.17.1 #1947 @pawel-big-lebowski -• Spark: support Spark version 3.4 #1790 @pawel-big-lebowski -Removed -• Proxy: remove unused Golang client approach #1926 @mobuchowski -• Req: bump minimum supported Python version to 3.8 #1950 @mobuchowski - ◦ Note: this removes support for Python 3.7, which is at EOL. -Plus test improvements, docs changes, bug fixes and more. -Thanks to all the contributors! -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.29.2 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.28.0...0.29.2 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 👍 Shirley Lu, Maciej Obuchowski, Paweł Leszczyński, Tamara Fingerlin -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-30 17:23:04
-
-

@channel -The latest issue of OpenLineage News is now available, featuring a recap of recent events, releases, and more. To get it directly in your inbox each month, sign up here: https://openlineage.us14.list-manage.com/track/click?u=fe7ef7a8dbb32933f30a10466&id=e598962936&e=ef0563a7f8

-
-
apache.us14.list-manage.com
- - - - - - - - - - - - - - - -
- -
- - - - - - - -
- - -
- 👍 Maciej Obuchowski, Paweł Leszczyński, Tristan GUEZENNEC -CROIX-, Tamara Fingerlin, Jeremy W, Anirudh Shrinivason, Julien Le Dem, Sheeri Cabral (Collibra), alexandre bergere -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-06 13:36:44
-
-

@channel -This month’s TSC meeting is next Thursday, 7/13, at a special time: 8 am PT. -All are welcome! -On the tentative agenda: -• announcements -• updates -• recent releases -• a new DataGalaxy integration -• open discussion

- - - -
- ✅ Sheeri Cabral (Collibra), Maciej Obuchowski, alexandre bergere, Paweł Leszczyński, Willy Lulciuc, Anirudh Shrinivason, Shirley Lu -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-07-07 10:35:08
-
-

Wow, I just got finished watching @Julien Le Dem and @Willy Lulciuc’s presentation of OpenLineage at databricks and it’s really fantastic! There isn’t a better 30 minutes of content on theory + practice than this, IMO. https://www.databricks.com/dataaisummit/session/cross-platform-data-lineage-openlineage/ (you can watch for free by making an account. I’m not affiliated with databricks…)

-
-
databricks.com
- - - - - - - - - - - - - - - - - -
- - - -
- ❤️ Willy Lulciuc, Harel Shein, Yuanli Wang, Ross Turk, Michael Robinson, Jakub Dardziński, Conor Beverland, Maciej Obuchowski, Jarek Potiuk, Julien Le Dem, Chris Folkes, Anirudh Shrinivason, Shirley Lu -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-07-07 10:37:49
-
-

*Thread Reply:* thanks for watching and sharing! the recording is also on youtube 😉 https://www.youtube.com/watch?v=rO3BPqUtWrI

-
-
YouTube
- -
- - - } - - Databricks - (https://www.youtube.com/@Databricks) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-07-07 10:38:01
-
-

*Thread Reply:* ❤️

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jarek Potiuk - (jarek@potiuk.com) -
-
2023-07-08 13:35:10
-
-

*Thread Reply:* Very much agree. I’ve even forwarded to a few people here and there, those who I think should learn about it.

- - - -
- ❤️ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-07-08 13:47:17
-
-

*Thread Reply:* You’re both too kind :) -Thank you for your support and being part of the community.

- - - -
- ❤️ Sheeri Cabral (Collibra), Jarek Potiuk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-07 15:44:33
-
-

@channel -If you registered for TSC meetings through AddEvent, first of all, thank you! Second of all, I’ve had to create a new event series there to enable the editing of individual events. When you have a moment, would you please register for next week’s meeting? Apologies for the inconvenience.

-
-
addevent.com
- - - - - - - - - - - - - - - - - -
- - - -
- 👍 Kiran Hiremath, Willy Lulciuc, Shirley Lu -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-10 12:29:31
-
-

Hi community, we are interested in capturing time-travel usage for Iceberg Spark SQL in column lineage. For instance, INSERT INTO schema.table select * from schema.another_table version as of 'some_version'. Column lineage is currently missing the version, if used, which is actually quite relevant. I’ve gone through the open issues and didn’t see anything similar. Does it look like a valid use case scenario? We started going through the OL, iceberg and Spark code trying to capture/expose it, but so far we haven’t been able to. If anyone can give a hint/idea/pointer, we are willing to give it a try and contribute back with the code

- - - -
- 👀 Rakesh Jain, Nitin Ramchandani -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-07-11 05:46:36
-
-

*Thread Reply:* I think yes this is a great use case. @Paweł Leszczyński is more familiar with the spark integration code than I. -I think in this case, we would add the datasetVersion facet with the underlying Iceberg version: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DatasetVersionDatasetFacet.json -We extract this information in a few places: -https://github.com/search?q=repo%3AOpenLineage%2FOpenLineage%20%20DatasetVersionDatasetFacet&type=code

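For reference, the facet Julien links to is small; a sketch of its shape (values invented; per the linked spec, an Iceberg snapshot id would land in the datasetVersion property):
```python
dataset_version_facet = {
    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark",
    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasetVersionDatasetFacet.json",
    "datasetVersion": "7056736771450556218",  # e.g. an Iceberg snapshot id
}
```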
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-11 05:57:17
-
-

*Thread Reply:* Yes, we do have datasetVersion, which captures the iceberg or delta version for input and output datasets. Input versions are collected on START while output versions are collected on COMPLETE, in case a job reads and writes to the same dataset. So, even though the column-lineage facet is missing the version, it should be available within events related to a particular run.

- -

If it is not, then perhaps the case here is the lack of support for the as of syntax. As far as I remember, we always get the current version of a dataset, and this may be the missing part here.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-11 05:58:49
-
-

*Thread Reply:* link to a method that gets dataset version for iceberg: https://github.com/OpenLineage/OpenLineage/blob/0.29.2/integration/spark/spark3/sr[…]lineage/spark3/agent/lifecycle/plan/catalog/IcebergHandler.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-11 10:57:26
-
-

*Thread Reply:* Thank you @Julien Le Dem and @Paweł Leszczyński -Based on what I’ve seen so far, indeed it seems that only the current snapshot is tracked, in IcebergHandler.getDatasetVersion(). -Initially I was expecting to be able to obtain the snapshotId from the SparkTable which comes within getDatasetVersion(), but now I realize that OL is using an older version of the Iceberg runtime (0.12.1), which does not support time travel (introduced in 0.14.1). -The evidence is: -• Iceberg documentation for release 0.14.1: https://iceberg.apache.org/docs/0.14.0/spark-queries/#sql -• Iceberg release notes https://iceberg.apache.org/releases/#0140-release -• Comparing the source code, I see the SparkTable from 0.14.1 onward does have a snapshotId instance variable, while previous versions don’t -https://github.com/apache/iceberg/blob/0.14.x/spark/v3.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L82 -https://github.com/apache/iceberg/blob/0.12.x/spark3/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L78

- -

I don’t see anyone complaining about the old version of the Iceberg runtime being used, and there is no open issue to upgrade, so I’ll open the issue. Please let me know if that seems reasonable as the immediate next step.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-11 15:48:53
-
-

*Thread Reply:* Created issues: #1969 and #1970

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-12 07:15:14
-
-

*Thread Reply:* Thanks @Juan Manuel Cappi. The openlineage-spark jar contains modules like spark3, spark32, spark33 and spark34, which are going to be merged soon (we do have a ready PR for that). spark34 will be compiled against the latest iceberg version. Once this is done, #1969 can be closed. For #1970, one would need to implement a datasetBuilder within the spark34 module that visits the node within Spark's logical plan responsible for as of and creates the dataset for the OpenLineage event some way other than getting the latest snapshot version.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-13 12:51:19
-
-

*Thread Reply:* @Paweł Leszczyński I’ve seen PR #1971 and I see a new spark34 project with the latest iceberg-spark dependency version, but the other versions (spark33, spark32, etc.) have not been upgraded in that PR. Since the change is small and does not break any tests, I’ve created PR #1976 to fix #1969. That alone is unlocking some time travel lineage (i.e. the dataset identifier now becomes schema.table.version or schema.table.snapshot_id). Hope it makes sense

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-14 04:37:55
-
-

*Thread Reply:* Hi @Juan Manuel Cappi, you're right. After discussing with you, I realized we support some version of iceberg (for spark 3.3 it's still 0.14.0), but this is not the latest iceberg version matching the spark version.

- -

There's a tricky part here. Although we want our code to succeed with the latest spark, we don't want it to fail in a nasty way (class not found exception) when a user is working with an old iceberg version. There are places in our code where we check “are iceberg classes on the classpath?” We need to extend this to “are iceberg classes on the classpath, and is the iceberg version above 0.14 or not?” For sure this is the case for the merge into commands I am working on at the moment. Let's see if the other integration tests are affected in your PR

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Amod Bhalerao - (amod.bhalerao@gmail.com) -
-
2023-07-11 08:09:57
-
-

Hi Team, I've seen that Kafka lineage is not coming through properly for Spark streaming. Are we working on this?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-11 08:28:59
-
-

*Thread Reply:* what do you mean by that? there is a pyspark & kafka integration test that verifies event being sent when reading or writing to kafka topic: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]a/io/openlineage/spark/agent/SparkContainerIntegrationTest.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-11 09:28:56
-
-

*Thread Reply:* We do have an old issue https://github.com/OpenLineage/OpenLineage/issues/372 to support more spark plans that are stream related. But if you had an example of streaming that is not working for you, it would be really helpful.

-
- - - - - - - -
-
Labels
- integration/spark, streaming -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Amod Bhalerao - (amod.bhalerao@gmail.com) -
-
2023-07-26 08:03:30
-
-

*Thread Reply:* I have a pipeline which reads from a topic and sends data to 3 HIVE tables and one postgres table. It's not emitting any lineage for this pipeline

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Amod Bhalerao - (amod.bhalerao@gmail.com) -
-
2023-07-26 08:06:51
-
-

*Thread Reply:* just one task is getting created

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-12 05:55:19
-
-

Hi guys, I notice that with the below spark configs:
```from pyspark.sql import SparkSession
import os

os.environ["TEST_VAR"] = "1"

spark = (SparkSession.builder.master('local')
    .appName('sample_spark')
    .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.29.2,io.delta:delta-core_2.12:1.0.1')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.openlineage.transport.type', 'console')
    .config('spark.sql.catalog.spark_catalog', "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("hive.metastore.schema.verification", False)
    .config("spark.sql.warehouse.dir", "/tmp/")
    .config("hive.metastore.warehouse.dir", "/tmp/")
    .config("javax.jdo.option.ConnectionURL", "jdbc:derby:;databaseName=/tmp/metastore_db;create=true")
    .config("spark.openlineage.facets.custom_environment_variables", "[TEST_VAR;]")
    .config("spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]")
    .config("spark.hadoop.fs.permissions.umask-mode", "000")
    .enableHiveSupport()
    .getOrCreate())```
The custom environment variables facet is not kicking in. However, when all the delta related spark configs are removed, it is working fine. Is this a known issue? Are there any workarounds for it? Thanks!

- - - -
- 👀 Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-12 06:14:41
-
-

*Thread Reply:* Hi @Anirudh Shrinivason, I’m not familiar with Delta, but enabling debugging helped me a lot to understand what’s going on when things fail silently. Just add at the end: -spark.sparkContext.setLogLevel("DEBUG")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-12 06:20:47
-
-

*Thread Reply:* Yeah I checked on debug

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-12 06:20:50
-
-

*Thread Reply:* There are no errors

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-12 06:21:10
-
-

*Thread Reply:* Just that there is no environment-properties in the event that is being emitted

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-12 07:31:01
-
-

*Thread Reply:* Hi @Anirudh Shrinivason, what spark version is that? I see your delta version is pretty old. Anyway, the observation is weird and I don't know how delta could interfere with the environment facet builder. These are such disjoint features. Are you sure you create a new session (there is getOrCreate)?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Glen M - (glen_m@apple.com) -
-
2023-07-12 19:29:06
-
-

*Thread Reply:* @Paweł Leszczyński it's because of this line: https://github.com/OpenLineage/OpenLineage/blob/0.29.2/integration/spark/app/src/m[…]nlineage/spark/agent/lifecycle/InternalEventHandlerFactory.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Glen M - (glen_m@apple.com) -
-
2023-07-12 19:32:44
-
-

*Thread Reply:* Assuming this is https://learn.microsoft.com/en-us/azure/databricks/delta/ ... delta ... which is Azure Databricks. @Anirudh Shrinivason

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-12 22:58:13
-
-

*Thread Reply:* Hmm I wasn't using databricks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-12 22:59:12
-
-

*Thread Reply:* @Paweł Leszczyński I'm using spark 3.1 btw

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-13 08:05:49
-
-

*Thread Reply:* @Anirudh Shrinivason This should resolve the issue https://github.com/OpenLineage/OpenLineage/pull/1973

-
- - - - - - - -
-
Labels
- documentation, integration/spark -
- -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-13 08:06:11
-
-

*Thread Reply:* PR description contains info on how come the observed behaviour was possible

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-13 08:07:47
-
-

*Thread Reply:* As always, thank you @Anirudh Shrinivason for providing clear information on how to reproduce the issue 🚀 :medal: 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-13 09:52:29
-
-

*Thread Reply:* Ohh that is really great! Thanks so much for the help! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-12 13:50:51
-
-

@channel -A friendly reminder: this month’s TSC meeting — open to all — is tomorrow at 8 am PT. https://openlineage.slack.com/archives/C01CK9T7HKR/p1688665004736219

-
- - -
- - - } - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
- 👍 Dongjin Seo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-07-12 14:54:29
-
-

Hi Team -How are you? -Is there any chance to use Airflow to run queries against an Access file? -Sorry to bother you with a question that is not directly related to openlineage ... but I am kind of stuck

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-07-12 15:22:52
-
-

*Thread Reply:* what do you mean by Access file?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-07-12 16:09:03
-
-

*Thread Reply:* ... an accdb file, a Microsoft Access file: I am in a reverse engineering project facing a spaghetti-style development and would have loved to use airflow and openlineage as a magic wand to help me in this damn work

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-07-12 21:44:21
-
-

*Thread Reply:* oof.. I’d look into https://airflow.apache.org/docs/apache-airflow-providers-odbc/4.0.0/ -but I really have no clue..

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-07-13 09:47:02
-
-

*Thread Reply:* Thank you Harel -I started from that too ... but it became foggy after the initial step

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Aaman Lamba - (aamanlamba@gmail.com) -
-
2023-07-12 16:30:41
-
-

Hi folks, I'm having an issue ingesting the seed metadata when starting the docker container. The output shows "seed-marquez-with-metadata exited with code 0" but no information is visible in Marquez. What could be the issue?

- - - -
- ✅ Aaman Lamba -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-12 16:55:00
-
-

*Thread Reply:* Did you check the namespace menu in the top right for a food_delivery namespace?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-12 16:55:12
-
-

*Thread Reply:* (Hi Aaman!)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Aaman Lamba - (aamanlamba@gmail.com) -
-
2023-07-12 16:55:45
-
-

*Thread Reply:* Hi! Thank you that helped!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Aaman Lamba - (aamanlamba@gmail.com) -
-
2023-07-12 16:55:55
-
-

*Thread Reply:* I think that should be added to the quickstart guide

- - - -
- 🙌 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-12 16:56:23
-
-

*Thread Reply:* Great idea, thank you

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-07-13 12:09:29
-
-

As discussed in the Monthly meeting, I have opened a PR to propose adding deletion to facets for static lineage metadata: https://github.com/OpenLineage/OpenLineage/pull/1975

-
- - - - - - - -
-
Labels
- documentation, proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Steven - (xli@zjuici.com) -
-
2023-07-13 23:21:29
-
-

Hi, I'm using the OL python client. -client.emit( - DatasetEvent( - eventTime=datetime.now().isoformat(), - producer=producer, - schemaURL="<https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/DatasetEvent>", - dataset=Dataset(namespace=namespace, name=f"input-file"), - ) - ) -I want to send a dataset event once files have been uploaded. But I received a 422 from api/v1/lineage, saying that run and job must not be null. I don't have a job or run yet. How can I solve this?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-14 04:09:15
-
-

*Thread Reply:* Hi @Steven, I assume you send your OpenLineage events to Marquez. The 422 http code is a response from the backend, and Marquez is still waiting for the PR https://github.com/MarquezProject/marquez/pull/2495 to be merged and released. This PR makes Marquez understand DatasetEvents. They won't be saved in the Marquez database (this is to be implemented in the future), but at least one will not experience an error response code.

- -

To sum up: what you do is correct. You are using a feature that is allowed on a client side but still not implemented on a backend.

-
- - - - - - - -
-
Labels
- docs, api, client/java -
- -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
- ✅ Steven -
- -
- 🥳 Steven -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Steven - (xli@zjuici.com) -
-
2023-07-14 04:10:30
-
-

*Thread Reply:* Thanks!!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshit Soni - (harshit.soni@angelbroking.com) -
-
2023-07-14 08:36:23
-
-

@here Hi Team, I am trying to run a spark application with OpenLineage -Spark: 3.3.3 -OpenLineage: 0.29.2 -I am getting the below error; can you please help me figure out what I could be doing wrong?

``` spark = (SparkSession
    .builder
    .config('spark.port.maxRetries', 100)
    .appName(app_name)
    .config("spark.openlineage.url", "http://localhost/api/v1/namespaces/spark_integration/")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .getOrCreate())

23/07/14 18:04:01 ERROR Utils: uncaught error in thread spark-listener-group-shared, stopping SparkContext
java.lang.UnsatisfiedLinkError: /private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib: dlopen(/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib, 0x0001): tried: '/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib' (no such file), '/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-18 02:35:18
-
-

*Thread Reply:* Hi @Harshit Soni, where are you deploying your spark? Locally or not? Is it on a mac? Calling @Maciej Obuchowski to help with the libopenlineage_sql_java architecture compilation issue

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshit Soni - (harshit.soni@angelbroking.com) -
-
2023-07-18 02:38:03
-
-

*Thread Reply:* Currently, I was testing locally.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harshit Soni - (harshit.soni@angelbroking.com) -
-
2023-07-18 02:39:43
-
-

*Thread Reply:* We have created a centralised utility for all data ingestion needs and want to see how lineage is created for it using OpenLineage.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-07-18 05:16:55
-
-

*Thread Reply:* 👀

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-14 13:00:29
-
-

@channel -If you missed this month’s TSC meeting, the recording is now available on our YouTube channel: https://youtu.be/2vD6-Uwr7ZE. -A clip of Alexandre Bergere’s DataGalaxy integration demo is also available: https://youtu.be/l_HbEtpXphY.

-
-
YouTube
- -
- - - } - - OpenLineage Project - (https://www.youtube.com/@openlineageproject6897) -
- - - - - - - - - - - - - - - - - -
-
-
YouTube
- -
- - - } - - OpenLineage Project - (https://www.youtube.com/@openlineageproject6897) -
- - - - - - - - - - - - - - - - - -
- - - -
- 👍 Kiran Hiremath, alexandre bergere, Harel Shein, Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Robin Fehr - (robin.fehr@acosom.com) -
-
2023-07-16 17:39:26
-
-

Hey guys - trying to get a grip on the ecosystem regarding flink lineage 🙂 As far as my research has revealed, the openlineage project is the only one that supports flink lineage with an out-of-the-box library that can be integrated into jobs - at least as far as I've seen; for other toolings such as datahub we'd have to write custom hooks that implement their api. As for my question - is my current assumption correct that an integration of, for example, datahub/openmetadata into the openlineage project would also require support from datahub/openmetadata itself so that they can work with the openlineage spec? Or would it somewhat work to write a mapper in between to support their spec? (more of an architectural decision I assume, but I'd be interested in knowing what the openlineage project's approach is regarding that)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-07-17 08:13:49
-
-

*Thread Reply:* > or would it somewhat work to write a mapper in between to support their spec? -I think yeah - maybe https://github.com/Natural-Intelligence/openLineage-openMetadata-transporter would work out of the box if I understand correctly?

-
- - - - - - - -
-
Website
- <https://www.top10.com> -
- -
-
Language
- Java -
- - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-07-17 08:38:59
-
-

*Thread Reply:* Tagging @Natalie Zeller in case you want to collaborate

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Natalie Zeller - (natalie.zeller@naturalint.com) -
-
2023-07-17 08:47:34
-
-

*Thread Reply:* Hi, -We've implemented a transporter that transmits lineage from OpenLineage to OpenMetadata, you can find the github project here. -I've also published a blog post that explains this integration and how to use it. -I'll be happy to help if you have any question

-
- - - - - - - -
-
Website
- <https://www.top10.com> -
- -
-
Language
- Java -
- - - - - - - - -
- - - -
- 🙌 Robin Fehr -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Robin Fehr - (robin.fehr@acosom.com) -
-
2023-07-17 09:49:30
-
-

*Thread Reply:* very cool! thanks a lot for responding so quickly

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-17 18:23:53
-
-

🚀 We recently hit the 1000-member mark on here! Thank you for joining the movement to establish an open standard for data lineage across the data ecosystem! Tell your friends 🙂! -💯💯💯💯💯💯💯💯💯💯 -https://bit.ly/lineageslack

- - - -
- 🎉 Juan Manuel Cappi, Harel Shein, Paweł Leszczyński, Maciej Obuchowski, Willy Lulciuc, Viraj Parekh -
- -
- 💯 Harel Shein, Anirudh Shrinivason, Paweł Leszczyński, Maciej Obuchowski, Willy Lulciuc, Robin Fehr, Viraj Parekh, Ernie Ostic -
- -
- 👏 thebruuu -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-18 04:58:14
-
-

Btw, just curious what exactly does the runId correspond to in the OL spark integration? Is it possible to obtain the spark application id from the event too?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-18 05:10:31
-
-

*Thread Reply:* runId is a UUID assigned per spark action (a compute trigger within a spark job). A single spark script can therefore result in multiple runs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-18 05:13:17
-
-

*Thread Reply:* adding an extra facet with applicationId looks like a good idea to me: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkContext.html#applicationId:String

-
-
spark.apache.org
- - - - - - - - - - - - - - - -
- - - -
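Until such a facet exists, the value in question is exposed in PySpark too, so it can be grabbed and attached on the user's side; a quick sketch:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("app-id-demo").getOrCreate()
# The id the proposed facet would carry, e.g. "local-1689672000000"
print(spark.sparkContext.applicationId)
```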
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-18 23:06:01
-
-

*Thread Reply:* Got it thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-18 09:47:47
-
-

Hi, I have a use case: integrating queries run in a Jupyter notebook using pandas with OpenLineage, to get the lineage in Marquez. Has anyone implemented this before? Please let me know. Thanks

- - - -
- 🤩 thebruuu -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-20 06:48:54
-
-

*Thread Reply:* I think we don't have pandas support so far. So, if one uses pandas to read local files on disk, then perhaps Openlineage (OL) makes little sense there. There is an old pandas issue in our backlog (over 2 years old) -> https://github.com/OpenLineage/OpenLineage/issues/108

- -

Surely one can use the Python OL client to create events manually and send them to MQZ, which may be less convenient (https://github.com/OpenLineage/OpenLineage/tree/main/client/python)

- -

Anyway, we would like to know what your use case is. This would be super helpful in understanding why an OL & pandas integration may be useful.

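For anyone landing here with the same pandas question, a hand-rolled sketch of what creating events manually with the Python client can look like (the namespaces, job name and producer URI below are made up):
```python
import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # e.g. a local Marquez
producer = "https://example.com/my-notebook"             # hypothetical producer URI
run = Run(runId=str(uuid.uuid4()))
job = Job(namespace="notebooks", name="pandas_transform")

# Wrap the pandas step in START/COMPLETE events.
client.emit(RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=run, job=job, producer=producer,
))
# ... the pandas work happens here ...
client.emit(RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=run, job=job, producer=producer,
    inputs=[Dataset(namespace="file", name="raw.csv")],
    outputs=[Dataset(namespace="file", name="clean.parquet")],
))
```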
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-20 06:52:32
-
-

*Thread Reply:* Thanks Pawel for responding

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-19 02:57:57
-
-

Hi guys, when can we expect the next Openlineage release? Excited for MergeIntoCommand column lineage feature!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-19 03:40:20
-
-

*Thread Reply:* Hi @Anirudh Shrinivason, I am still working on that. It's kind of complex because I want to refactor column level lineage so that it can work with multiple Spark versions and multiple delta jars, as the merge into implementation differs across delta releases. I thought it was ready, but it needs some extra work in the next few days. I am excited about that too!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-19 03:54:37
-
-

*Thread Reply:* Ahh I see... Got it! Is there a tentative timeline for when we can expect this? So sorry haha don't mean to rush you. Just curious to know, that's all! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-19 22:06:10
-
-

*Thread Reply:* Can we author a release sometime soon? Would like to use the CustomEnvironmentFacetBuilder for delta catalog!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-20 05:28:43
-
-

*Thread Reply:* we're pretty close, I think, with merge into for delta, which is under review. Waiting for it would be nice. Anyway, we're 3 weeks past the last release.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-20 06:50:56
-
-

*Thread Reply:* @Anirudh Shrinivason releases are available basically on-demand using our process in GOVERNANCE.md. I recommend watching 1958 and then making a request in #general once it’s been merged. But, as Paweł suggested, we have a scheduled release coming soon, anyway. Thanks for your interest in the fix!

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-20 11:01:14
-
-

*Thread Reply:* Ahh I see. Got it. Thanks! @Michael Robinson @Paweł Leszczyński

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-21 03:12:22
-
-

*Thread Reply:* @Anirudh Shrinivason it's merged -> https://github.com/OpenLineage/OpenLineage/pull/1958

-
- - - - - - - -
-
Labels
- documentation, integration/spark -
- -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-21 04:19:15
-
-

*Thread Reply:* Awesome thanks so much! @Paweł Leszczyński

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-19 06:59:31
-
-

Hi there, related to my question a few days ago about the usage of time travel in iceberg: currently only the alias used (i.e. tag, branch) is captured as part of the dataset identifier for lineage. If the tag is removed, or even worse, if it's removed and re-created with the same name pointing to a different snapshot_id, the lineage will be capturing an inaccurate history. So, ideally, we'd like to capture the actual snapshot_id behind the named reference as part of the lineage. Anyone else thinking this is a reasonable scenario? => more in 🧵

- - - -
- 👀 Paweł Leszczyński, Dongjin Seo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-19 07:14:54
-
-

*Thread Reply:* One hacky approach would be to update the current dataset identifier to include the snapshot_id, so, for schema.table.tag we would have something like schema.table.tag-snapshot_id. The benefit is that it’s explicit and it doesn’t require a change in the OL schemas. The obvious downside (though not that serious in my opinion) is that impacts readability. Not sure though if there are other non-obvious side-effects.

- -

Another alternative would be to add a dedicated property. For instance, the job > latestRun schema, the input/output dataset version objects could look like this: -"inputDatasetVersions": [ - { - "datasetVersionId": { - "namespace": "<s3a://warehouse>", - "name": "schema.table.tag", - "snapshot_id": "7056736771450556218", - "version": "1c634e18-e357-347b-b758-4337ac352d6d" - }, - "facets": {} - } -] -And column lineage could look like: -```"columnLineage": [ - { - "name": "somefield", - "inputFields": [ - { - "namespace": "s3a:warehouse", - "dataset": "schema.table.tag", - "snapshotid": "7056736771450556218", - "field": "some_field", - ... - }, - ...

- -
  ],
-
- -

...```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-07-19 08:33:43
-
-

*Thread Reply:* @Paweł Leszczyński what do you think?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-19 08:38:16
-
-

*Thread Reply:* 1. How does snapshotId differ from version? Could one make the OL version property a string concat of iceberg-snapshot-id.iceberg-version?
2. I don't think it's necessary (or I don't understand why) to add snapshot-id within column lineage. Each entry within inputFields of columnLineage is already available within inputs of the OL event related to this run.
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Manuel Cappi - (juancappi@gmail.com) -
-
2023-07-19 18:43:31
-
-

*Thread Reply:* Yes, I think I follow the idea. The problem with that is the version is tied to the dataset name, i.e. my_namespace.table_A.tag_v1, which stays the same for the source dataset, which is the one being used with time travel. -Suppose the following sequence: -step 1 => -table_A.tag_v1 has snapshot id 123-abc -run job: table_A.tag_v1 -> job x -> table_B -the inputDatasetVersions > datasetVersionId > version for table_B points to an object which represents table_A.tag_v1 with snapshot id 123-abc correctly captured within facets > version > datasetVersion

- -

step 2 => -delete tag_v1, insert some data, create tag_v1 again -now table_A.tag_v1 has snapshot id 456-def -run job again: table_A.tag_v1 -> job x -> table_B -the inputDatasetVersions > datasetVersionId > version for table_B points to the same object, which represents table_A.tag_v1, only now the snapshot id has been replaced by 456-def within facets > version > datasetVersion, which means I don't have a way to know which snapshot id was used in step 1

- -

The “hack” I mentioned above though seems to solve the issue, since a new dataset is captured for each combination, so no information is overwritten/lost, i.e., the datasets referenced in inputDatasetVersions are now named: -table_A.tag_v1-123-abc -table_A.tag_v1-456-def

- -

As a side effect, the column lineage also gets “fixed”, since the lineage for the step 1 and step 2 job runs, without the “hack” both referenced table_A.tag_v1 as the source of input field, though in each run the snapshot id was different. With the hack, one run references table_A.tag_v1-123-abc and the other one table_A.tag_v1-456-def

- -

Hope it makes sense. If it helps, I can put together a few json files with the examples I’ve been using to experiment

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-20 06:35:22
-
-

*Thread Reply:* So, my understanding of the problem is that the iceberg version is not unique. So, if you have version 3, revert to version 2, and then write something again, one ends up again with version 3.

- -

I would not like to mess with dataset names because on the backend sides like Marquez, dataset names being the same in different jobs and runs allow creating lineage graph. If dataset names are different, then there is no way to build lineage graph across multiple jobs.

- -

Adding snapshot_id to datasetVersion is one option to go with. My concern here is that this is so iceberg-specific, while we're aiming to have a general solution to dataset versioning.

- -

Some other options are: send a concat of version+snapshotId as the version, or send only snapshot_id as the version. The second isn't that bad, as snapshotId is actually something we're aiming to get as a version, isn't it?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-21 04:21:26
-
-

Hi guys, I’d like to open a vote to release the next OpenLineage version! We'd really like to use the fixed CustomEnvironmentFacetBuilder for delta catalogs, and column lineage for Merge Into command in the spark integration! Thanks! 🙂

- - - -
- ➕ Jakub Dardziński, Willy Lulciuc, Michael Robinson, Maciej Obuchowski, Anirudh Shrinivason -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-21 13:09:39
-
-

*Thread Reply:* Thanks, all. The release is authorized and will be initiated within two business days per our policy here.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-25 13:44:47
-
-

*Thread Reply:* @Anirudh Shrinivason and others waiting on this release: the release process isn’t working as expected due to security improvements recently made to the website, ironically enough, which is the source for the spec. But we’re working on a fix and hope to complete the release soon.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-25 15:19:49
-
-

*Thread Reply:* @Anirudh Shrinivason the release (0.30.1) is out now. Thanks for your patience 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-07-25 23:21:14
-
-

*Thread Reply:* Hi @Michael Robinson Thanks a lot!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-26 08:52:24
-
-

*Thread Reply:* 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 06:38:16
-
-

Hi, I am running a job in Marquez with 180 rows of metadata, but it has been running for more than an hour. Is there a way to check the logs in Marquez? Below is a screenshot of the job:

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-07-21 08:10:58
-
-

*Thread Reply:* > I am running a job in Marquez with 180 rows of metadata -Do you mean that you have +100 rows of metadata in the jobs table for Marquez? Or that the job never finishes?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-07-21 08:11:47
-
-

*Thread Reply:* Also, yes, we have an event viewer that allows you to query the raw OL events

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-07-21 08:12:19
-
-

*Thread Reply:* If you post a sample of your events, it’d be helpful to troubleshoot your issue

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 08:53:25
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 08:53:31
-
-

*Thread Reply:* Sure Willy, thanks for your response. The job is still running. This is the code I am running from a Jupyter notebook using the Python client:

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 08:54:33
-
-

*Thread Reply:* as you can see my input and output datasets are just 1 row

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 08:55:02
-
-

*Thread Reply:* I included column lineage, but the job keeps running so I don't know if it is working

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 06:38:49
-
-

Please ignore 'UPDATED AT' timestamp

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 07:56:48
-
-

@Paweł Leszczyński there is a lot of interest in our organisation in implementing OpenLineage in several projects, and we might take the spark route, so on that note a small question: does OpenLineage work by extracting data from the Catalyst optimiser's physical/logical plans etc.?

- - - -
- 👍 Paweł Leszczyński -
- -
- ❤️ Willy Lulciuc, Paweł Leszczyński, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-21 08:20:33
-
-

*Thread Reply:* spark integration is based on extracting lineage from optimized plans

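Not how the integration hooks in (it registers a listener on the driver), but you can peek at the same optimized plan it walks; a debugging sketch using PySpark's internal _jdf handle:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("plan-peek").getOrCreate()
df = spark.range(10).withColumnRenamed("id", "a")
# _jdf is an internal py4j handle: fine for poking around, not a stable API
print(df._jdf.queryExecution().optimizedPlan().toString())
```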
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-07-21 08:25:35
-
-

*Thread Reply:* https://youtu.be/rO3BPqUtWrI?t=1326 I recommend the whole presentation, but in case you're just interested in the Spark integration, there are a few minutes that explain how this is achieved (the link points to the 22:06 mark of the video)

-
-
YouTube
- -
- - - } - - Databricks - (https://www.youtube.com/@Databricks) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-07-21 08:43:47
-
-

*Thread Reply:* Thanks Pawel for sharing. I will take a look. Have a nice weekend.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jens Pfau - (jenspfau@google.com) -
-
2023-07-21 08:22:51
-
-

Hello everyone!

- - - -
- 👋 Jakub Dardziński, Maciej Obuchowski, Willy Lulciuc, Michael Robinson, Harel Shein, Ross Turk, Robin Fehr, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-07-21 09:57:51
-
-

*Thread Reply:* Welcome, @Jens Pfau!

- - - -
- 😀 Jens Pfau -
- -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-07-23 08:36:38
-
-

hello everyone! I am trying to follow your guide -https://openlineage.io/docs/integrations/spark/quickstart_local -and when I execute -spark.createDataFrame([ - {'a': 1, 'b': 2}, - {'a': 3, 'b': 4} -]).write.mode("overwrite").saveAsTable("temp1")

- -

I am not getting the expected result

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-07-23 08:37:55
-
-

```
23/07/23 12:35:20 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[== Parsed Logical Plan ==
'CreateTable `temp1`, Overwrite
+- LogicalRDD [a#6L, b#7L], false

== Analyzed Logical Plan ==
CreateDataSourceTableAsSelectCommand `temp1`, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Optimized Logical Plan ==
CreateDataSourceTableAsSelectCommand `temp1`, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Physical Plan ==
Execute CreateDataSourceTableAsSelectCommand `temp1`, Overwrite, [a, b]
+- *(1) Scan ExistingRDD[a#6L,b#7L]
] with input dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:20 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[... (same plan as above) ...
] with output dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:20 INFO CreateDataSourceTableAsSelectCommandVisitor: Matched io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor<org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand,io.openlineage.client.OpenLineage$OutputDataset> to logical plan CreateDataSourceTableAsSelectCommand `temp1`, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false
23/07/23 12:35:20 INFO CreateDataSourceTableAsSelectCommandVisitor: Matched io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor<org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand,io.openlineage.client.OpenLineage$OutputDataset> to logical plan CreateDataSourceTableAsSelectCommand `temp1`, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false
23/07/23 12:35:20 ERROR EventEmitter: Could not emit lineage w/ exception
io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:105)
    at io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:34)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:71)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:77)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:99)
    at java.base/java.util.Optional.ifPresent(Optional.java:183)
    at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:99)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:90)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Caused by: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.spark.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:100)
    ... 21 more
Caused by: io.openlineage.spark.shaded.org.apache.http.ProtocolException: Target host is not specified
    at io.openlineage.spark.shaded.org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:71)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:125)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    ... 24 more
23/07/23 12:35:20 INFO ParquetFileFormat: Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
23/07/23 12:35:20 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
23/07/23 12:35:20 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
23/07/23 12:35:20 INFO SQLHadoopMapReduceCommitProtocol: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
23/07/23 12:35:20 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
23/07/23 12:35:20 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
23/07/23 12:35:20 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
23/07/23 12:35:20 INFO CodeGenerator: Code generated in 120.989125 ms
23/07/23 12:35:21 INFO SparkContext: Starting job: saveAsTable at NativeMethodAccessorImpl.java:0
23/07/23 12:35:21 INFO DAGScheduler: Got job 0 (saveAsTable at NativeMethodAccessorImpl.java:0) with 1 output partitions
23/07/23 12:35:21 INFO DAGScheduler: Final stage: ResultStage 0 (saveAsTable at NativeMethodAccessorImpl.java:0)
23/07/23 12:35:21 INFO DAGScheduler: Parents of final stage: List()
23/07/23 12:35:21 INFO DAGScheduler: Missing parents: List()
23/07/23 12:35:21 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[... (same plan as above) ...] with input dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:21 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[... (same plan as above) ...] with output dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:21 INFO CreateDataSourceTableAsSelectCommandVisitor: Matched io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor (same matches as above)
23/07/23 12:35:21 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[10] at saveAsTable at NativeMethodAccessorImpl.java:0), which has no missing parents
23/07/23 12:35:21 ERROR EventEmitter: Could not emit lineage w/ exception
io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:105)
    at io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:34)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:71)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:174)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$9(OpenLineageSparkListener.java:153)
    at java.base/java.util.Optional.ifPresent(Optional.java:183)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:149)
    ... (listener bus frames as above)
Caused by: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    ... 20 more
Caused by: io.openlineage.spark.shaded.org.apache.http.ProtocolException: Target host is not specified
    at io.openlineage.spark.shaded.org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(
```


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-23 08:38:46

```
23/07/23 12:35:20 ERROR EventEmitter: Could not emit lineage w/ exception
io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:105)
    at io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:34)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:71)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:77)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:99)
```


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-23 13:31:53

*Thread Reply:* That looks like your URL provided to OpenLineage is missing http:// or https:// in the front
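For anyone else hitting this, the "Target host is not specified" line in the trace above is the giveaway. A minimal sketch of the difference, assuming a recent openlineage-spark version where the endpoint is set via spark.openlineage.transport.url (host and port are placeholders):
```
# fails with "Target host is not specified" - no scheme
spark.openlineage.transport.url=localhost:5000
# works - scheme included
spark.openlineage.transport.url=http://localhost:5000
```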


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-23 14:54:55

*Thread Reply:* sorry, how can I resolve this? Do I need to add this? I just followed the guide step by step. You don't mention anywhere to add anything. You provide something that


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-23 14:55:05

*Thread Reply:* really does not work out of the box


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-23 14:55:13

*Thread Reply:* and this is supposed to be a demo


Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-07-23 17:07:49

*Thread Reply:* bumping e.g. to io.openlineage:openlineage-spark:0.29.2 seems to fix the issue


not sure why it stopped working for 0.12.0 but we’ll take a look and fix accordingly
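For readers following along, the fix is only a package-coordinate bump. A sketch of what that looks like in spark-submit form (your_job.py is a placeholder script name):
```
spark-submit \
  --packages io.openlineage:openlineage-spark:0.29.2 \
  --conf spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener \
  your_job.py
```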


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-24 04:51:34

*Thread Reply:* ...probably by bumping the version on this page 🙂


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-24 05:00:28

*Thread Reply:* thank you both for coming back to me, I bumped to 0.29 and I think that it now runs. Is this the expected output?
```
23/07/24 08:43:55 INFO ConsoleTransport: {"eventTime":"2023-07-24T08:43:55.941Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-0/OpenLineage.json#/$defs/RunEvent","eventType":"COMPLETE","run":{"runId":"186c06c0-e79c-43cf-8bb7-08e1ab4c86a5","facets":{"spark.logicalPlan":{... CreateDataSourceTableAsSelectCommand for table temp2, output columns [a, b] ...},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-0/OpenLineage.json#/$defs/RunFacet","spark-version":"3.1.2","openlineage-spark-version":"0.29.2"}}},"job":{"namespace":"default","name":"sample_spark.execute_create_data_source_table_as_select_command","facets":{}},"inputs":[],"outputs":[{"namespace":"file","name":"/home/jovyan/spark-warehouse/temp2","facets":{"dataSource":{"name":"file","uri":"file"},"schema":{"fields":[{"name":"a","type":"long"},{"name":"b","type":"long"}]},"symlinks":{"identifiers":[{"namespace":"/home/jovyan/spark-warehouse","name":"default.temp2","type":"TABLE"}]},"lifecycleStateChange":{"lifecycleStateChange":"CREATE"}},"outputFacets":{}}]}
```
Also I then proceeded to run
```
docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=marquez-api -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquezproject/marquez-web:0.19.1
```
but the page is empty.


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-24 11:11:08

*Thread Reply:* You'd need to set up spark.openlineage.transport.url to send OpenLineage events to Marquez
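Concretely, that means something along these lines in the Spark conf; this is a sketch assuming the Marquez API container from the quickstart is reachable as marquez-api on port 5000:
```
spark.openlineage.transport.type=http
spark.openlineage.transport.url=http://marquez-api:5000
```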


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-24 11:12:28

*Thread Reply:* where and how can I do this?


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-24 11:13:04

*Thread Reply:* do i need to edit the conf ?


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-24 11:37:09

*Thread Reply:* yes, in the spark conf


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-24 11:37:48

*Thread Reply:* what should this url be?


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-24 11:37:51

*Thread Reply:* http://localhost:3000/ ?


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-24 11:43:30

*Thread Reply:* That depends on how you ran Marquez, but looking at your screenshot the UI is at 3000, so I guess the API would be at 5000


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-24 11:43:46

*Thread Reply:* as that's default in Marquez docker-compose


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-24 11:44:14

*Thread Reply:* i cannot see spark conf


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-24 11:44:23

*Thread Reply:* is it in there or do i need to create it ?


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-24 16:42:53

*Thread Reply:* Is something like
```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
         .appName('sample_spark')
         .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
         .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.29.2')
         .config('spark.openlineage.transport.url', 'http://marquez:5000')
         .config('spark.openlineage.transport.type', 'http')
         .getOrCreate())
```
not working?


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-25 05:08:08

*Thread Reply:* OK when I use the snippet you provided and then execute
```
docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=marquez-api -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquezproject/marquez-web:0.19.1
```
I can now see this


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-25 05:08:52

*Thread Reply:* but when i click on the job i then get this


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-25 05:09:05

*Thread Reply:* so i cannot see any details of the job


Sarwat Fatima (sarwatfatimam@gmail.com) - 2023-09-05 05:54:50

*Thread Reply:* @George Polychronopoulos Hi, I am facing the same issue. After adding spark conf and using the docker run command, marquez is still showing empty. Do I need to change something in the run command?


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 05:55:15

*Thread Reply:* yes i will tell you


Sarwat Fatima (sarwatfatimam@gmail.com) - 2023-09-05 07:36:41

*Thread Reply:* For the docker command that I used, I updated the marquez-web version to 0.40.0 and I also updated the MARQUEZ_HOST, which I am not sure if I have to or not. The UI is running but not showing anything:
```
docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=localhost -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquez/marquez-web:0.40.0
```


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:36:52

*Thread Reply:* it's because you are running this command, right


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:36:55

*Thread Reply:* yes, that's it


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:36:58

*Thread Reply:* you need 0.40


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:37:03

*Thread Reply:* and there is a lot of stuff


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:37:07

*Thread Reply:* you need to change


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:37:10

*Thread Reply:* in the Docker


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:37:24

*Thread Reply:* so the spark


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:37:25

*Thread Reply:* version


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:37:27

*Thread Reply:* the python


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:38:05

*Thread Reply:*
```yaml
version: "3.10"
services:
  notebook:
    image: jupyter/pyspark-notebook:spark-3.4.1
    ports:
      - "8888:8888"
    volumes:
      - ./docker/notebooks:/home/jovyan/notebooks
      - ./build:/home/jovyan/openlineage
    links:
      - "api:marquez"
    depends_on:
      - api

  # Marquez as an OpenLineage Client
  api:
    image: marquezproject/marquez
    container_name: marquez-api
    ports:
      - "5000:5000"
      - "5001:5001"
    volumes:
      - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh
    links:
      - "db:postgres"
    depends_on:
      - db
    entrypoint: ["./wait-for-it.sh", "db:5432", "--", "./entrypoint.sh"]

  db:
    image: postgres:12.1
    container_name: marquez-db
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
      - MARQUEZ_DB=marquez
      - MARQUEZ_USER=marquez
      - MARQUEZ_PASSWORD=marquez
    volumes:
      - ./docker/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh
    # Enables SQL statement logging (see: https://www.postgresql.org/docs/12/runtime-config-logging.html#GUC-LOG-STATEMENT)
    # command: ["postgres", "-c", "log_statement=all"]
```


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:38:10

*Thread Reply:* this is how mine looks


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:38:20

*Thread Reply:* it is all tested and the latest version


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:38:31

*Thread Reply:* postgres does not work beyond 12


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:38:56

*Thread Reply:* if you run this docker-compose up


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:38:58

*Thread Reply:* the notebooks


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:39:02

*Thread Reply:* are 10x faster


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:39:06

*Thread Reply:* and give no errors


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:39:14

*Thread Reply:* also you need to update other stuff


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:39:18

*Thread Reply:* such as


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:39:26

*Thread Reply:* don't run what is in the docs


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:39:34

*Thread Reply:* but run what is in GitHub


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:40:22

*Thread Reply:* run in your notebooks what is in here


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:40:32

*Thread Reply:*
```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
         .appName('sample_spark')
         .config('spark.jars.packages', 'io.openlineage:openlineage-spark:1.1.0')
         .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
         .config('spark.openlineage.transport.url', 'http://{openlineage.client.host}/api/v1/namespaces/spark_integration/')
         .getOrCreate())
```


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:40:38

*Thread Reply:* they don't update the documentation


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-09-05 07:40:44

*Thread Reply:* it took me 4 weeks to get here


George Polychronopoulos (george.polychronopoulos@6point6.co.uk) - 2023-07-23 08:39:13

is this a known error ? does anyone know how to debug this ?


Steven (xli@zjuici.com) - 2023-07-23 23:57:43

Hi,
Using Marquez, I tried to get the dataset version through two apis.
First:
http://host/api/v1/namespaces/{namespace}/datasets/{dataset}
It will include a currentVersion in the response.
Then:
http://host/api/v1/namespaces/{namespace}/datasets/{dataset}/versions/{currentVersion}
But the version used here refers to the "version" column in the dataset_versions table, not the primary key "uuid", which leads to 404 not found.
I checked the other apis but it seems there is no other way to get the version through "currentVersion".
Any help?
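For readers following along, the two calls look roughly like this (namespace and dataset names are placeholders; the second call is the one that 404s):
```
# 1) fetch the dataset; the response body contains "currentVersion"
curl http://host/api/v1/namespaces/my_namespace/datasets/my_dataset

# 2) fetch that version; this is the call that returns 404
curl http://host/api/v1/namespaces/my_namespace/datasets/my_dataset/versions/<currentVersion>
```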

👀 Maciej Obuchowski, Willy Lulciuc

Steven (xli@zjuici.com) - 2023-07-24 00:14:43

*Thread Reply:* Like I want to change the facets of a specific dataset.


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-24 16:45:18

*Thread Reply:* @Willy Lulciuc do you have any idea? 🙂


Steven (xli@zjuici.com) - 2023-07-25 05:02:47

*Thread Reply:* I solved this by adding a new job which outputs to the same dataset. This ended up in a newer dataset version.
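For reference, the manual version of that workaround with the Python client looks roughly like this; it is only a sketch with placeholder names, but emitting a run that lists the dataset among its outputs is what makes Marquez record a new dataset version:
```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # Marquez API endpoint

run = Run(runId=str(uuid4()))
job = Job(namespace="my_namespace", name="touch_my_dataset")   # placeholder job
output = Dataset(namespace="my_namespace", name="my_dataset")  # dataset to re-version


def emit(state: RunState) -> None:
    # Each event lists the dataset as an output, so the COMPLETE event
    # results in a new dataset version being recorded.
    client.emit(
        RunEvent(
            eventType=state,
            eventTime=datetime.now(timezone.utc).isoformat(),
            run=run,
            job=job,
            producer="https://example.com/manual-version-bump",  # placeholder
            inputs=[],
            outputs=[output],
        )
    )


emit(RunState.START)
emit(RunState.COMPLETE)
```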


Willy Lulciuc (willy@datakin.com) - 2023-07-25 06:20:58

*Thread Reply:* @Steven great to hear that you solved the issue! but there are some minor logical inconsistencies that we'd like to address with versioning (for both datasets and jobs) in Marquez. The tl;dr is the version column wasn't meant to be used externally, but internally within Marquez. The issue is "minor" as it's more of a pointer thing. We'll be addressing it soon. For some background, you can look at:
• https://github.com/MarquezProject/marquez/issues/2071
• https://github.com/MarquezProject/marquez/pull/2153


Steven (xli@zjuici.com) - 2023-07-25 05:06:48

Hi,
Are there any keys to set in marquez.yaml to skip db initialization and use an existing db? I am deploying the marquez client on a k8s cluster, which uses a cloud postgres. Every time I restart the marquez deployment I have to drop all those tables, otherwise it will raise a "table already exists" ERROR.


Willy Lulciuc (willy@datakin.com) - 2023-07-25 06:43:32

*Thread Reply:* @Steven ahh very good point, it’s technically not “error” in the true sense, but annoying nonetheless. I think you’re referencing the init container in the Marquez helm chart? https://github.com/MarquezProject/marquez/blob/main/chart/templates/marquez/deployment.yaml#L37


Willy Lulciuc (willy@datakin.com) - 2023-07-25 06:45:24

*Thread Reply:* hmm, actually what raises the error you're referencing? the Marquez http server?


Willy Lulciuc (willy@datakin.com) - 2023-07-25 06:49:08

*Thread Reply:* > Every time I restart the marquez deployment I have to drop all those tables otherwise it will raise table already exists ERROR -This shouldn’t be an error. I’m trying to understand the scenario in which this error is thrown (any info is helpful). We use flyway to manage our db schema, but you may have gotten in an odd state somehow


Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-07-25 12:52:51

For Databricks notebooks, does the Spark listener work without any notebook changes? (I see that Azure Databricks -> Purview needs no changes, but I'm not sure if that applies everywhere, e.g. if I have an existing databricks notebook and I add a spark listener, can I get column-level lineage? or do I need to change my notebook to use openlineage libraries, like I do with an arbitrary Python script?)


Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-07-31 03:35:58

*Thread Reply:* Nope, one should modify the cluster as per the doc https://openlineage.io/docs/integrations/spark/quickstart_databricks but no changes in the notebook are required.
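For context, the cluster-side setup described in that doc amounts to installing the openlineage-spark jar on the cluster and adding Spark conf along these lines; the exact property names depend on the openlineage-spark version, and the endpoint/namespace values are placeholders:
```
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.transport.type http
spark.openlineage.transport.url http://<your-lineage-backend>
spark.openlineage.namespace <your-namespace>
```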


Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-08-02 10:59:00

*Thread Reply:* Right, great, that’s exactly what I was hoping 😄


Michael Robinson (michael.robinson@astronomer.io) - 2023-07-25 15:24:17

@channel
We released OpenLineage 0.30.1, including:
Added
• Flink: support Iceberg sinks #1960 @pawel-big-lebowski
• Spark: column-level lineage for merge into on delta tables #1958 @pawel-big-lebowski
• Spark: column-level lineage for merge into on Iceberg tables #1971 @pawel-big-lebowski
• Spark: add support for Iceberg REST catalog #1963 @juancappi
• Airflow: add possibility to force direct-execution based on environment variable #1934 @mobuchowski
• SQL: add support for Apple Silicon to openlineage-sql-java #1981 @davidjgoss
• Spec: add facet deletion #1975 @julienledem
• Client: add a file transport #1891 @alexandre bergere
Changed
• Airflow: do not run plugin if OpenLineage provider is installed #1999 @JDarDagran
• Python: rename config to config_class #1998 @mobuchowski
Plus test improvements, docs changes, bug fixes and more.
Thanks to all the contributors, including new contributors @davidjgoss, @alexandre bergere and @Juan Manuel Cappi!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.30.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.29.2...0.30.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

👏 Julian Rossi, Bernat Gabor, Anirudh Shrinivason, Maciej Obuchowski, Jens Pfau, Sheeri Cabral (Collibra)
👍 Athitya Kumar, Sheeri Cabral (Collibra)

Codrut Stoicescu (codrut.stoicescu@gmail.com) - 2023-07-27 11:53:09

Hello everyone! I’m part of a team trying to integrate OpenLineage and Marquez with multiple tools in our ecosystem. Integration with Spark and Iceberg was fairly easy with the listener you guys developed. We are now trying to integrate with Ray and we are having some trouble there. I was wondering if anybody has tried any work in that direction, so we can chat and exchange ideas. Thank you!


Michael Robinson (michael.robinson@astronomer.io) - 2023-07-27 14:47:18

*Thread Reply:* This is the first I’ve heard of someone trying to do this, but others have tried getting lineage from pandas. There isn’t support for this currently, but this thread contains a link to an issue that might be helpful: https://openlineage.slack.com/archives/C01CK9T7HKR/p1689850134978429?thread_ts=1689688067.729469&cid=C01CK9T7HKR.


Codrut Stoicescu (codrut.stoicescu@gmail.com) - 2023-07-28 02:10:14

*Thread Reply:* Thank you for your response. We have implemented the “manual way” of emitting events with python OL client. We are now looking for a more automated way, so that updates to the scripts that run in Ray are minimal to none


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-07-28 13:03:43

*Thread Reply:* If you're actively using Ray, then you know way more about it than me, or probably any other OL contributor 🙂
I don't know how it works or is deployed, but I would recommend checking if there's a robust way of being notified in the runtime about processing occurring there.


Codrut Stoicescu (codrut.stoicescu@gmail.com) - 2023-07-31 12:17:07

*Thread Reply:* Thank you for the tip. That’s the kind of details I’m looking for, but couldn’t find yet


Tereza Trojanová (tereza.trojanova@revolt.bi) - 2023-07-28 09:20:34

Hi, does anyone have experience integrating OpenLineage and Marquez with Keboola? I am new to OpenLineage and struggling with the KBC component configuration.


Michael Robinson (michael.robinson@astronomer.io) - 2023-07-28 10:53:35

*Thread Reply:* @Martin Fiser can you share any resources or pointers that might be helpful?


Martin Fiser (fisa@keboola.com) - 2023-08-21 19:17:17

*Thread Reply:* Hi, apologies - vacation period has hit me. However, here are the resources:

API endpoint:
https://app.swaggerhub.com/apis-docs/keboola/job-queue-api/1.3.4#/Jobs/getJobOpenApiLineage
Dedicated component to push data into OpenLineage (Marquez instance):
https://components.keboola.com/components/keboola.wr-openlineage

🙌 Michael Robinson

Damien Hawes (damien.hawes@booking.com) - 2023-07-31 12:32:22

Hi folks. I'm looking to find the complete spec in openapi format. For example, if I want to find the complete spec of 1.0.5 , where would I find that? I've looked here: https://openlineage.io/apidocs/openapi/ however when I download the spec, things are missing, specifically the facets. This makes it difficult to generate clients / backend interfaces from the (limited) openapi spec.


Silvia Pina (silviampina@gmail.com) - 2023-08-01 05:14:58

*Thread Reply:* +1, I could also really use this!


Silvia Pina (silviampina@gmail.com) - 2023-08-01 05:27:34

*Thread Reply:* Found a way: you download it as json in the above link ("Download OpenAPI specification"), then if you copy-paste it to editor.swagger.io it asks if you want to convert to yaml :)
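If you'd rather convert locally instead of via editor.swagger.io, a small sketch with PyYAML (assumes the downloaded spec was saved as openlineage.json):
```python
import json

import yaml  # pip install pyyaml

# Load the OpenAPI spec downloaded from the apidocs page...
with open("openlineage.json") as f:
    spec = json.load(f)

# ...and write the same document back out as YAML.
with open("openlineage.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)
```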


Damien Hawes (damien.hawes@booking.com) - 2023-08-01 10:25:49

*Thread Reply:* Whilst that works, it isn't complete. The issue is that the "facets" are not resolved. Exploring the website repository (https://github.com/OpenLineage/website/tree/main/static/spec) shows that facets aren't published alongside the spec, beyond 1.0.1 - which means its hard to know which revisions of the facets belong to which version of the spec.


Silvia Pina (silviampina@gmail.com) - 2023-08-01 10:26:54

*Thread Reply:* Good point! Would be good if we could clarify how to get the full spec, in that case


Damien Hawes (damien.hawes@booking.com) - 2023-08-01 10:30:57

*Thread Reply:* Granted. If the spec follows backwards compatible evolution rules, then this shouldn't be a problem, i.e., new fields must be optional, you can not remove existing fields, you can not modify existing fields, etc.

🙌 Silvia Pina

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-01 12:15:22

*Thread Reply:* We don't have facets with newer version than 1.1.0


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-01 12:15:56

*Thread Reply:* @Damien Hawes we've moved to merge docs and website repos here: https://github.com/OpenLineage/docs


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-01 12:18:23

*Thread Reply:* > Would be good if we could clarify how to get the full spec, in that case -Is using https://github.com/OpenLineage/OpenLineage/tree/main/spec not enough? We have separate files with facets definition to be able to evolve them separetely from main spec


Damien Hawes (damien.hawes@booking.com) - 2023-08-02 04:53:03

*Thread Reply:* @Maciej Obuchowski - thanks for your input. I understand the desire to want to evolve the facets independently from the main spec, yet I keep running into a mental wall.


If I say, 'My application is compatible with OpenLineage 1.0.5' - what does that mean exactly? Does it mean that I am at least compatible with the base definition of RunEvent and its nested components, but not facets?


That's what I'm finding difficult to wrap my head around. Right now, I can not define (for my own sake and the sake of my org) what 'OpenLineage 1.0.5' means.


When I read the Marquez source code, I see that they state they implement 1.0.5, but again, it isn't clear what that completely entails.


I hope I am making sense.

👍 Silvia Pina

Damien Hawes (damien.hawes@booking.com) - 2023-08-02 04:56:36

*Thread Reply:* If I approach this from a conventional software engineering standpoint, where I provide a library to my consumers. The library has a version associated with it, and that version encompasses all the objects located within that particular library. If I release a new version of my library, it implies that some form of evolution has happened. Whether it is a bug fix, a documentation change, or evolving the API of my objects it means something has changed and the new version is there to indicate that.


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-02 04:56:53

*Thread Reply:* Yes - it means you can read and understand the base spec. Facets are completely optional - reading them might provide you additional information, but you as an event consumer need to define what you do with them. Basically, the needs can be very different between consumers; the spec should not define the behavior of a consumer.
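To make that concrete: a consumer that understands the base spec can parse a bare-bones event like the sketch below even if it ignores every facet. All values here are placeholders:
```json
{
  "eventType": "START",
  "eventTime": "2023-08-02T09:00:00.000Z",
  "run": { "runId": "ea041791-68bc-4ae1-bd89-4c8106a157e4" },
  "job": { "namespace": "my-namespace", "name": "my-job" },
  "inputs": [],
  "outputs": [],
  "producer": "https://github.com/my-org/my-producer",
  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
}
```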

🙌 Silvia Pina

Damien Hawes (damien.hawes@booking.com) - 2023-08-02 05:01:26

*Thread Reply:* OK. Thanks for the clarification. That clears things up for me.

👍 Maciej Obuchowski

Michael Robinson (michael.robinson@astronomer.io) - 2023-07-31 16:42:48

This month's issue of OpenLineage News was just sent out. Please subscribe to get it directly in your inbox each month!

👍 Ross Turk, Maciej Obuchowski, Shirley Lu
🎉 Harel Shein

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-01 12:35:22

Hello, I request an OpenLineage release, especially for two things:
• Snowflake/HTTP/Airflow bugfix: https://github.com/OpenLineage/OpenLineage/pull/2025
• Spec: removing refs from core: https://github.com/OpenLineage/OpenLineage/pull/1997
Three approvals from committers will authorize the release. @Michael Robinson

➕ Jakub Dardziński, Harel Shein, Michael Robinson, George Polychronopoulos, Willy Lulciuc, Shirley Lu

Michael Robinson (michael.robinson@astronomer.io) - 2023-08-01 13:26:30

*Thread Reply:* Thanks, @Maciej Obuchowski


Michael Robinson (michael.robinson@astronomer.io) - 2023-08-01 15:43:00

*Thread Reply:* Thanks, all. The release is authorized and will be initiated within two business days.


Michael Robinson (michael.robinson@astronomer.io) - 2023-08-01 16:42:32

@channel
We released OpenLineage 1.0.0, featuring static lineage capability!
Added:
• Airflow: convert lineage from legacy File definition #2006 @Maciej Obuchowski
Removed:
• Spec: remove facet ref from core #1997 @JDarDagran
Changed:
• Airflow: change log level to DEBUG when extractor isn't found #2012 @kaxil
• Airflow: make sure we cannot fail in thread despite direct execution #2010 @Maciej Obuchowski
Plus test improvements, docs changes, bug fixes and more.
See prior releases for additional changes related to static lineage.
Thanks to all the contributors, including new contributors @kaxil and @Mars Lan!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.0.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.30.1...1.0.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Julian LaNeve, Bernat Gabor, Maciej Obuchowski, Peter Hicks, Ross Turk, Harel Shein, Willy Lulciuc, Paweł Leszczyński, Peter Hicks
🥳 Julian LaNeve, alexandre bergere, Maciej Obuchowski, Peter Hicks, Juan Manuel Cappi, Ross Turk, Harel Shein, Paweł Leszczyński, Peter Hicks
🚀 alexandre bergere, Peter Hicks, Ross Turk, Harel Shein, Paweł Leszczyński, Peter Hicks

Juan Luis Cano Rodríguez (juan_luis_cano@mckinsey.com) - 2023-08-02 08:51:57

hi folks! so happy to see that static lineage is making its way through OL. one question: is the OpenAPI spec up to date? https://openlineage.io/apidocs/openapi/ IIUC, proposal 1837 says that JobEvent and DatasetEvent can be emitted independently from RunEvents now, but it's not clear how this affected the spec.


I see the Python client https://pypi.org/project/openlineage-python/1.0.0/ includes these changes already, so I assume I can go ahead and use it already? (I'm also keeping tabs on https://github.com/MarquezProject/marquez/issues/2544)


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-02 10:09:33

*Thread Reply:* I think the apidocs are not up to date 🙂


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-02 10:09:43

*Thread Reply:* https://openlineage.io/spec/2-0-2/OpenLineage.json has the newest spec


Juan Luis Cano Rodríguez (juan_luis_cano@mckinsey.com) - 2023-08-02 10:44:23

*Thread Reply:* thanks for the pointer @Maciej Obuchowski


Michael Robinson (michael.robinson@astronomer.io) - 2023-08-02 10:49:17

*Thread Reply:* Also working on updating the apidocs


Michael Robinson (michael.robinson@astronomer.io) - 2023-08-02 11:21:14

*Thread Reply:* The API docs are now up to date @Juan Luis Cano Rodríguez! Thank you for raising this issue.

🙌:skin_tone_3: Juan Luis Cano Rodríguez

Michael Robinson (michael.robinson@astronomer.io) - 2023-08-02 12:58:15

@channel
If you can, please join us in San Francisco for a meetup at Astronomer on August 30th at 5:30 PM PT.
On the agenda: a presentation by special guest @John Lukenoff plus updates on the Airflow Provider, static lineage, and more.
Food will be provided, and all are welcome.
Please RSVP to let us know you're coming: https://www.meetup.com/meetup-group-bnfqymxe/events/295195280/


Zahi Fail (zahi.fail@gmail.com) - 2023-08-03 03:18:08

Hey, I hope this is the right channel for this kind of question - I'm running tests to integrate Airflow (2.4.3) with Marquez (OpenLineage 0.30.1). Currently, I'm testing the postgres operator, and for some reason queries like "Copy" and "Unload" are being sent as events but don't appear in the graph. Any idea how to solve it?

You can see attached:
1. The graph of an airflow DAG with all the tasks beside the copy and unload.
2. The graph with the unload task that isn't connected to the other flow.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-03 05:36:04

*Thread Reply:* I think our underlying SQL parser does not handle the Postgres versions of those queries


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-03 05:36:14

*Thread Reply:* Can you post the (anonymized?) queries?

👍 Maciej Obuchowski

Zahi Fail (zahi.fail@gmail.com) - 2023-08-03 07:03:09

*Thread Reply:* for example
```
copy bi.marquez_test_2 from '******' iam_role '**********' delimiter as '^' gzi
```

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-07 13:35:30

*Thread Reply:* @Zahi Fail iam_role suggests you want the Redshift version of this supported, not the Postgres one, right?


Zahi Fail (zahi.fail@gmail.com) - 2023-08-08 04:04:35

*Thread Reply:* @Maciej Obuchowski hey, actually I tried both the Postgres and Redshift to S3 operators.
Both of them sent a new event through OL to Marquez, and still weren't part of the entire flow.


Athitya Kumar (athityakumar@gmail.com) - 2023-08-04 01:40:15

Hey team! 👋

We were exploring open-lineage and had a couple of questions:
1. Does open-lineage support presto-sql?
2. Do we have any docs/benchmarks on query coverage (inner joins, subqueries, etc) & source/sink coverage (spark.read from JDBC, Files etc) for spark-sql?
3. Can someone point to the code where we currently parse the input/output facets from the spark integration (like sql queries / transformations) and if it's extendable?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-04 02:17:19

*Thread Reply:* Hey @Athitya Kumar,

1. For parsing SQL queries, we're using sqlparser-rs (https://github.com/sqlparser-rs/sqlparser-rs), which already has great coverage of sql syntax and supports different dialects. It's an open source project and we already contributed to it for the snowflake dialect.
2. We don't have such a benchmark, but if you like, you could contribute and help us provide one. We do support joins, subqueries, iceberg and delta tables, jdbc for Spark and much more. Everything we do support is covered in our tests.
3. Not sure if I got it properly. Marquez is our reference backend implementation, which parses all the facets and stores them in a relational db in a relational manner (facets, jobs, datasets and runs in separate tables).

Athitya Kumar (athityakumar@gmail.com) - 2023-08-04 02:29:53

*Thread Reply:* For (3), I was referring to where we call sqlparser-rs in our spark-openlineage event listener / integration, and what customising/improving it would look like


Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-04 02:37:20

*Thread Reply:* sqlparser-rs is a rust library and we bundle it within iface-java (https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/iface-java/src/main/java/io/openlineage/sql/SqlMeta.java). It's capable of extracting input/output datasets and column lineage information from SQL


Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-04 02:40:02

*Thread Reply:* and this is Spark code that extracts it from JdbcRelation -> https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]ge/spark/agent/lifecycle/plan/handlers/JdbcRelationHandler.java


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-04 04:08:53

*Thread Reply:* I think question 3 relates generally to Spark SQL handling, rather than handling JDBC connections inside Spark, right?


Athitya Kumar (athityakumar@gmail.com) - 2023-08-04 04:24:57

*Thread Reply:* Yup, both actually. Related to getting the JDBC connection info in the input/output facet, as well as spark-sql queries we do on that JDBC connection


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-04 06:00:17

*Thread Reply:* For Spark SQL - it's translated to Spark's internal query LogicalPlan. We take that plan, and process it's nodes. From root node we can take output dataset, from leaf nodes we can take input datasets, and inside internal nodes we track columns to extract column-level lineage. We express those (table-level) operations by implementing classes like QueryPlanVisitor


You can extend that, for example for additional types of nodes that we don't support by implementing your own QueryPlanVisitor, and then implementing OpenLineageEventHandlerFactory and packaging this into a .jar deployed alongside OpenLineage jar - this would be loaded by us using Java's ServiceLoader .
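A rough sketch of what such an extension can look like. The class and method names below follow the openlineage-spark QueryPlanVisitor pattern described above, but should be checked against the version you build against; MyCustomCommand stands in for whatever plan node you want to support:
```java
import java.util.Collections;
import java.util.List;

import io.openlineage.client.OpenLineage;
import io.openlineage.spark.api.OpenLineageContext;
import io.openlineage.spark.api.QueryPlanVisitor;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;

// Hypothetical visitor for a plan node type the built-in visitors don't cover.
public class MyCustomCommandVisitor
    extends QueryPlanVisitor<LogicalPlan, OpenLineage.OutputDataset> {

  public MyCustomCommandVisitor(OpenLineageContext context) {
    super(context);
  }

  @Override
  public boolean isDefinedAt(LogicalPlan plan) {
    // Claim only the node type this visitor knows how to handle.
    return plan instanceof MyCustomCommand; // hypothetical node class
  }

  @Override
  public List<OpenLineage.OutputDataset> apply(LogicalPlan plan) {
    MyCustomCommand cmd = (MyCustomCommand) plan;
    // Map whatever the node exposes (name, schema, location) to a dataset;
    // outputDataset() mirrors the helper the built-in visitors use for this.
    return Collections.singletonList(
        outputDataset().getDataset(cmd.tableName(), cmd.schema()));
  }
}
```
The visitor would then be exposed through a custom OpenLineageEventHandlerFactory registered via META-INF/services, so the ServiceLoader mechanism mentioned above can pick it up.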

👍 Kiran Hiremath

Athitya Kumar (athityakumar@gmail.com) - 2023-08-08 05:06:07

*Thread Reply:* @Maciej Obuchowski @Paweł Leszczyński - Thanks for your responses! I had a follow-up query regarding the sqlparser-rs that's used internally by open-lineage: we see that the SQL dialects supported by sqlparser-rs here don't include spark-sql / presto-sql dialects, which means they'd fall back to the generic dialect:

```
"--ansi" => Box::new(AnsiDialect {}),
"--bigquery" => Box::new(BigQueryDialect {}),
"--postgres" => Box::new(PostgreSqlDialect {}),
"--ms" => Box::new(MsSqlDialect {}),
"--mysql" => Box::new(MySqlDialect {}),
"--snowflake" => Box::new(SnowflakeDialect {}),
"--hive" => Box::new(HiveDialect {}),
"--redshift" => Box::new(RedshiftSqlDialect {}),
"--clickhouse" => Box::new(ClickHouseDialect {}),
"--duckdb" => Box::new(DuckDbDialect {}),
"--generic" | "" => Box::new(GenericDialect {}),
```
Any idea how much coverage the generic dialect provides for spark-sql / how different they are etc?


Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:21:32

*Thread Reply:* The spark-sql integration is based on Spark's LogicalPlan tree, extracting input/output datasets from tree nodes, which is more detailed than SQL parsing


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-08 07:04:52

*Thread Reply:* I think presto/trino dialect is very standard - there shouldn't be any problems with regular queries


Athitya Kumar (athityakumar@gmail.com) - 2023-08-08 11:19:53

*Thread Reply:* @Paweł Leszczyński - Got it, and would you be able to point me to where within the openlineage-spark integration do we:

1. provide the Spark Logical Plan / query to sqlparser-rs
2. get the output of sqlparser-rs (parsed query AST) & stitch back the inputs/outputs in the open-lineage events?

Athitya Kumar (athityakumar@gmail.com) - 2023-08-08 12:09:06

*Thread Reply:* For example, we'd like to understand which dialect name of sqlparser-rs would be used in which scenario by open-lineage, and what the interactions are b/w open-lineage & sqlparser-rs


Athitya Kumar (athityakumar@gmail.com) - 2023-08-09 12:18:47

*Thread Reply:* @Paweł Leszczyński - In case you missed the above messages ^


Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-10 03:31:32

*Thread Reply:* Sqlparser-rs is used within the Spark integration only for spark jdbc queries (queries to external databases). That's the only scenario. For spark.sql(...), instead of SQL parsing, we rely on the logical plan of a job and extract information from it. For jdbc queries, which use sqlparser-rs, the dialect is extracted from the url:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/JdbcUtils.java#L69

👍 Athitya Kumar

nivethika R (nivethikar8@gmail.com) - 2023-08-06 07:16:53

Hi.. Is column lineage available for spark version 2.4.0?


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-06 17:25:31

*Thread Reply:* No, it's not.


nivethika R (nivethikar8@gmail.com) - 2023-08-06 23:53:17

*Thread Reply:* Is it only available for spark version 3+?


Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-07 04:53:41

*Thread Reply:* Yes

GitHubOpenLineageIssues (githubopenlineageissues@gmail.com) - 2023-08-07 11:18:25
Hi, will really appreciate it if I can learn how the community has been able to harness the Spark integration. In our testing, where a Spark application writes to S3 multiple times (different locations), OL generates the same job name for all writes (namespacename.execute_insert_into_hadoop_fs_relation_command), rendering the OL graph's final output less helpful. Say, for example, I have a series of transformations/writes 5 times; in the lineage graph we are just seeing the last one. There is an open bug and hopefully it will be resolved soon.

Curious how much adoption the OL Spark integration has in the presence of that bug, as generating the same name for a job makes it less usable for anything other than a trivial one-output application.

Example from a 2-write application:
EXPECTED: first produce the weather dataset and the subsequent write produces weather40 (generated/mocked using 2 Spark apps). (1st image)
ACTUAL OL: weather40 only; we see only the last one. (2nd image)

Will really appreciate community guidance on how successful others have been in utilizing the Spark integration (vanilla, not Databricks). Thank you

[Images: Expected vs. Actual lineage graphs]


Michael Robinson (michael.robinson@astronomer.io) - 2023-08-07 11:30:00
@channel
This month's TSC meeting is this Thursday, August 10th at 10:00 a.m. PT. On the tentative agenda:
• announcements
• recent releases
• Airflow provider progress update
• OpenLineage 1.0 overview
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.

👍 Maciej Obuchowski, Athitya Kumar, Anirudh Shrinivason, Paweł Leszczyński

추호관 (hogan.chu@toss.im) - 2023-08-08 04:39:45
I can't see the output when I saveAsTable with 100+ columns in Spark. Any help or ideas on this issue? Really thanks.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-08 04:59:23
*Thread Reply:* Does this work with similar jobs, but with a small amount of columns?

추호관 (hogan.chu@toss.im) - 2023-08-08 05:12:52
*Thread Reply:* Thanks for the reply @Maciej Obuchowski - yes, it works for a small amount of columns, but not for a big amount of columns.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-08 05:14:04
*Thread Reply:* One more question: how much data do the jobs approximately process, and how long does the execution take?

추호관 (hogan.chu@toss.im) - 2023-08-08 05:14:54
*Thread Reply:* Ah… it varies, like 20-30 min. Data size is like 20,000,000 rows with 100-1000 columns.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:15:17
*Thread Reply:* That's interesting. We could prepare an integration test for that. 100 cols shouldn't make a difference.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:15:37
*Thread Reply:* Honestly, sorry for the typo - it's 1000 columns.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:15:44
*Thread Reply:* Pivoting features.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:16:09
*Thread Reply:* I checked - it works well for small numbers of columns.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-08 05:16:39
*Thread Reply:* If it's 1000, then maybe we're over event size - the event is too large and the backend can't accept it.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-08 05:17:06
*Thread Reply:* Maybe debug logs could tell us something.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:19:27
*Thread Reply:* I'll run spark.sparkContext.setLogLevel("DEBUG") now.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:19:30
*Thread Reply:* Are there any errors in the logs? Perhaps pivoting produces nodes in the SparkPlan that we don't support yet.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:19:52
*Thread Reply:* Did you check pivoting that results in fewer columns?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-08 05:20:33
*Thread Reply:* @추호관 it would also be good to disable the logicalPlan facet:
spark.openlineage.facets.disabled: [spark_unknown;spark.logicalPlan]
in the Spark conf.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:23:40
*Thread Reply:* Got it - can't we do it in the Python config?
```
.config("spark.dynamicAllocation.enabled", "true") \
.config("spark.dynamicAllocation.initialExecutors", "5") \
.config("spark.openlineage.facets.disabled", [spark_unknown;spark.logicalPlan]
```

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:24:31
*Thread Reply:*
```
.config("spark.dynamicAllocation.enabled", "true") \
.config("spark.dynamicAllocation.initialExecutors", "5") \
.config("spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]")
```


추호관 (hogan.chu@toss.im) - 2023-08-08 05:24:42
*Thread Reply:* Ah.. a string, got it.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:36:03
*Thread Reply:* Ah… there are no errors nor debug-level issues - it successfully registered the listener io.openlineage.spark.agent.OpenLineageSparkListener

추호관 (hogan.chu@toss.im) - 2023-08-08 05:39:40
*Thread Reply:* Maybe df.groupBy(some_column).pivot(some_column).agg(*agg_cols) is not supported.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:43:44
*Thread Reply:* Oh.. interesting - the spark.openlineage.facets.disabled option gives me output when eventType is START:
```
"eventType": "START"
"outputs": [
  … columns …
]
```

추호관 (hogan.chu@toss.im) - 2023-08-08 05:54:13
*Thread Reply:* Yes -
"spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]" <- with this option set, I get output when eventType is START, but I don't get output for bunches of columns when that config is not set.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:55:18
*Thread Reply:* This option prevents the logicalPlan from being serialized and sent as part of the OpenLineage event, where it is included in one of the facets.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:56:12
*Thread Reply:* Possibly, serializing logicalPlans, in the case of pivots, leads to event sizes that are not acceptable.

추호관 (hogan.chu@toss.im) - 2023-08-08 05:57:56
*Thread Reply:* Ah… so you mean the pivot makes the serialized logical plan too large for the event to be generated, and disabling the logical plan facet makes the event possible to generate, because the pivot's logical plan is not serialized.

Can we overcome this?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:58:48
*Thread Reply:* We've seen such issues for some plans some time ago.

🙌 추호관

추호관 (hogan.chu@toss.im) - 2023-08-08 05:59:29
*Thread Reply:* Oh…. how did you solve it?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 05:59:51
*Thread Reply:* By excluding some properties from the plan to be serialized.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 06:01:14
*Thread Reply:* Here https://github.com/OpenLineage/OpenLineage/blob/c3a5211f919c01870a7f79f48588177a9b[…]io/openlineage/spark/agent/lifecycle/LogicalPlanSerializer.java we exclude certain classes.

🙌 추호관

추호관 (hogan.chu@toss.im) - 2023-08-08 06:02:00
*Thread Reply:* AH…. so the excluded properties cause the logical plan of the pivoting to be ignored.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 06:08:25
*Thread Reply:* You can start with writing a failing test here -> https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]/openlineage/spark/agent/lifecycle/SparkReadWriteIntegTest.java

Then you can try to debug the logical plan, trying to find out what should be excluded from it when it's being serialized. Even if you find this difficult, a failing integration test is super helpful to let others help you with it.
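As a starting point for such a failing test, here is a hedged PySpark sketch of a pivot job like the one reported; table and column names are made up, and it assumes an active spark session with a catalog that supports saveAsTable:
```
from pyspark.sql import functions as F

# Hypothetical repro: a small frame pivoted into many columns, then written
# with saveAsTable, mirroring the scenario reported in this thread.
df = spark.range(1000).select(
    (F.col("id") % 10).alias("customer"),
    (F.col("id") % 500).alias("feature"),  # 500 distinct pivot values -> wide output
    F.rand().alias("value"),
)
wide = df.groupBy("customer").pivot("feature").agg(F.sum("value"))
wide.write.mode("overwrite").saveAsTable("pivot_repro")
# Expectation: COMPLETE events carry the output dataset even at ~500+ columns.
```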


추호관 (hogan.chu@toss.im) - 2023-08-08 06:24:54
*Thread Reply:* Okay, I'll look into it and maybe open a PR. Thanks.

추호관 (hogan.chu@toss.im) - 2023-08-08 06:38:45
*Thread Reply:* Can I ask if there are any suspicious properties?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-08 06:39:25
*Thread Reply:* Sure

👍 🙂 추호관

추호관 (hogan.chu@toss.im) - 2023-08-08 07:10:40
*Thread Reply:* Thanks, I'll also try to find the property.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-08-08 05:34:46
Hi guys, I have a generic SQL-parsing doubt... what would be the recommended way (if any) to check for SQL similarity? I understand that most SQL parsers parse the query into an AST, but are there any well-known ways to measure semantic similarity between 2 or more ASTs? Just curious lol... Any ideas appreciated! Thanks!

Guy Biecher (guy.biecher21@gmail.com) - 2023-08-08 07:49:55
*Thread Reply:* Hi @Anirudh Shrinivason,
I think I would take a look at this:
https://sqlglot.com/sqlglot/diff.html

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-08-09 23:12:37
*Thread Reply:* Hey @Guy Biecher Yeah, I was looking at this... but it seems to calculate similarity in a more textual sense, as opposed to a more semantic one...
e.g. SELECT * FROM TABLE_1 and SELECT col1,col2,col3 FROM TABLE_1 could be the same semantic query, but sqlglot would give diffs in the AST because it's textual...

Guy Biecher (guy.biecher21@gmail.com) - 2023-08-10 02:26:51
*Thread Reply:* I totally get you. In such cases, without the metadata of TABLE_1, it's impossible. What I would do is expand all the * before you use the diff function.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-08-10 07:04:37
*Thread Reply:* Yeah, I was thinking about the same... but the more nested and complex your queries get, the harder it becomes to accurately pre-process before running the AST diff too...
But yeah, that's probably the approach I'd be taking haha... Happy to discuss and learn if there are better ways of doing this.
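For what it's worth, sqlglot can do both steps discussed here: qualify/expand the star against a known schema, then diff the two ASTs. A rough sketch, where the schema and queries are made up for illustration:
```
from sqlglot import diff, parse_one
from sqlglot.optimizer.qualify import qualify

# Hypothetical schema used to expand SELECT * into explicit columns.
schema = {"table_1": {"col1": "int", "col2": "int", "col3": "int"}}

q1 = qualify(parse_one("SELECT * FROM table_1"), schema=schema)
q2 = qualify(parse_one("SELECT col1, col2, col3 FROM table_1"), schema=schema)

# After qualification, both ASTs spell out the columns, so the edit
# script should contain only semantically meaningful differences.
edit_script = diff(q1, q2)
print(edit_script)
```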


Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-08 08:36:46
Dear all, I have some novice questions. I put them in separate messages for clarity. 1st question: I understand from the examples in the documentation that the main lineage events are RunEvents, which can contain links to a Run ID, Job ID, Dataset ID (I see they are RunEvents because they have an EventType, correct?). However, the main OpenLineage JSON object also contains JobEvent and DatasetEvent. When are JobEvent and DatasetEvent supposed to be used in the workflow? Do you have relevant examples? Thanks!

Harel Shein (harel.shein@gmail.com) - 2023-08-08 09:53:05
*Thread Reply:* Hey @Luigi Scorzato!
You can read about these 2 event types in this blog post: https://openlineage.io/blog/static-lineage

👍 Luigi Scorzato

Harel Shein (harel.shein@gmail.com) - 2023-08-08 09:53:38
*Thread Reply:* We'll work on getting the documentation improved to clarify the expected use cases for each event type. This is a relatively new addition to the spec.

👍 Luigi Scorzato

Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-08 10:08:28
*Thread Reply:* This sounds relevant for my 3rd question, doesn't it? But I do not see scheduling information among the use cases, am I wrong?

Harel Shein (harel.shein@gmail.com) - 2023-08-08 11:16:39
*Thread Reply:* You're not wrong - these 2 events were not designed for runtime lineage, but rather "static" lineage that gets emitted after the fact.

Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-08 08:46:39
2nd question: I see that the input dataset appears in the RunEvent with EventType=START, the output dataset appears in the RunEvent with EventType=COMPLETE only, and the RunEvent with EventType=RUNNING has no dataset attached. This makes sense for ETL jobs, but for streaming (e.g. Flink), the run could run very long and never terminate with a COMPLETE. On the other hand, emitting all the info about the output dataset in every RUNNING event would be far too verbose. What is the recommended setup in this case? TL;DR: what is the recommended configuration of the frequency and data model of the lineage events for streaming systems like Flink?

Harel Shein (harel.shein@gmail.com) - 2023-08-08 09:54:40
*Thread Reply:* Great question! Did you get a chance to look at the current Flink integration?

Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-08 10:07:06
*Thread Reply:* To be honest, I only quickly went through it and did not identify what I needed. Can you please point me to the relevant section?

Harel Shein (harel.shein@gmail.com) - 2023-08-08 11:13:17
*Thread Reply:* Here's an example START event for Flink: https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/test/resources/events/expected_kafka.json

Harel Shein (harel.shein@gmail.com) - 2023-08-08 11:13:26
*Thread Reply:* Or a checkpoint (RUNNING) event: https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/test/resources/events/expected_kafka_checkpoints.json

Harel Shein (harel.shein@gmail.com) - 2023-08-08 11:15:55
*Thread Reply:* Generally speaking, you can see the execution contexts that invoke generation of OL events here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/main/ja[…]/openlineage/flink/visitor/lifecycle/FlinkExecutionContext.java

👍 Luigi Scorzato

Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-08 17:46:17
*Thread Reply:* Thank you! So, if I understand correctly, the key is that even eventType=START admits output datasets. Correct? What determines how often the eventType=RUNNING events are emitted?

👍 Harel Shein

Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-09 03:25:16
*Thread Reply:* Now I see - RUNNING events are emitted on onJobCheckpoint.

Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-08 08:59:40
3rd question: I am looking for information about the time when the next run should start, in the case of scheduled jobs. I see that the Run Facet has a Nominal Time Facet, but - if I understand correctly - it refers to the current run, so it is always emitted after the fact. Is the nominal start time of the next run available somewhere? If not, where do you recommend adding it as a custom field? In principle, it belongs to the Job object, but would that cause an undesirably fast-changing Job object?

Harel Shein (harel.shein@gmail.com) - 2023-08-08 11:10:47
*Thread Reply:* For Airflow, this is part of the AirflowRunFacet, here: https://github.com/OpenLineage/OpenLineage/blob/81372ca2bc2afecab369eab4a54cc6380dda49d0/integration/airflow/facets/AirflowRunFacet.json#L100

For other orchestrators / schedulers, that would depend..

👍 Luigi Scorzato

Kiran Hiremath (kiran_hiremath@intuit.com) - 2023-08-08 10:30:56
Hi Team, question regarding the Databricks OpenLineage init script: is the path /mnt/driver-daemon/jars common to all the clusters? Or is it unique to each cluster? https://github.com/OpenLineage/OpenLineage/blob/81372ca2bc2afecab369eab4a54cc6380d[…]da49d0/integration/spark/databricks/open-lineage-init-script.sh

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-08 12:15:40
*Thread Reply:* I might be wrong, but I believe it's unique for each cluster - the common part is dbfs.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-09 02:38:54
*Thread Reply:* dbfs is mounted to a Databricks workspace, which can run multiple clusters, so I think it's common.

Worth mentioning: init scripts located in dbfs are becoming deprecated next month, and we plan to move them into workspaces.

👍 Kiran Hiremath

Kiran Hiremath (kiran_hiremath@intuit.com) - 2023-08-11 01:33:24
*Thread Reply:* Yes, the init scripts are moved to the workspace level.

GitHubOpenLineageIssues (githubopenlineageissues@gmail.com) - 2023-08-08 14:19:40
Hi @Paweł Leszczyński, will really appreciate it if you let me know once this PR is good to go. Would love to test it in our environment: https://github.com/OpenLineage/OpenLineage/pull/2036. Thank you for all your help.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-09 02:35:28
*Thread Reply:* Great to hear. I still need some time, as there are a few corner cases. For example: what should the behaviour be when alter table rename is called 😉 But sure, you can test it if you like. CI is failing on integration tests, but ./gradlew clean build with unit tests is fine.

:gratitude_thank_you: GitHubOpenLineageIssues

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-10 03:33:50
*Thread Reply:* @GitHubOpenLineageIssues Feel invited to join today's community meeting and advocate for the importance of this issue. Such discussions are extremely helpful in prioritising the backlog the right way.

Gaurav Singh (gaurav.singh@razorpay.com) - 2023-08-09 07:54:33
Hi Team,
I'm doing a POC with OpenLineage to extract column lineage from Spark. I'm using it in a Databricks notebook. I'm facing an issue where I'm trying to get the column lineage in a join involving external tables on S3. The lineage that is being extracted refers to the base path of the table, i.e. the S3 file path, and not the corresponding tables. Is there a way to extract/map columns of the output to the columns of the base tables instead of the storage location?

Gaurav Singh (gaurav.singh@razorpay.com) - 2023-08-09 07:55:28
*Thread Reply:* Query:
```
INSERT INTO test.merchant_md
(SELECT
    m.`id`,
    m.name,
    m.activated,
    m.parent_id,
    md.contact_name,
    md.contact_email
FROM
    test.merchants_0 m
    LEFT JOIN merchant_details md ON m.id = md.merchant_id
WHERE
    m.created_date > '2023-08-01')
```

Gaurav Singh (gaurav.singh@razorpay.com) - 2023-08-09 08:01:56
*Thread Reply:*
```
"columnLineage": {
  "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.30.1/integration/spark",
  "_schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet",
  "fields": {
    "merchant_id": {
      "inputFields": [
        {"namespace": "s3a://datalake", "name": "/test/merchants", "field": "id"}
      ]
    },
    "merchant_name": {
      "inputFields": [
        {"namespace": "s3a://datalake", "name": "/test/merchants", "field": "name"}
      ]
    },
    "activated": {
      "inputFields": [
        {"namespace": "s3a://datalake", "name": "/test/merchants", "field": "activated"}
      ]
    },
    "parent_id": {
      "inputFields": [
        {"namespace": "s3a://datalake", "name": "/test/merchants", "field": "parent_id"}
      ]
    },
    "contact_name": {
      "inputFields": [
        {"namespace": "s3a://datalake", "name": "/test/merchant_details", "field": "contact_name"}
      ]
    },
    "contact_email": {
      "inputFields": [
        {"namespace": "s3a://datalake", "name": "/test/merchant_details", "field": "contact_email"}
      ]
    }
  }
},
"symlinks": {
  "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.30.1/integration/spark",
  "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet",
  "identifiers": [
    {"namespace": "/warehouse/test.db", "name": "test.merchant_md", "type": "TABLE"}
  ]
}
```

Gaurav Singh (gaurav.singh@razorpay.com) - 2023-08-09 08:23:57
*Thread Reply:*
```
"contact_name": {
  "inputFields": [
    {"namespace": "s3a://datalake", "name": "/test/merchant_details", "field": "contact_name"}
  ]
}
```
This is returning the mapping from the S3 location on which the table is created.

Zahi Fail (zahi.fail@gmail.com) - 2023-08-09 10:56:27
Hey,
I'm running a Spark application (Spark version 3.4) with the OL integration. I changed Spark to use the "debug" log level, and I see the OL events with the message: "Emitting lineage completed successfully:"

With all the above, I can't see the event in Marquez.

Attaching the OL configurations. When changing the OL-Spark version to 0.6.+, I do see an event created in Marquez with only a "Start" status (attached below).

Does the OL-Spark version match the Spark version? Are there known issues with the Spark / OL versions?

[Attachments: OL configuration and Marquez screenshots]

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-09 11:23:42
*Thread Reply:* > OL-spark version to 0.6.+
This OL version is ancient. You can try with 1.0.0.

I think you're hitting this issue which duplicates jobs: https://github.com/OpenLineage/OpenLineage/issues/1943

Zahi Fail (zahi.fail@gmail.com) - 2023-08-10 01:46:08
*Thread Reply:* I haven't mentioned that I tried multiple OL versions - 1.0.0 / 0.30.1 / 0.6.+ … None of them worked for me. @Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 05:25:49
*Thread Reply:* @Zahi Fail understood. Can you provide a sample job that reproduces this behavior, and possibly some logs?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 05:26:11
*Thread Reply:* If you can, it might be better to create an issue on GitHub and communicate there.

Zahi Fail (zahi.fail@gmail.com) - 2023-08-10 08:34:01
*Thread Reply:* Before creating an issue on GitHub, I wanted to check whether my issue is only related to version compatibility..

This is my test sample:
```
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder \
    .config('spark.jars.packages', 'io.openlineage:openlineage-spark:1.0.0') \
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener') \
    .config('spark.openlineage.host', 'http://localhost:9000') \
    .config('spark.openlineage.namespace', 'default') \
    .getOrCreate()

spark.sparkContext.setLogLevel("DEBUG")

csv_file = "location.csv"

df = spark.read.format("csv").option("header", "true").option("sep", "^").load(csv_file)

df = df.select("campaignid", "revenue").groupby("campaignid").sum("revenue").show()
```
Part of the logs with the OL configurations and the processed event:

[Attachment: log excerpt with the OL configuration and the processed event]

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 08:40:13
*Thread Reply:* Try spark.openlineage.transport.url instead of spark.openlineage.host

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 08:40:27
*Thread Reply:* And possibly link the doc where you've seen spark.openlineage.host 🙂

Zahi Fail (zahi.fail@gmail.com) - 2023-08-10 08:59:27
*Thread Reply:* https://openlineage.io/blog/openlineage-spark/

👍 Maciej Obuchowski

Zahi Fail (zahi.fail@gmail.com) - 2023-08-10 09:04:56
*Thread Reply:* Changing to "spark.openlineage.transport.url" didn't make any change.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 09:09:42
*Thread Reply:* Do you see the ConsoleTransport log? It suggests the Spark integration did not register that you want to send events to Marquez.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 09:10:09
*Thread Reply:* Let's try setting spark.openlineage.transport.type to http

Zahi Fail (zahi.fail@gmail.com) - 2023-08-10 09:14:50
*Thread Reply:* Now it works!

Zahi Fail (zahi.fail@gmail.com) - 2023-08-10 09:14:58
*Thread Reply:* Thanks @Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 09:23:04
*Thread Reply:* Cool 🙂 However, it should not require that if you provide spark.openlineage.transport.url - I'll create an issue for debugging that.
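For readers hitting the same thing, the configuration that ended up working in this thread looks roughly like this; a sketch where the endpoint, namespace, and version pin are placeholders:
```
from pyspark.sql import SparkSession

# Working setup from this thread, sketched: explicit transport type + url
# instead of the older spark.openlineage.host setting.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.0.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:9000")
    .config("spark.openlineage.namespace", "default")
    .getOrCreate()
)
```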


Michael Robinson (michael.robinson@astronomer.io) - 2023-08-09 14:37:24
@channel
This month's TSC meeting is tomorrow! All are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1691422200847979

Athitya Kumar (athityakumar@gmail.com) - 2023-08-10 02:11:07
While using the Spark integration, we're unable to see the query in the job facet for any spark-submit - is this a known issue/limitation, and can someone point to the code where this is currently extracted / can be enhanced?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-10 02:55:46
*Thread Reply:* Let me first rephrase my understanding of the question: assume a user runs spark.sql('INSERT INTO ...'). Are we able to include the SQL query INSERT INTO ... within the SQL facet?

We once had a look at it and found it difficult. Given an SQL query, Spark immediately translates it to a logical plan (which our integration is based on), and we didn't find any place where we could inject our code and get access to the SQL being run.

Athitya Kumar (athityakumar@gmail.com) - 2023-08-10 04:27:51
*Thread Reply:* Got it. So for spark.sql() there's no interaction with sqlparser-rs, and we directly try stitching the input/output & column lineage from the Spark logical plan. Would something like this fall under the spark.jdbc() route or the spark.sql() route (say, if the df is collected / written somewhere)?

```
val df = spark.read.format("jdbc")
  .option("url", url)
  .option("user", user)
  .option("password", password)
  .option("fetchsize", fetchsize)
  .option("driver", driver)
  .load()  // .load() added here for completeness; the original snippet was truncated
```

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 05:15:17
*Thread Reply:* @Athitya Kumar I understand your issue. From my side, there's one problem with this - potentially there can be multiple queries for one Spark job. You can imagine something like joining the results of two queries - possibly to separate systems - and then one SqlJobFacet would be misleading. This needs more thorough spec discussion.

Luigi Scorzato (luigi.scorzato@gmail.com) - 2023-08-10 05:33:47
Hi Team, does anyone have experience with integrating OpenLineage with the SAP ecosystem? And with Salesforce/MuleSoft?

Steven (xli@zjuici.com) - 2023-08-10 05:40:47
Hi,
Are there any ways to save a list of strings directly in the dataset facets? Such as the myfacets field in this dict:
```
"facets": {
  "metadata_facet": {
    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.29.2/client/python",
    "_schemaURL": "https://sth/schemas/facets.json#/definitions/SomeFacet",
    "myfacets": ["a", "b", "c"]
  }
}
```

Steven (xli@zjuici.com) - 2023-08-10 05:42:20
*Thread Reply:* I'm using the Python OpenLineage package and extending the BaseFacet class.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 05:53:57
*Thread Reply:* For custom facets, as long as it's valid JSON - go for it.

Steven (xli@zjuici.com) - 2023-08-10 05:55:03
*Thread Reply:* However, I tried to insert a list of strings, and when I tried to get the dataset, the returned value of that list field was empty.

Steven (xli@zjuici.com) - 2023-08-10 05:55:57
*Thread Reply:*
```
import attr
from openlineage.client.facet import BaseFacet

@attr.s
class MyFacet(BaseFacet):
    columns: list[str] = attr.ib()
```
Here's my Python code.
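For completeness, a hypothetical sketch of how such a facet would be attached to an output dataset with the Python client; the namespace and table name are made up:
```
from openlineage.client.run import Dataset

# Hypothetical usage: attach the custom facet to a dataset before emitting.
dataset = Dataset(
    namespace="example",
    name="example_table",
    facets={"metadata_facet": MyFacet(columns=["a", "b", "c"])},
)
```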

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 05:59:02
*Thread Reply:* How did you emit and serialize the event, and where did you look when you said you tried to get the dataset?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 06:00:27
*Thread Reply:* I assume the problem is somewhere there, not on the level of the facet definition, since SchemaDatasetFacet looks pretty much the same and it works.

Steven (xli@zjuici.com) - 2023-08-10 06:00:54
*Thread Reply:* I use the Python OpenLineage client to emit the RunEvent:
```
openlineage_client.emit(
    RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now().isoformat(),
        run=run,
        job=job,
        producer=PRODUCER,
        outputs=outputs,
    )
)
```
And use Marquez to visualize the returned data.

Steven (xli@zjuici.com) - 2023-08-10 06:02:12
*Thread Reply:* Yeah, a list of objects is working, but a list of strings is not 😩

Steven (xli@zjuici.com) - 2023-08-10 06:03:23
*Thread Reply:* I think the problem is related to the openlineage package's openlineage/client/serde.py - the function Serde.to_json().

Steven (xli@zjuici.com) - 2023-08-10 06:05:56
*Thread Reply:* [screenshot attached]

Steven (xli@zjuici.com) - 2023-08-10 06:19:34
*Thread Reply:* I think the code here filters out those string values in the list.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 06:21:39
*Thread Reply:* 👀

Steven (xli@zjuici.com) - 2023-08-10 06:24:48
*Thread Reply:* Yeah, the values in the list end up False in this check and get filtered out:
isinstance(x, dict)
😳
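To illustrate the reported bug, here is a small self-contained sketch - illustrative only, not the client's exact source: if the serializer keeps only dict entries when recursing into lists, plain strings vanish, and the fix is to keep non-dict values as-is:
```
# Sketch of the filtering bug discussed above. A list-valued field loses
# its strings if the serializer only keeps dict elements:
def serialize_list_buggy(values):
    return [serialize_dict(x) for x in values if isinstance(x, dict)]  # drops "a", "b", "c"

# A fix along the lines of the PR: recurse into dicts, keep scalars untouched.
def serialize_list_fixed(values):
    return [serialize_dict(x) if isinstance(x, dict) else x for x in values]

def serialize_dict(d):
    # Drop None values, recurse into nested containers.
    return {
        k: serialize_list_fixed(v) if isinstance(v, list)
        else serialize_dict(v) if isinstance(v, dict)
        else v
        for k, v in d.items()
        if v is not None
    }

print(serialize_list_buggy(["a", "b", "c"]))  # [] - the reported symptom
print(serialize_list_fixed(["a", "b", "c"]))  # ['a', 'b', 'c']
```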

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 06:26:33
*Thread Reply:* Wow, that's right 😬

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-10 06:26:47
*Thread Reply:* Want to create a PR fixing that?

Steven (xli@zjuici.com) - 2023-08-10 06:27:20
*Thread Reply:* Sure! I may do this later tomorrow.

👍 Maciej Obuchowski, Paweł Leszczyński

Steven (xli@zjuici.com) - 2023-08-10 23:59:28
*Thread Reply:* I created the PR at https://github.com/OpenLineage/OpenLineage/pull/2044
But the CI on integration-test-integration-spark FAILED.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-11 04:17:01
*Thread Reply:* @Steven sorry for that - some tests require credentials that are not present on forked versions of the CI. It will work once I push it to origin. Anyway, Spark tests failing isn't a blocker for this Python PR.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-11 04:17:45
*Thread Reply:* I would only ask you to add some tests for the case of facets containing a list of strings.

Steven (xli@zjuici.com) - 2023-08-11 04:18:21
*Thread Reply:* Yeah sure, I will add them now.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-11 04:25:19
*Thread Reply:* Ah, we had another CI problem - the Go version was too old in one of the jobs. Nevertheless, I won't judge your PR on stuff failing outside your PR anyway 🙂

Steven (xli@zjuici.com) - 2023-08-11 04:36:57
*Thread Reply:* LOL 🤣 I've added some tests and made a force push.

savan (SavanSharan_Navalgi@intuit.com) - 2023-10-20 08:31:45
*Thread Reply:* @GitHubOpenLineageIssues
I am trying to contribute to the integration tests listed here as a good first issue.
The CONTRIBUTING.md mentions that I can trigger CI for integration tests from a forked branch using this tool, but I am unable to do so. Is there a way to trigger CI from a forked branch, or do I have to get permission from someone to run the CI?

I am getting this error when I run this command: sudo git-push-fork-to-upstream-branch upstream savannavalgi:hacktober
```
> Username for 'https://github.com': savannavalgi
> Password for 'https://savannavalgi@github.com':
> remote: Permission to OpenLineage/OpenLineage.git denied to savannavalgi.
> fatal: unable to access 'https://github.com/OpenLineage/OpenLineage.git/': The requested URL returned error: 403
```
I have tried to configure an SSH key, also tried to trigger CI from another branch, and tried all of this after fetching the latest upstream.

cc: @Athitya Kumar @Maciej Obuchowski @Steven

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-10-23 04:57:44
*Thread Reply:* What PR is the problem related to? I can run git-push-fork-to-upstream-branch for you.

savan (SavanSharan_Navalgi@intuit.com) - 2023-10-25 01:08:41
*Thread Reply:* @Paweł Leszczyński thanks for approving my PR - ( link )

I will make the changes needed for the new integration test case for drop table (good first issue) in another PR. I would need your help to run the integration tests again, thank you.

savan (SavanSharan_Navalgi@intuit.com) - 2023-10-26 07:48:52
*Thread Reply:* @Paweł Leszczyński
I opened a PR ( link ) for the drop table integration test - can you please help run the integration test?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-10-26 07:50:29
*Thread Reply:* Sure. Some of our tests require access to S3/BigQuery secret keys, so they will not work automatically from the fork and require action on our side. Working on that.

savan (SavanSharan_Navalgi@intuit.com) - 2023-10-29 09:31:22
*Thread Reply:* Thanks @Paweł Leszczyński - let me know if I can help in any way.

savan (SavanSharan_Navalgi@intuit.com) - 2023-11-15 02:31:50
*Thread Reply:* @Paweł Leszczyński any action items on my side?

Athitya Kumar (athityakumar@gmail.com) - 2023-08-11 07:36:57
Hey folks! 👋

Had a query/observation regarding the columnLineage inferred in the Spark integration - opened this issue for the same. Basically, when we do something like this in our spark-sql:
```
SELECT t1.c1, t1.c2, t1.c3, t2.c4 FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 AND t1.c2 = t2.c2
```
The expected column lineage for output table t3 is:
```
t3.c1 -> Comes from both t1.c1 & t2.c1 (SELECT + JOIN clause)
t3.c2 -> Comes from both t1.c2 & t2.c2 (SELECT + JOIN clause)
t3.c3 -> Comes from t1.c3
t3.c4 -> Comes from t2.c4
```
However, the actual column lineage for output table t3 is:
```
t3.c1 -> Comes from t1.c1 (Only based on SELECT clause)
t3.c2 -> Comes from t1.c2 (Only based on SELECT clause)
t3.c3 -> Comes from t1.c3
t3.c4 -> Comes from t2.c4
```
Is this a known issue/behaviour?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-11 09:18:44
*Thread Reply:* Hmm... this is kind of a "logical" difference - is column-level lineage taken from actual "physical" operations (like in this case, we always take from t1), or from the "logical" side, where t2 is used only for a predicate, yet we still want to indicate it as a source?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-11 09:18:58
*Thread Reply:* I think your interpretation is more useful.

🙏 Athitya Kumar

Athitya Kumar (athityakumar@gmail.com) - 2023-08-11 09:25:03
*Thread Reply:* @Maciej Obuchowski - Yup, especially for use-cases where we want to depend on column lineage for impact analysis, I think we should consider even predicates. For example, if t2.c1 / t2.c2 gets corrupted or dropped, the query would be impacted - which means we should include even the predicate columns (t2.c1 / t2.c2) in the column lineage, imo.

But is there any technical limitation if we want to implement this / make an OSS contribution for this (like logical predicate columns not being part of the Spark logical plan object that we get in the PlanVisitor, or something like that)?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-11 11:14:58
*Thread Reply:* It's probably a bit of work, but I can't say it's impossible on the parser side - @Paweł Leszczyński will know better about the Spark collection.

Ernie Ostic (ernie.ostic@getmanta.com) - 2023-08-11 12:45:34
*Thread Reply:* This is a case where it would be nice to have an alternate indication (perhaps in the column lineage facet?) for this type of "suggested" lineage. As noted, this is especially important for impact analysis purposes. We call that "indirect" lineage at Manta (and I believe others do the same or similar).

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-11 12:49:10
*Thread Reply:* Something like an additional flag in inputFields, right?

👍 Athitya Kumar, Ernie Ostic, Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-14 02:36:34
*Thread Reply:* Yes, this would require some extension to the spec. What do you mean by spark-sql: spark.sql() with some Spark query, or SQL in Spark JDBC?
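Purely as a hypothetical sketch of what such a flag could look like on an inputFields entry - the field name "transformationType" and its values are made up for illustration and were not part of the spec at the time of this thread:
```
# Hypothetical shape of an inputFields entry carrying an "indirect" marker
# for predicate-only columns. Illustrative only; not in the spec here.
input_field = {
    "namespace": "warehouse",
    "name": "t2",
    "field": "c1",
    "transformationType": "INDIRECT",  # vs "DIRECT" for SELECT-ed columns
}
```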

Athitya Kumar (athityakumar@gmail.com) - 2023-08-15 15:16:49
*Thread Reply:* Sorry, missed your question @Paweł Leszczyński. By spark-sql, I'm referring to the former: spark.sql() with some Spark query.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-16 03:10:57
*Thread Reply:* cc @Jens Pfau - you may also be interested in extending the column-level lineage facet.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-08-22 02:23:08
*Thread Reply:* Hi, is there a GitHub issue for this feature? Seems like a really cool and exciting functionality to have!

Athitya Kumar (athityakumar@gmail.com) - 2023-08-22 08:03:49
*Thread Reply:* @Anirudh Shrinivason - Are you referring to this issue: https://github.com/OpenLineage/OpenLineage/issues/2048?

:gratitude_thank_you: ✅ Anirudh Shrinivason

Athitya Kumar (athityakumar@gmail.com) - 2023-08-14 05:13:48
Hey team 👋

Is there a way we can feed the logical plan directly to check the OpenLineage events being built, without actually running a Spark job with OpenLineage configs? Basically interested to see if we can mock a dry-run of a Spark job w/ OpenLineage by mimicking the logical plan 😄

cc @Shubh

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-14 06:00:21
*Thread Reply:* Not really, I think - the integration does not rely purely on the logical plan.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-14 06:00:44
*Thread Reply:* At least, not in all cases. For some, maybe.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-14 07:34:39
*Thread Reply:* We're using a pretty similar approach in our column-level lineage tests, where we run some Spark commands and register a custom listener https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]eage/spark/agent/util/LastQueryExecutionSparkEventListener.java which catches the logical plan. Then we run our tests on the captured logical plan.

The difference from what you're asking about is that we still have access to the same Spark session.

In many cases, our integration uses the active Spark session to fetch some dataset details. This happens pretty often (like fetching the dataset location) and cannot be taken just from a logical plan.

Athitya Kumar (athityakumar@gmail.com) - 2023-08-14 11:03:28
*Thread Reply:* @Paweł Leszczyński - We're mainly interested in seeing the inputs/outputs (mainly column schema and column lineage) for different logical plans. Is that something that could be done in a static manner, without running Spark jobs, in your opinion?

For example, I know that we can statically create logical plans.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-16 03:05:44
*Thread Reply:* The more we talk, the more I am wondering what the purpose of doing so is. Do you want to test OpenLineage coverage, or is there a production scenario where you would like to apply this?

Athitya Kumar (athityakumar@gmail.com) - 2023-08-16 04:01:39
*Thread Reply:* @Paweł Leszczyński - This is for testing OpenLineage coverage, so that we can be more confident about which scenarios are the happy path and in which scenarios it may not work / work partially, etc.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-16 04:22:01
*Thread Reply:* If this is for testing, then you're also capable of mocking some SparkSession/catalog methods when the OpenLineage integration tries to access them. If you want to reuse LogicalPlans from your prod environment, you will encounter logical plan serialization issues. On the other hand, if you generate logical plans from some example Spark jobs, then the same can be achieved more easily the way the integration tests are run, with mockserver.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-14 09:45:31
Hi Team,

Spark & Databricks related question: starting 1st September, Databricks is going to block running init_scripts located in dbfs, which is the way our integration works (https://www.databricks.com/blog/securing-databricks-cluster-init-scripts).

We have two ways of mitigating this in our docs and quickstart:
(1) move init scripts to workspace
(2) move init scripts to S3

Neither of them is perfect. (1) requires creating the init_script file manually through the Databricks UI and copy/pasting its content; I couldn't find a way to load it programmatically. (2) requires the quickstart user to have S3 bucket access.

Would love to hear your opinion on this. Perhaps there's some better way to do it. Thanks.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-08-15 01:13:49
*Thread Reply:* We're uploading the init scripts to S3 via Terraform. But yeah, I guess there are some access permissions that the user needs to have.
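A minimal sketch of the S3 route, assuming boto3 credentials are already configured; the bucket name and key are placeholders:
```
import boto3

# Sketch of option (2): upload the init script to S3 so clusters can
# fetch it from there instead of dbfs. Bucket/key are placeholders.
s3 = boto3.client("s3")
s3.upload_file(
    "open-lineage-init-script.sh",
    "my-init-scripts-bucket",
    "databricks/open-lineage-init-script.sh",
)
```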

:gratitude_thank_you: Paweł Leszczyński

Abdallah (abdallah@terrab.me) - 2023-08-16 07:32:00
*Thread Reply:* Hello, I am new here, and I am asking why you need an init script. If it's a Spark integration, can't we just specify --packages io.openlineage...?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-16 07:41:25
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/databricks/open-lineage-init-script.sh -> I think the issue was with having the openlineage jar installed immediately on the classpath, because it's required when OpenLineageSparkListener is instantiated. It didn't work without it.

Abdallah (abdallah@terrab.me) - 2023-08-16 07:43:55
*Thread Reply:* Yes, that happens if you use the --jars s3://.../...openlineage-spark-VERSION.jar parameter (I made a ticket for this issue with Databricks support). But if you use --packages io.openlineage... (the package will be downloaded from Maven), it works fine.

👀 Paweł Leszczyński

Abdallah (abdallah@terrab.me) - 2023-08-16 07:47:50
*Thread Reply:* I think they don't use the right class loader.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-16 08:36:14
*Thread Reply:* To make sure: are you able to run OpenLineage & Spark on the Databricks Runtime without init_scripts?

I was doing this a second ago, and it ended up with:
```
Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@1609ed55
```

Alexandre Campelo (aleqi200@gmail.com) - 2023-08-14 19:49:00
Hello, I just downloaded Marquez and I'm trying to send a sample request, but I'm getting a 403 (Forbidden). Any idea how to find the authentication details?

Alexandre Campelo (aleqi200@gmail.com) - 2023-08-15 12:19:34
*Thread Reply:* Ok, never mind. I figured it out. Port 5000 is reserved on macOS, so I had to start on port 9000 instead.

👍 Maciej Obuchowski

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-08-15 01:25:48
Hi, I noticed that while capturing lineage for merge into commands, some of the tables/columns are unaccounted for in the lineage. Example:
```
f_dummy_funnel_stg = spark.sql("""WITH dummy_funnel AS (
        SELECT *
        FROM f_dummy_funnel_one
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_two
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_three
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_four
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_five
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}
    )
    SELECT DISTINCT
        dummy_funnel.customer_id,
        dummy_funnel.product,
        dummy_funnel.date_id,
        dummy_funnel.country_id,
        dummy_funnel.city_id,
        dummy_funnel.dummy_type_id,
        dummy_funnel.num_attempts,
        dummy_funnel.num_transactions,
        dummy_funnel.gross_merchandise_value,
        dummy_funnel.sub_category_id,
        dummy_funnel.is_dummy_flag
    FROM dummy_funnel
    INNER JOIN d_dummy_identity as dummy_identity
        ON dummy_identity.id = dummy_funnel.customer_id
    WHERE
        date_id BETWEEN {start_date_id} AND {end_date_id}""")

spark.sql(f"""
    MERGE INTO {table_name}
    USING f_dummy_funnel_stg
    ON
        f_dummy_funnel_stg.customer_id = {table_name}.customer_id
        AND f_dummy_funnel_stg.product = {table_name}.product
        AND f_dummy_funnel_stg.date_id = {table_name}.date_id
        AND f_dummy_funnel_stg.country_id = {table_name}.country_id
        AND f_dummy_funnel_stg.city_id = {table_name}.city_id
        AND f_dummy_funnel_stg.dummy_type_id = {table_name}.dummy_type_id
        AND f_dummy_funnel_stg.sub_category_id = {table_name}.sub_category_id
        AND f_dummy_funnel_stg.is_dummy_flag = {table_name}.is_dummy_flag
    WHEN MATCHED THEN
        UPDATE SET
            {table_name}.num_attempts = f_dummy_funnel_stg.num_attempts
            , {table_name}.num_transactions = f_dummy_funnel_stg.num_transactions
            , {table_name}.gross_merchandise_value = f_dummy_funnel_stg.gross_merchandise_value
    WHEN NOT MATCHED
        THEN INSERT (
            customer_id,
            product,
            date_id,
            country_id,
            city_id,
            dummy_type_id,
            num_attempts,
            num_transactions,
            gross_merchandise_value,
            sub_category_id,
            is_dummy_flag
        )
        VALUES (
            f_dummy_funnel_stg.customer_id,
            f_dummy_funnel_stg.product,
            f_dummy_funnel_stg.date_id,
            f_dummy_funnel_stg.country_id,
            f_dummy_funnel_stg.city_id,
            f_dummy_funnel_stg.dummy_type_id,
            f_dummy_funnel_stg.num_attempts,
            f_dummy_funnel_stg.num_transactions,
            f_dummy_funnel_stg.gross_merchandise_value,
            f_dummy_funnel_stg.sub_category_id,
            f_dummy_funnel_stg.is_dummy_flag
        )
""")
```
In cases like this, I notice that the full lineage is not actually captured... I'd expect this to have 5 upstreams: f_dummy_funnel_one, f_dummy_funnel_two, f_dummy_funnel_three, f_dummy_funnel_four, f_dummy_funnel_five, but I notice only 1-2 upstreams for this case...
Would like to learn more about why this might happen, and whether this is expected behaviour or not. Thanks!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-08-15 06:48:43
*Thread Reply:* It would be useful to see the generated event or any logs.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-08-16 03:09:05
*Thread Reply:* @Anirudh Shrinivason What if there is just one union instead of four? What if there are just two columns selected instead of 10? What if the inner join is skipped? Does merge into matter?

The smaller the SQL that reproduces the problem, the easier it is to find the root cause. Most issues are reproducible with just a few lines of code.
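In that spirit, a minimal repro to bisect from might look like this - a sketch with made-up table names, assuming a table format that supports MERGE (e.g. Delta); shrink or grow it until the missing upstream appears:
```
# Minimal sketch for bisecting the issue above: one UNION ALL and a MERGE.
# If lineage already loses an upstream here, the repro is small; if not,
# add unions/joins back one at a time.
spark.sql("CREATE OR REPLACE TEMP VIEW stg AS "
          "SELECT id, val FROM src_one UNION ALL SELECT id, val FROM src_two")

spark.sql("""
    MERGE INTO tgt
    USING stg ON stg.id = tgt.id
    WHEN MATCHED THEN UPDATE SET tgt.val = stg.val
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (stg.id, stg.val)
""")
# Expected: both src_one and src_two appear as upstreams of tgt.
```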

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-08-16 03:34:30
*Thread Reply:* Yup, let me try to identify the cause from my end. Give me some time haha. I'll reach out again once there is more clarity on the occurrence.

Abdallah (abdallah@terrab.me) - 2023-08-16 07:33:21
Hello,

The OpenLineage Databricks integration is not working properly on our side, due to the filtering of adaptive_spark_plan.

Please find the issue link:

https://github.com/OpenLineage/OpenLineage/issues/2058

⬆️ Mouad MOUSSABBIH, Abdallah

Harel Shein (harel.shein@gmail.com) - 2023-08-16 09:24:09
*Thread Reply:* Thanks @Abdallah for the thoughtful issue that you submitted! I was wondering if you'd consider opening up a PR? Would love to help you as a contributor, if that's something you are interested in.

Abdallah (abdallah@terrab.me) - 2023-08-17 11:59:51
*Thread Reply:* Hello

Abdallah (abdallah@terrab.me) - 2023-08-17 11:59:58
*Thread Reply:* Yes, I am working on it.

Abdallah (abdallah@terrab.me) - 2023-08-17 12:00:14
*Thread Reply:* I deleted the line that has that filter.

Abdallah (abdallah@terrab.me) - 2023-08-17 12:00:24
*Thread Reply:* I am adding some tests now.

Abdallah (abdallah@terrab.me) - 2023-08-17 12:00:45
*Thread Reply:* But running:
```
./gradlew --no-daemon databricksIntegrationTest -x test -Pspark.version=3.4.0 -PdatabricksHost=$DATABRICKS_HOST -PdatabricksToken=$DATABRICKS_TOKEN
```

Abdallah (abdallah@terrab.me) - 2023-08-17 12:01:11
*Thread Reply:* gives me:
```
A problem occurred evaluating project ':app'.
> Could not resolve all files for configuration ':app:spark33'.
   > Could not resolve io.openlineage:openlineage-java:1.1.0-SNAPSHOT.
     Required by:
         project :app > project :shared
      > Could not resolve io.openlineage:openlineage-java:1.1.0-SNAPSHOT.
         > Unable to load Maven meta-data from https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-java/1.1.0-SNAPSHOT/maven-metadata.xml.
            > org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 326; The reference to entity "display" must end with the ';' delimiter.
   > Could not resolve io.openlineage:openlineage-sql-java:1.1.0-SNAPSHOT.
     Required by:
         project :app > project :shared
      > Could not resolve io.openlineage:openlineage-sql-java:1.1.0-SNAPSHOT.
         > Unable to load Maven meta-data from https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-sql-java/1.1.0-SNAPSHOT/maven-metadata.xml.
            > org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 326; The reference to entity "display" must end with the ';' delimiter.
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-17 12:01:25
-
-

*Thread Reply:* And I am trying to understand what should I do.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-17 12:13:37
-
-

*Thread Reply:* I am compiling sql integration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-17 13:04:15
-
-

*Thread Reply:* I built the java client

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-17 13:04:29
-
-

*Thread Reply:* but having
```
A problem occurred evaluating project ':app'.
> Could not resolve all files for configuration ':app:spark33'.
```
(the same io.openlineage:openlineage-java:1.1.0-SNAPSHOT and io.openlineage:openlineage-sql-java:1.1.0-SNAPSHOT resolution errors as above)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-17 14:47:41
-
-

*Thread Reply:* Please do ./gradlew publishToMavenLocal in client/java directory

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-17 14:47:59
-
-

*Thread Reply:* Okay thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-17 14:48:01
-
-

*Thread Reply:* will do

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:33:02
-
-

*Thread Reply:* Hello back

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:33:12
-
-

*Thread Reply:* I created a databricks cluster.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:35:00
-
-

*Thread Reply:* And I had some issues where -PdatabricksHost doesn't work with System.getProperty("databricksHost"), so I changed to -DdatabricksHost with System.getenv("databricksHost")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:36:19
-
-

*Thread Reply:* Then I had an issue where the path dbfs:/databricks/openlineage/ didn't exist, so I created the folder /dbfs/databricks/openlineage/

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:38:03
-
-

*Thread Reply:* And now I am investigating this issue : -java.lang.NullPointerException - at io.openlineage.spark.agent.DatabricksUtils.uploadOpenlineageJar(DatabricksUtils.java:226) - at io.openlineage.spark.agent.DatabricksUtils.init(DatabricksUtils.java:66) - at io.openlineage.spark.agent.DatabricksIntegrationTest.setup(DatabricksIntegrationTest.java:54) - at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) - at ... -worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74) - Suppressed: com.databricks.sdk.core.DatabricksError: Missing required field: cluster_id - at app//com.databricks.sdk.core.error.ApiErrors.readErrorFromResponse(ApiErrors.java:48) - at app//com.databricks.sdk.core.error.ApiErrors.checkForRetry(ApiErrors.java:22) - at app//com.databricks.sdk.core.ApiClient.executeInner(ApiClient.java:236) - at app//com.databricks.sdk.core.ApiClient.getResponse(ApiClient.java:197) - at app//com.databricks.sdk.core.ApiClient.execute(ApiClient.java:187) - at app//com.databricks.sdk.core.ApiClient.POST(ApiClient.java:149) - at app//com.databricks.sdk.service.compute.ClustersImpl.delete(ClustersImpl.java:31) - at app//com.databricks.sdk.service.compute.ClustersAPI.delete(ClustersAPI.java:191) - at app//com.databricks.sdk.service.compute.ClustersAPI.delete(ClustersAPI.java:180) - at app//io.openlineage.spark.agent.DatabricksUtils.shutdown(DatabricksUtils.java:96) - at app//io.openlineage.spark.agent.DatabricksIntegrationTest.shutdown(DatabricksIntegrationTest.java:65) - at -...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:39:22
-
-

*Thread Reply:* Suppressed: com.databricks.sdk.core.DatabricksError: Missing required field: cluster_id

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:40:18
-
-

*Thread Reply:* at io.openlineage.spark.agent.DatabricksUtils.uploadOpenlineageJar(DatabricksUtils.java:226)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:54:51
-
-

*Thread Reply:* I did this !echo "xxx" > /dbfs/databricks/openlineage/openlineage-spark-V.jar

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:55:29
-
-

*Thread Reply:* To create a fake file that can be deleted in the uploadOpenlineageJar function.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 10:56:09
-
-

*Thread Reply:* Because if there is no file, this part fails:
```
StreamSupport.stream(
        workspace.dbfs().list("dbfs:/databricks/openlineage/").spliterator(), false)
    .filter(f -> f.getPath().contains("openlineage-spark"))
    .filter(f -> f.getPath().endsWith(".jar"))
    .forEach(f -> workspace.dbfs().delete(f.getPath()));
```

- - - -
- 😬 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-22 11:47:17
-
-

*Thread Reply:* does this work after -!echo "xxx" > /dbfs/databricks/openlineage/openlineage-spark-V.jar -?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 11:47:36
-
-

*Thread Reply:* Yes

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 19:02:05
-
-

*Thread Reply:* I am now having another error in the driver

- -

```
23/08/22 22:56:26 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:3121)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:835)
    at com.databricks.backend.daemon.driver.DatabricksILoop$.$anonfun$initializeSharedDriverContext$1(DatabricksILoop.scala:362)
...
    at com.databricks.DatabricksMain.main(DatabricksMain.scala:146)
    at com.databricks.backend.daemon.driver.DriverDaemon.main(DriverDaemon.scala)
Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@298cfe89
    at com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader.loadClass(ClassLoaders.scala:115)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:263)
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 19:19:29
-
-

*Thread Reply:* Can you please share with me your json conf for the cluster ?

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 19:55:57
-
-

*Thread Reply:* It's because in my build file I have

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 19:56:27
-
-

*Thread Reply:* and the one that was copied is

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 20:01:12
-
-

*Thread Reply:* due to the findAny 😕
```
private static void uploadOpenlineageJar(WorkspaceClient workspace) {
    Path jarFile =
        Files.list(Paths.get("../build/libs/"))
            .filter(p -> p.getFileName().toString().startsWith("openlineage-spark-"))
            .filter(p -> p.getFileName().toString().endsWith("jar"))
            .findAny()
            .orElseThrow(() -> new RuntimeException("openlineage-spark jar not found"));
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-22 20:35:10
-
-

*Thread Reply:* It works finally 😄

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 05:16:19
-
-

*Thread Reply:* The PR 😄 -https://github.com/OpenLineage/OpenLineage/pull/2061

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 08:23:49
-
-

*Thread Reply:* thanks for the pr 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 08:24:02
-
-

*Thread Reply:* code formatting checks complain now

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 08:25:09
-
-

*Thread Reply:* for the JAR issues, do you also want to create PR as you've fixed the issue on your end?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 09:06:26
-
-

*Thread Reply:* @Abdallah you're using a newer version of Java than 8, right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 09:07:07
-
-

*Thread Reply:* AFAIK googleJavaFormat behaves differently between Java versions

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 09:15:41
-
-

*Thread Reply:* Okay I will switch back to another java version

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 09:25:06
-
-

*Thread Reply:* terra@MacBook-Pro-M3 spark % java -version -java version "1.8.0_381" -Java(TM) SE Runtime Environment (build 1.8.0_381-b09) -Java HotSpot(TM) 64-Bit Server VM (build 25.381-b09, mixed mode)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 09:28:28
-
-

*Thread Reply:* Can you tell me which Java version I should use?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 09:49:42
-
-

*Thread Reply:* Hello, I have -@mobuchowski ERROR: Missing environment variable {i} -Can you please check where this comes from?

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - - - - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 09:50:24
-
-

*Thread Reply:* Can you help please ?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 10:08:43
-
-

*Thread Reply:* Java 8

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 10:10:14
-
-

*Thread Reply:* ```Hello, I have

@mobuchowski ERROR: Missing environment variable {i}
Can you please check where this comes from? (edited)```
Yup, for now I have to manually make our CI account pick your changes up if you make a PR from a fork. Just did that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 10:11:10
- -
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 10:53:34
-
-

*Thread Reply:* @Abdallah merged 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 10:59:22
-
-

*Thread Reply:* Thank you !

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-16 14:21:26
-
-

@channel -Meetup notice: on Monday, 9/18, at 5:00 pm ET OpenLineage will be gathering in Toronto at Airflow Summit. Coming to the summit? Based in or near Toronto? Please join us to discuss topics such as: -• recent developments in the project including the addition of static lineage support and the OpenLineage Airflow Provider, -• the project’s history and architecture, -• opportunities to contribute, -• resources for getting started, -• + more. -Please visit the meetup page for the specific location (which is not the conference hotel) and to sign up. Hope to see some of you there! (Please note that the start time is 5:00 pm ET.)

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
- ❤️ Julien Le Dem, Maciej Obuchowski, Harel Shein, Paweł Leszczyński, Athitya Kumar, tati -
- -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-20 17:45:41
-
-

I saw OpenLineage was built into Airflow recently as a provider, but the documentation seems really light (https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html). Is the documentation from OpenLineage the correct way I should proceed?

- -

https://openlineage.io/docs/integrations/airflow/usage

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- 👍 Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-08-21 20:26:56
-
-

*Thread Reply:* openlineage-airflow is the package maintained in the OpenLineage project, to be used with versions of Airflow before 2.7. You could use it with 2.7 as well, but you’d be staying on the “old” integration. -apache-airflow-providers-openlineage is the new package, maintained in the Airflow project, that can be used starting with Airflow 2.7 and is the recommended package moving forward. It is compatible with the configuration of the old package described in that usage page. CC: @Maciej Obuchowski @Jakub Dardziński It looks like this page needs improvement.

-
-
PyPI
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-22 05:03:28
-
-

*Thread Reply:* Yeah, I'll fix that

- - - -
- :gratitude_thank_you: Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-22 17:55:08
-
-

*Thread Reply:* https://github.com/apache/airflow/pull/33610

- -

fyi

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
- 🙌 ldacey, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-22 17:54:20
-
-

Do I label certain raw data sources as datasets, for example SFTP/FTP sites, O365 emails, etc.? I extract that data into a bucket for the client in a "folder" called "raw", which I know will be an OL Dataset. Would this GCS folder (after extracting the data with Airflow) be the first Dataset OL is aware of?

- -

<gcs://client-bucket/source-system-lob/raw>

- -

I then process that data into partitioned parquet datasets which would also be OL Datasets: -<gcs://client-bucket/source-system-lob/staging> -<gcs://client-bucket/source-system-lob/analytics>

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-22 18:02:46
-
-

*Thread Reply:* that really depends on the use case IMHO -if you consider a whole directory/folder as a dataset (meaning that each file inside folds into a larger whole), you should label the dataset as the directory

- -

you might as well have a directory with each file being something different; in this case it would be best to set each file separately as a dataset

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-22 18:04:32
-
-

*Thread Reply:* there was also SymlinksDatasetFacet introduced to store alternative dataset names, might be useful: https://github.com/OpenLineage/OpenLineage/pull/936
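[Editor's note: for illustration, a hedged sketch of attaching an alternative name with the Python client — the facet class names follow the openlineage-python package, and the bucket, folder, and SFTP host below are made up:]
```
from openlineage.client.facet import (
    SymlinksDatasetFacet,
    SymlinksDatasetFacetIdentifiers,
)
from openlineage.client.run import Dataset

# The "real" dataset is the GCS folder; the symlink records where it came from.
raw_folder = Dataset(
    namespace="gs://client-bucket",
    name="source-system-lob/raw",
    facets={
        "symlinks": SymlinksDatasetFacet(
            identifiers=[
                SymlinksDatasetFacetIdentifiers(
                    namespace="sftp://client-sftp-host",  # hypothetical source
                    name="/outbound/daily_dump",
                    type="LOCATION",
                )
            ]
        )
    },
)
```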

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-22 18:07:26
-
-

*Thread Reply:* cool, yeah in general each file is just a snapshot of data from a client (for example, daily dump). the parquet datasets are normally partitioned and might have small fragments and I definitely picture it as more of a table than individual files

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 08:22:09
-
-

*Thread Reply:* Agree with Jakub here - with object storage, people use different patterns, but usually some directory layer vs file is the valid abstraction level, especially if your pattern is adding files with new data inside

- - - -
- 👍 Jakub Dardziński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-25 10:26:52
-
-

*Thread Reply:* I tested a dataset for each raw file versus the folder and the folder looks much cleaner (not sure if I can collapse individual datasets/files into a group?)

- -

Since 2022, this particular source has had 6 raw schema changes (client controlled, no warning). What should I do to make that as obvious as possible if I track the dataset at a folder level?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-25 10:32:19
-
-

*Thread Reply:* I was thinking that I could name the dataset based on the schema_version (identified by the raw column names), so in this example I would have 6 OL datasets feeding into one "staging" dataset

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-25 10:32:57
-
-

*Thread Reply:* not sure what the best practice would be in this scenario though

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-22 17:55:38
-
-

• I also saw the docs reference URI = gs://{bucket name}{path} and wondered if the path would include the filename, or if it was just the base path like I showed above

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-22 18:35:45
-
-

Has anyone managed to get the OL Airflow integration to work on AWS MWAA? We've tried pretty much every trick but still ended up with the following error: -Broken plugin: [openlineage.airflow.plugin] No module named 'openlineage.airflow'; 'openlineage' is not a package

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 05:22:18
-
-

*Thread Reply:* Which version are you trying to use?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 05:22:45
-
-

*Thread Reply:* Both OL and MWAA/Airflow 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 05:23:52
-
-

*Thread Reply:* 'openlineage' is not a package -suggests that something went wrong with the import process, for example a cycle in the import path

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-23 16:50:34
-
-

*Thread Reply:* MWAA: 2.6.3 -OL: 1.0.0

- -

I can see from the log that OL has been successfully installed to the webserver: -Successfully installed openlineage-airflow-1.0.0 openlineage-integration-common-1.0.0 openlineage-python-1.0.0 openlineage-sql-1.0.0 -This is the full stacktrace:
```
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/plugins_manager.py", line 229, in load_entrypoint_plugins
    plugin_class = entry_point.load()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/importlib_metadata/__init__.py", line 209, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1001, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'openlineage.airflow'; 'openlineage' is not a package
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-24 08:18:36
-
-

*Thread Reply:* It’s taking long to update the MWAA environment, but I tested the 2.6.3 version with the following requirements.txt: -openlineage-airflow -and -openlineage-airflow==1.0.0 -is there any step that might lead to some unexpected results?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 08:29:30
-
-

*Thread Reply:* Yeah, it takes forever to update MWAA even for a simple change. If you open either the webserver log (in CloudWatch) or the AirFlow UI, you should see the above error message.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-24 08:33:53
-
-

*Thread Reply:* The thing is that I don’t see any error messages. -I wrote a simple DAG to test too:
```
from __future__ import annotations

from datetime import datetime

from airflow.models import DAG

try:
    from airflow.operators.empty import EmptyOperator
except ModuleNotFoundError:
    from airflow.operators.dummy import DummyOperator as EmptyOperator  # type: ignore

from openlineage.airflow.adapter import OpenLineageAdapter
from openlineage.client.client import OpenLineageClient

from airflow.operators.python import PythonOperator

DAG_ID = "example_ol"


def callable():
    client = OpenLineageClient()
    adapter = OpenLineageAdapter()
    print(client, adapter)


with DAG(
    dag_id=DAG_ID,
    start_date=datetime(2021, 1, 1),
    schedule="@once",
    catchup=False,
) as dag:
    begin = EmptyOperator(task_id="begin")

    test = PythonOperator(task_id="print_client", python_callable=callable)
```
- -

and it gives expected results as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 08:48:11
-
-

*Thread Reply:* Oh how interesting. I did have a plugin that sets the endpoint & key via env var. Let me try to disable that to see if it fixes the issue. Will report back after 30 mins, or however long it takes to update MWAA 😉

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-24 08:50:05
-
-

*Thread Reply:* ohh, I see -you probably followed this guide: https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/?

-
-
Amazon Web Services
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 09:04:27
-
-

*Thread Reply:* Actually no. I'm not aware of this guide. I assume it's outdated already?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-24 09:04:54
-
-

*Thread Reply:* tbh I don’t know

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 09:04:55
-
-

*Thread Reply:* Actually while we're on that topic, what's the recommended way to pass the URL & API Key in MWAA?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-24 09:28:00
-
-

*Thread Reply:* I think it's still a plugin that sets env vars
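[Editor's note: for reference, a minimal sketch of that "plugin that sets env vars" approach — the file would go into the MWAA plugins.zip; the endpoint and key below are placeholders, and in practice the key would be read from Secrets Manager rather than hardcoded:]
```
# env_var_plugin.py
import os

from airflow.plugins_manager import AirflowPlugin

# placeholders: point these at your own OpenLineage-compatible backend
os.environ["OPENLINEAGE_URL"] = "https://your-openlineage-endpoint:5000"
os.environ["OPENLINEAGE_API_KEY"] = "your-api-key"
os.environ["OPENLINEAGE_NAMESPACE"] = "mwaa"


class EnvVarPlugin(AirflowPlugin):
    name = "env_var_plugin"
```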

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 09:32:18
-
-

*Thread Reply:* Yeah based on the page you shared, secret manager + plugin seems like the way to go.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 10:31:50
-
-

*Thread Reply:* Alas, after disabling the plugin and restarting the cluster, I'm still getting the same error. Do you mind sharing a screenshot of your cluster's settings so I can compare?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-24 11:57:04
-
-

*Thread Reply:* Are you maybe importing some top-level OpenLineage code anywhere? This error is most likely a circular import

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 12:01:12
-
-

*Thread Reply:* Let me try removing all the dags to see if it helps.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 18:42:49
-
-

*Thread Reply:* @Maciej Obuchowski you were correct! It was indeed the DAGs. The errors are gone after removing all the DAGs. Now I just need to figure out what caused the circular import, since I didn't import OL directly in the DAG.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-24 18:44:33
-
-

*Thread Reply:* Could this be the issue? -from airflow.lineage.entities import File, Table -How could I declare lineage manually if I can't import these classes?
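[Editor's note: for context, a sketch of how manual lineage is usually declared with these entities; all names below are made up:]
```
from airflow.lineage.entities import File, Table
from airflow.operators.bash import BashOperator

copy_report = BashOperator(
    task_id="copy_report",
    bash_command="echo copying...",
    # manual lineage: declared inputs and outputs of this task
    inlets=[Table(database="analytics", cluster="bigquery", name="daily_sales")],
    outlets=[File(url="gs://client-bucket/reports/daily_sales.csv")],
)
```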

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-25 06:52:47
-
-

*Thread Reply:* @Mars Lan I'll look in more details next week, as I'm in transit now

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-25 06:53:18
-
-

*Thread Reply:* but if you could narrow down the problem to a single DAG that I or @Jakub Dardziński could reproduce, ideally locally, it would help a lot

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-08-25 07:07:11
-
-

*Thread Reply:* Thanks. I think I understand how this works much better now. Found a few useful BQ example dags. Will give them a try and report back.

- - - -
- 🔥 Jakub Dardziński, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Nitin - (nitinkhannain@yahoo.com) -
-
2023-08-23 07:14:44
-
-

Hi All, -I want to capture source and target table details as lineage information with OpenLineage for Amazon Redshift. Please let me know if anyone has done this.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-23 07:32:19
-
-

*Thread Reply:* are you using Airflow to connect to Redshift?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nitin - (nitinkhannain@yahoo.com) -
-
2023-08-24 06:50:05
-
-

*Thread Reply:* Hi @Jakub Dardziński, -Thank you for your reply. -No, we are not using Airflow. -We are using LOAD/UNLOAD commands with PySpark, and also pandas with a JDBC connection

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-25 13:28:37
-
-

*Thread Reply:* @Paweł Leszczyński might know the answer as to whether the Spark<->OL integration works with Redshift. Eventually JDBC is supported with sqlparser

- -

for Pandas I think there wasn’t too much work done

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-28 02:18:49
-
-

*Thread Reply:* @Nitin If you're using jdbc within Spark, the lineage should be obtained via sqlparser-rs library https://github.com/sqlparser-rs/sqlparser-rs. In case it's not, please try to provide some minimal SQL code (or pyspark) which leads to uncaught lineage.
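[Editor's note: for example, a sketch of the kind of JDBC read whose SQL the parser can see — host, credentials, and table names are placeholders, and a SparkSession `spark` with the OpenLineage listener configured is assumed:]
```
# Reading from Redshift over JDBC: the "query" option carries plain SQL,
# which the integration can hand to the SQL parser for lineage.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:redshift://example-cluster:5439/dev")
    .option("query", "SELECT id, amount FROM sales.orders")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .load()
)
df.write.mode("overwrite").parquet("s3://client-bucket/output/orders/")
```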

-
- - - - - - - -
-
- - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nitin - (nitinkhannain@yahoo.com) -
-
2023-08-28 04:53:03
-
-

*Thread Reply:* Hi @Jakub Dardziński / @Paweł Leszczyński, thank you for taking the time to reply to my query. We need to capture only LOAD and UNLOAD query lineage, which we are running using Spark.

- -

If you have any sample implementation for reference, it will be indeed helpful

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-28 06:12:46
-
-

*Thread Reply:* I think we don't support load yet on our side: https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/impl/src/visitor.rs#L8

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nitin - (nitinkhannain@yahoo.com) -
-
2023-08-28 08:18:14
-
-

*Thread Reply:* Yeah! Any way you can think of, we can accommodate it, especially the LOAD and UNLOAD statements. -Also, we would like to capture lineage information where our endpoints are SageMaker and Redis

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nitin - (nitinkhannain@yahoo.com) -
-
2023-08-28 13:20:37
-
-

*Thread Reply:* @Paweł Leszczyński can we use this code base, integration/common/openlineage/common/provider/redshift_data.py, for Redshift lineage capture?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-28 14:26:40
-
-

*Thread Reply:* it still expects input and output tables that are usually retrieved from sqlparser

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-28 14:31:00
-
-

*Thread Reply:* for Sagemaker there is an Airflow integration written, might be an example possibly -https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/sagemaker_extractors.py

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-23 10:55:10
-
-

Approve a new release please 🙂 -• Fix spark integration filtering Databricks events.

- - - -
- ➕ Abdallah, Tristan GUEZENNEC -CROIX-, Mouad MOUSSABBIH, Ayoub Oudmane, Asmae Tounsi, Jakub Dardziński, Michael Robinson, Harel Shein, Willy Lulciuc, Maciej Obuchowski, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-23 12:27:15
-
-

*Thread Reply:* Thank you for requesting a release @Abdallah. Three +1s from committers will authorize.

- - - -
- 🙌 Abdallah -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-23 13:13:18
-
-

*Thread Reply:* Thanks, all. The release is authorized and will be initiated within 2 business days.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Athitya Kumar - (athityakumar@gmail.com) -
-
2023-08-23 13:08:48
-
-

Hey folks! Do we have clear step-by-step documentation on how we can leverage the ServiceLoader-based approach for injecting specific OpenLineage customisations, e.g. tweaking the transport type with defaults / tweaking column-level lineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 13:24:32
- -
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 13:29:05
-
-

*Thread Reply:* For a custom transport, you have to provide an implementation of the interface https://github.com/OpenLineage/OpenLineage/blob/4a1a5c3bf9767467b71ca0e1b6d820ba9e[…]ain/java/io/openlineage/client/transports/TransportBuilder.java and point to it in a META-INF file

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 13:29:52
-
-

*Thread Reply:* But if I understand correctly, if you want to change behavior rather than extend it, the correct way may be to either contribute it to the repo (if that behavior is useful to anyone) or fork the repo

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Athitya Kumar - (athityakumar@gmail.com) -
-
2023-08-23 15:14:43
-
-

*Thread Reply:* @Maciej Obuchowski - Can you elaborate more on the “point to it in a META-INF file” part? Let's say we have the custom transport type built in a standalone jar by extending TransportBuilder - what are the exact next steps to use this custom transport from the standalone jar when doing spark-submit?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-23 15:23:13
-
-

*Thread Reply:* @Athitya Kumar your jar needs to have META-INF/services/io.openlineage.client.transports.TransportBuilder with the fully qualified class names of your custom TransportBuilders in it, like openlineage-spark has: -io.openlineage.client.transports.HttpTransportBuilder -io.openlineage.client.transports.KafkaTransportBuilder -io.openlineage.client.transports.ConsoleTransportBuilder -io.openlineage.client.transports.FileTransportBuilder -io.openlineage.client.transports.KinesisTransportBuilder

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Athitya Kumar - (athityakumar@gmail.com) -
-
2023-08-25 01:49:29
-
-

*Thread Reply:* @Maciej Obuchowski - I think this change may be required for consumers to leverage custom transports, can you check & verify this GH comment? -https://github.com/OpenLineage/OpenLineage/issues/2007#issuecomment-1690350630

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-25 06:52:30
-
-

*Thread Reply:* Probably, I will look at more details next week @Athitya Kumar as I'm in transit

- - - -
- 👍 Athitya Kumar -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-23 15:04:10
-
-

@channel -We released OpenLineage 1.1.0, including: -Additions: -• Flink: create Openlineage configuration based on Flink configuration #2033 @pawel-big-lebowski -• Java: add Javadocs to the Java client #2004 @julienledem -• Spark: append output dataset name to a job name #2036 @pawel-big-lebowski -• Spark: support Spark 3.4.1 #2057 @pawel-big-lebowski -Fixes: -• Flink: fix a bug when getting schema for KafkaSink #2042 @pentium3 -• Spark: fix ignored event adaptive_spark_plan in Databricks #2061 @algorithmy1 -Plus additional bug fixes, doc changes and more. -Thanks to all the contributors, especially new contributors @pentium3 and @Abdallah! -Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.1.0 -Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md -Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.0.0...1.1.0 -Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage -PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 👏 Ayoub Oudmane, Abdallah, Yuanli Wang, Athitya Kumar, Mars Lan, Maciej Obuchowski, Harel Shein, Kiran Hiremath, Thomas Abraham -
- -
- :gratitude_thank_you: GitHubOpenLineageIssues -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 10:29:23
-
-

@channel -Friendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. You can sign up and find the details on the meetup event page.

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 10:57:30
-
-

hi OpenLineage team, we would like to join one of your meetups (me, @Madhav Kakumani and @Phil Rolph) and we're wondering if you are hosting any meetups after 18/9? We are trying to join this one but air tickets are quite expensive

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-08-25 11:32:12
-
-

*Thread Reply:* there will certainly be more meetups, don’t worry about that!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-08-25 11:32:30
-
-

*Thread Reply:* where are you located? perhaps we can try to organize a meetup closer to where you are.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:49:37
-
-

*Thread Reply:* Thanks a lot for the response, we are in London. We'd be glad to help you organise a meetup and also meet in person!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 11:51:39
-
-

*Thread Reply:* This is awesome, thanks @George Polychronopoulos. I’ll start a channel and invite you

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) -
-
2023-08-28 04:47:53
-
-

hi folks, I'm looking into exporting static metadata, and found that DatasetEvent requires an eventTime, which in my mind doesn't make sense for static events. I'm setting it to None and the Python client seems to work, but wanted to ask if I'm missing something.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-28 05:59:10
-
-

*Thread Reply:* Although you emit a DatasetEvent, you still emit an event, and eventTime is a valid marker.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) -
-
2023-08-28 06:01:40
-
-

*Thread Reply:* so, should I use the current time at the moment of emitting it and that's it?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-28 06:01:53
-
-

*Thread Reply:* yes, that should be it
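[Editor's note: putting that together, a sketch of emitting a static DatasetEvent stamped with the emission time — this assumes the 1.x Python client exposes DatasetEvent alongside RunEvent, and the producer URI, schemaURL version, and dataset names are placeholders:]
```
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, DatasetEvent

client = OpenLineageClient()  # transport picked up from openlineage.yml / env vars

event = DatasetEvent(
    eventTime=datetime.now(timezone.utc).isoformat(),  # "now", as discussed above
    producer="https://example.com/my-static-exporter",  # placeholder
    # placeholder; the spec version segment may differ for your client release
    schemaURL="https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/DatasetEvent",
    dataset=Dataset(namespace="my-namespace", name="my-dataset"),
)
client.emit(event)
```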

- - - -
- :gratitude_thank_you: Juan Luis Cano Rodríguez -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) -
-
2023-08-28 04:49:21
-
-

and something else: I understand that Marquez does not yet support the 2.0 spec, hence it's incompatible with static metadata, right? I tried to emit a list of DatasetEvents and got HTTPError: 422 Client Error: Unprocessable Entity for url: <http://localhost:3000/api/v1/lineage> (I'm using a FileTransport for now)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-28 06:02:49
-
-

*Thread Reply:* Marquez is not capable of reflecting DatasetEvents in the DB, but it should respond with Unsupported event type

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-28 06:03:15
-
-

*Thread Reply:* and return 200 instead of 201 created

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) -
-
2023-08-28 06:05:41
-
-

*Thread Reply:* I'll have a deeper look then, probably I'm doing something wrong. thanks @Paweł Leszczyński

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Joshua Dotson - (josdotso@cisco.com) -
-
2023-08-28 13:25:58
-
-

Hi folks. I have some pure golang jobs from which I need to emit OL events to Marquez. Is the right way to go about this to generate a Golang client from the Marquez OpenAPI spec and use that client from my go jobs?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-28 14:23:24
-
-

*Thread Reply:* I'd rather generate them from OL spec (compliant with JSON Schema)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Joshua Dotson - (josdotso@cisco.com) -
-
2023-08-28 15:12:21
-
-

*Thread Reply:* I'll look into this. I take you to mean that I would use the OL spec which is available as a set of JSON schemas to create the data object and then HTTP POST it using vanilla Golang. Is that correct? Thank you for your help!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-28 15:30:05
-
-

*Thread Reply:* Correct! You’re also very welcome to contribute a Golang client (currently we have Python & Java clients) if you manage to send events using Golang 🙂
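[Editor's note: for reference, the same "build the event from the spec, then plain HTTP POST" flow sketched in Python rather than Go, just to show the event shape and endpoint — a Go version would mirror it one-to-one; Marquez's default API port of 5000 and the producer/schemaURL values are assumptions:]
```
from datetime import datetime, timezone
from uuid import uuid4

import requests

event = {
    "eventType": "START",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid4())},
    "job": {"namespace": "my-namespace", "name": "my-golang-job"},
    "inputs": [],
    "outputs": [],
    "producer": "https://example.com/my-go-producer",  # placeholder
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
}

resp = requests.post("http://localhost:5000/api/v1/lineage", json=event)
resp.raise_for_status()
```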

- - - -
- 👏 Joshua Dotson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-28 17:28:31
-
-

@channel -The agenda for the medium=referral&utmcampaign=share-btnsavedeventssharemodal&utmsource=link|Toronto Meetup at Airflow Summit> on 9/18 has been updated. This promises to be an exciting, richly productive discussion. Don’t miss it if you’ll be in the area!

- -
  1. Intros
  2. Evolution of spec presentation/discussion (project background/history)
  3. State of the community
  4. Spark/Column lineage update
  5. Airflow Provider update
  6. Roadmap Discussion
  7. Action items review/next steps
-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
- ❤️ Jarek Potiuk, Paweł Leszczyński, tati -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-28 20:05:37
-
-

New on the OpenLineage blog: a close look at the new OpenLineage Airflow Provider, including: -• the critical improvements it brings to the integration -• the high-level design -• implementation details -• an example operator -• planned enhancements -• a list of supported operators -• and more. -The post, by @Maciej Obuchowski, @Julien Le Dem and me, is live now on the OpenLineage blog.

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- 🎉 Drew Meyers, Harel Shein, Maciej Obuchowski, Julian LaNeve, Mars Lan -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sarwat Fatima - (sarwatfatimam@gmail.com) -
-
2023-08-29 03:18:04
-
-

Hello, I'm currently in the process of following the instructions outlined in the getting started guide at https://openlineage.io/getting-started/. However, I've encountered a problem while attempting to complete *Step 1* of the guide: I'm getting an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. I have attached screenshots.

- -
- - - - - - - - - -
-
- - - - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-29 03:20:18
-
-

*Thread Reply:* is port 5000 taken by any other application? or does ./docker/up.sh show some errors in the logs?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sarwat Fatima - (sarwatfatimam@gmail.com) -
-
2023-08-29 05:23:01
-
-

*Thread Reply:* @Jakub Dardziński port 5000 is not taken by any other application. The logs show some errors, but I am not sure what the issue is here.

- -
- - - - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-29 10:02:38
-
-

*Thread Reply:* I think Marquez is running on WSL while you're trying to connect from the host computer?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) -
-
2023-08-29 05:20:39
-
-

hi folks, for now I'm producing .jsonl (or .ndjson) files with one event per line. do you know if there's any way to validate those? would standard JSON Schema tools work?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) -
-
2023-08-29 10:58:29
-
-

*Thread Reply:* reply by @Julian LaNeve: yes 🙂💯
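[Editor's note: for example, a sketch using the standard jsonschema package — the schema and event file paths are yours, and the spec's JSON Schema is fetched separately from the OpenLineage repo/site:]
```
import json

import jsonschema

with open("OpenLineage.json") as f:  # the spec's JSON Schema, downloaded beforehand
    schema = json.load(f)

with open("events.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        try:
            # validate each newline-delimited event against the spec schema
            jsonschema.validate(instance=json.loads(line), schema=schema)
        except jsonschema.ValidationError as err:
            print(f"line {lineno}: {err.message}")
```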

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-29 13:12:32
-
-

for namespaces, if my data is moving between sources (SFTP -> GCS -> Azure Blob, with Synapse connecting to the parquet datasets), then should my namespace be based on the client I am working with? my current namespace has been to refer to the bucket, but that falls apart when considering the data sources and some destinations. perhaps I should just add a field for client-name instead to have a consolidated view?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-30 10:53:08
-
-

*Thread Reply:* > then should my namespace be based on the client I am working with? -I think each of those sources should be a different namespace?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-30 12:59:53
-
-

*Thread Reply:* got it, yeah I was kind of picturing as one namespace for the client (we handle many clients but they are completely distinct entities). I was able to get it to work with multiple namespaces like you suggested and Marquez was able to plot everything correctly in the visualization

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
ldacey - (lance.dacey2@sutherlandglobal.com) -
-
2023-08-30 13:01:18
-
-

*Thread Reply:* I noticed some of my Dataset facets make more sense as Run facets, for example the name of the specific file I processed and how many rows / what size of data there was for that schedule. that won't impact the Run facets Airflow provides, right? I can still have the schedule information + my custom run facets?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-30 13:06:38
-
-

*Thread Reply:* Yes, unless you name it the same as one of the Airflow facets 🙂
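[Editor's note: for illustration, a sketch of a custom run facet that can ride alongside the Airflow-provided ones — the facet name and fields are made up, following the attrs-based BaseFacet pattern used by openlineage-python:]
```
import attr

from openlineage.client.facet import BaseFacet


@attr.s
class ProcessedFileRunFacet(BaseFacet):
    # hypothetical fields for per-schedule file metadata
    filename: str = attr.ib()
    rowCount: int = attr.ib()
    byteSize: int = attr.ib()


run_facets = {
    # any key that doesn't collide with the Airflow facet names is safe
    "processedFile": ProcessedFileRunFacet(
        filename="client_dump_2023-08-30.csv",
        rowCount=10432,
        byteSize=1048576,
    ),
}
```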

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
GitHubOpenLineageIssues - (githubopenlineageissues@gmail.com) -
-
2023-08-30 08:15:29
-
-

Hi, I will really appreciate it if someone can guide me or provide me any pointers, if they have been able to implement authentication/authorization for access to Marquez. I have not seen much info around it. Any pointers greatly appreciated. Thanks in advance.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-08-30 12:23:18
-
-

*Thread Reply:* I’ve seen people do this through the ingress controller in Kubernetes. Unfortunately I don’t have documentation besides the k8s-specific docs you would find for the ingress controller you’re using. You’d redirect any unauthenticated request to your identity provider.

- - - -
- :gratitude_thank_you: GitHubOpenLineageIssues -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-30 11:50:05
-
-

@channel -Friendly reminder: there’s a meetup tonight at Astronomer’s offices in SF!

-
- - -
- - - - - - - - - - - - - - - - - -
- - - -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-08-30 12:15:31
-
-

*Thread Reply:* I’ll be there and looking forward to see @John Lukenoff ‘s presentation

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Barrientos - (mbarrien@gmail.com) -
-
2023-08-30 21:38:31
-
-

Can anyone let 3 people stuck downstairs into the 7th floor?

- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-08-30 23:25:21
-
-

*Thread Reply:* Sorry about that!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yunhe - (yunhe52203334@outlook.com) -
-
2023-08-31 02:31:48
-
-

hello everyone, I can run OpenLineage Spark code in my notebook with Python, but when I use my IDEA to execute Scala code like this:
```
import org.apache.spark.internal.Logging
import org.apache.spark.sql.SparkSession
import io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}
import sun.java2d.marlin.MarlinUtils.logInfo

object Test {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .master("local")
      .appName("test")
      .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.12.0")
      .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
      .config("spark.openlineage.transport.type", "console")
      .getOrCreate()

    spark.sparkContext.setLogLevel("INFO")

    //spark.sparkContext.addSparkListener(new MySparkAppListener)
    import spark.implicits._
    val input = Seq((1, "zs", 2020), (2, "ls", 2023)).toDF("id", "name", "year")

    input.select("id", "name").orderBy("id").show()
  }
}
```
there is something wrong:
```
Exception in thread "spark-listener-group-shared" java.lang.NoSuchMethodError: io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(Ljava/io/InputStream;)Lio/openlineage/client/OpenLineageYaml;
    at io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114)
    at io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78)
    at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
```
I want to know how I can set up the IDEA Scala environment correctly

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-31 02:58:41
-
-

*Thread Reply:* io.openlineage:openlineage-spark:0.12.0 -> could you repeat the steps with a newer version?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yunhe - (yunhe52203334@outlook.com) -
-
2023-08-31 03:51:52
-
-

OK, it's my first time using this lineage tool. First, I added dependencies in my pom.xml like this:
```
<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-java</artifactId>
    <version>0.12.0</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-spark</artifactId>
    <version>0.30.1</version>
</dependency>
```

My Spark version is 3.3.1 and that version cannot change.

Second, in the OpenLineage/integration/spark directory I enter the command docker-compose up and follow the steps in this doc: -https://openlineage.io/docs/integrations/spark/quickstart_local -There is no error when I use the notebook to execute PySpark for OpenLineage, and I can get the JSON messages. But after I enter "docker-compose up" and use my IDEA tool to execute the Scala code above, the error above happens. It seems that I have not configured the environment correctly, so how can I fix the problem?

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-01 05:15:28
-
-

*Thread Reply:* please use the latest io.openlineage:openlineage-spark:1.1.0 instead. openlineage-java is already contained in that jar, so there's no need to add it on your own.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-08-31 15:33:19
-
-

Will the August meeting be put up at https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting soon? (usually it’s up in a few days 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-01 06:00:53
-
-

*Thread Reply:* @Michael Robinson

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-01 17:13:32
-
-

*Thread Reply:* The recording is on the youtube channel here. I’ll update the wiki ASAP

-
-
- - - - - - - - - - - - - - - - - -
- - - -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-08-31 18:10:20
-
-

It sounds like there have been a few announcements at Google Next: -https://cloud.google.com/data-catalog/docs/how-to/open-lineage -https://cloud.google.com/dataproc/docs/guides/lineage

-
-
Google Cloud
- - - - - - - - - - - - - - - - - -
-
-
Google Cloud
- - - - - - - - - - - - - - - - - -
- - - -
- 🎉 Harel Shein, Willy Lulciuc, Kevin Languasco, Peter Hicks, Maciej Obuchowski, Paweł Leszczyński, Sheeri Cabral (Collibra), Ross Turk, Michael Robinson, Jakub Dardziński, Kiran Hiremath, Laurent Paris, Anastasia Khomyakova -
- -
- 🙌 Harel Shein, Willy Lulciuc, Mars Lan, Peter Hicks, Maciej Obuchowski, Paweł Leszczyński, Eric Veleker, Sheeri Cabral (Collibra), Ross Turk, Michael Robinson -
- -
- ❤️ Willy Lulciuc, Maciej Obuchowski, ldacey, Ross Turk, Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-09-01 23:09:55
-
-

*Thread Reply:* https://www.youtube.com/watch?v=zvCdrNJsxBo&t=2260s

-
-
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-01 17:16:21
-
-

@channel -The latest issue of OpenLineage News is out now! Please subscribe to get it directly in your inbox each month.

-
-
apache.us14.list-manage.com
- - - - - - - - - - - - - - - -
- - - -
- 🙌 Jakub Dardziński, Maciej Obuchowski -
- -
- 🙌:skin_tone_3: Juan Luis Cano Rodríguez -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-09-04 03:38:28
-
-

Hi guys, I'd like to capture the spark.databricks.clusterUsageTags.clusterAllTags property from Databricks. However, the value of this is a list of keys, and therefore cannot be supported by the custom environment facet builder. -I was thinking that capturing this property might be useful for most Databricks workloads, and wondering whether it might make sense to auto-capture it along with other Databricks variables, similar to how we capture mount points for the Databricks jobs. -Does this sound okay? If so, then I can help to contribute this functionality

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-04 06:43:47
-
-

*Thread Reply:* Sounds good to me

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-09-11 05:15:03
-
-

*Thread Reply:* Added this here: https://github.com/OpenLineage/OpenLineage/pull/2099

-
- - - - - - - -
-
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-09-04 06:39:05
-
-

Also, another small clarification: when using MergeIntoCommand, I'm receiving the lineage events on the backend, but I cannot seem to find any logging of the payload when I enable debug mode in OpenLineage. I remember there was a similar issue reported by another user in the past. May I check if it might be possible to help with this? It's making debugging quite hard for these cases. Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-04 06:54:12
-
-

*Thread Reply:* I think it only depends on log4j configuration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-04 06:57:15
-
-

*Thread Reply:* ```
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# set the log level for the openlineage spark library
log4j.logger.io.openlineage.spark=DEBUG
```
this is what we have in `log4j.properties` in the test environment and it works

- - - -
-
-
-
- - - - - -
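A quick way to see those DEBUG payloads end to end is to point the listener at the console transport. A minimal sketch, assuming the openlineage-spark 1.1.0 artifact and the config keys from the OpenLineage Spark README:
```
# Sketch: PySpark session with the OpenLineage listener and console transport,
# so each emitted event is printed to the driver logs (where the log4j config
# above has io.openlineage.spark at DEBUG).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ol-debug-example")
    # assumed artifact version; match it to your Spark/OL setup
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.1.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # console transport prints each OpenLineage event instead of POSTing it
    .config("spark.openlineage.transport.type", "console")
    .getOrCreate()
)

spark.sql("SELECT 1 AS id").write.mode("overwrite").parquet("/tmp/ol_debug_out")
spark.stop()
```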
Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-09-04 11:28:11
*Thread Reply:* Hmm... I can see the logs for the other commands, like CreateViewCommand etc. I just cannot see it for any of the delta runs

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-05 03:33:03
*Thread Reply:* that's interesting. So, logging is done here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/main/java/io/openlineage/spark/agent/EventEmitter.java#L63 and this code is unaware of delta.

The possible problem could be filtering delta events (which we do because delta is noisy)

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-05 03:33:36
*Thread Reply:* Recently, we've closed https://github.com/OpenLineage/OpenLineage/issues/1982 which prevents generating events for createOrReplaceTempView
Assignees: @pawel-big-lebowski
Labels: integration/spark

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-05 03:35:12
*Thread Reply:* and this is the code change: https://github.com/OpenLineage/OpenLineage/pull/1987/files

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-09-05 05:19:22
*Thread Reply:* Hmm, I'm a little confused here. I thought we were only filtering out events for certain specific commands, like show table etc., because they're noisy, right? Some important commands like MergeInto or SaveIntoDataSource used to be logged before, but I notice now that they're not being logged anymore...
I'm using OpenLineage version 0.23.0.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-05 05:47:51
*Thread Reply:* yes, we do. it's just that sometimes when doing a filter, we can remove too much. but SaveIntoDataSource and MergeInto should be fine, as we do check them within the tests

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-04 21:35:05
it looks like my dynamic task mapping in Airflow has the same run ID in Marquez, so even if I am processing 100 files, there is only one version of the data. is there a way to have a separate version of each dynamic task so I can track the filename etc.?
Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-05 08:54:57
*Thread Reply:* map_index should indeed be included when calculating the run ID (it's deterministic in the Airflow integration)
what version of Airflow are you using btw?
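For illustration only, not the provider's actual code: a deterministic run ID that folds in map_index could be built with uuid5, so each mapped task instance gets its own stable ID:
```
# Illustrative sketch of deterministic run-ID generation. The namespace UUID
# and key format are hypothetical; the point is that including map_index
# gives every mapped task instance a distinct, reproducible ID.
import uuid

OL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "openlineage-example")  # hypothetical

def deterministic_run_id(dag_id: str, task_id: str, try_number: int,
                         logical_date: str, map_index: int = -1) -> str:
    key = f"{dag_id}.{task_id}.{logical_date}.{try_number}.{map_index}"
    return str(uuid.uuid5(OL_NAMESPACE, key))

print(deterministic_run_id("my_dag", "process_file", 1, "2023-09-05T00:00:00", map_index=3))
```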
ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-05 09:04:14
*Thread Reply:* 2.7.0

I do see this error log in all of my dynamic tasks which might explain it:

```
[2023-09-05, 00:31:57 UTC] {manager.py:200} ERROR - Extractor returns non-valid metadata: None
[2023-09-05, 00:31:57 UTC] {utils.py:401} ERROR - cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 399, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/listener.py", line 93, in on_running
    **get_custom_facets(task_instance),
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 148, in get_custom_facets
    custom_facets["airflow_mappedTask"] = AirflowMappedTaskRunFacet.from_task_instance(task_instance)
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/facets.py", line 36, in from_task_instance
    from airflow.providers.openlineage.utils import get_operator_class
ImportError: cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
```
ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-05 09:05:34
*Thread Reply:* I only have a few custom operators with the on_complete facet, so I think this is a built-in one - it runs before my task's custom logs, for example

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-05 09:06:05
*Thread Reply:* and any time I messed up my custom facet, the error would be at the bottom of the logs. this one is on top, probably an on_start facet?

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-05 09:16:32
*Thread Reply:* seems like some circular import

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-05 09:19:47
*Thread Reply:* I just tested it manually, it's a bug in the OL provider. let me fix that

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-05 10:53:28
*Thread Reply:* cool, thanks. I am glad it is just a bug - I was afraid dynamic tasks were not supported for a minute there

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-07 11:46:20
*Thread Reply:* how do the provider updates work? they can be released in between Airflow releases, and issues for them are raised on the main Airflow repo?

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-07 11:50:07
*Thread Reply:* generally speaking, anything related to OL-Airflow should go to the Airflow repo; important changes/bug fixes would be implemented in the OL repo as well

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-07 15:40:31
*Thread Reply:* got it, thanks

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-07 19:43:46
*Thread Reply:* is there a way for me to install the openlineage provider based on the commit you made to fix the circular imports?

I was going to try to install from the Airflow main branch but didn't want to mess anything up

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-07 19:44:39
*Thread Reply:* I saw it was merged to Airflow main but it is not in 2.7.1, and there is no 1.0.3 provider version yet, so I wondered if I could manually install it for the time being

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-08 05:45:48
*Thread Reply:* https://github.com/apache/airflow/blob/main/BREEZE.rst#preparing-provider-packages
building the provider package on your own is probably the best idea? that depends on how you manage your Airflow instance

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-08 12:01:53
*Thread Reply:* there's 1.1.0rc1 btw

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-08 13:44:44
*Thread Reply:* perfect, thanks. I got started with breeze but then stopped haha

👍 Jakub Dardziński
ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-10 20:29:00
*Thread Reply:* The dynamic task mapping error is gone. I did run into this:

```
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/extractors/base.py", line 70, in disabled_operators
    operator.strip() for operator in conf.get("openlineage", "disabled_for_operators").split(";")
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/configuration.py", line 1065, in get
    raise AirflowConfigException(f"section/key [{section}/{key}] not found in config")
```

I am redeploying now with that option added to my config. I guess it did not use the default, which should be ""

ldacey (lance.dacey2@sutherlandglobal.com)
2023-09-10 20:49:17
*Thread Reply:* added "disabled_for_operators" to my openlineage config and it worked (using the Airflow helm chart - not sure if that means there is an error, because the value I provided should just be the default value; not sure why I needed to explicitly specify it)

```
openlineage:
  disabled_for_operators: ""
  ...
```

this is so much better and makes a lot more sense. most of my tasks are dynamic, so I was missing a lot of metadata before the fix, thanks!
Abdallah (abdallah@terrab.me)
2023-09-06 16:43:07
Hello Everyone,

I've been diving into the Marquez codebase and found a performance bottleneck in JobDao.java for the query related to namespaceName=MyNameSpace - slow with limit=10, and 12s with limit=25. I managed to optimize it using CTEs, and the execution times dropped dramatically to 300ms (for limit=100) and under 100ms (for limit=25) on the same cluster.
Issue link: https://github.com/MarquezProject/marquez/issues/2608

I believe there's even more room for optimization, especially if we adjust the job_facets_view to include the namespace_name column.

Would the team be open to a PR where I share the optimized query and discuss potential further refinements? I believe these changes could significantly enhance the Marquez web UI experience.

PR link: https://github.com/MarquezProject/marquez/pull/2609

Looking forward to your feedback.

🔥 Jakub Dardziński, Harel Shein, Paweł Leszczyński, Maciej Obuchowski

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-06 18:03:01
*Thread Reply:* @Willy Lulciuc wdyt?
Bernat Gabor (gaborjbernat@gmail.com)
2023-09-06 17:44:12
Has there been any conversation on the extensibility of facets/concepts? E.g.:
• how does one extend the list of run states https://openlineage.io/docs/spec/run-cycle to add a paused/resumed state?
• how does one extend https://openlineage.io/docs/spec/facets/run-facets/nominal_time to add a "created at" field?
Julien Le Dem (julien@apache.org)
2023-09-06 18:28:17
*Thread Reply:* Hello Bernat,

The primary mechanism to extend the model is through facets. You can either:
• create new standard facets in the spec: https://github.com/OpenLineage/OpenLineage/tree/main/spec/facets
• create custom facets defined somewhere else, with a prefix in their name: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#custom-facet-naming
• update existing facets with a backward-compatible change (example: adding an optional field).
The core spec can also be modified. Here is an example of adding a state.
That being said, I think more granular states like pause/resume are probably better suited to a run facet. There was an issue opened for that particular one a while ago: https://github.com/OpenLineage/OpenLineage/issues/9 - maybe that particular discussion can continue there.

For the nominal time facet, you could open an issue describing the use case and, on community agreement, follow up with a PR on the facet itself: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/NominalTimeRunFacet.json
(adding an optional field is backwards compatible)

👀 Juan Luis Cano Rodríguez
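To make the custom-facet route concrete, a minimal sketch with the openlineage-python client follows. The facet name, fields, and schema URL are invented for illustration, and the attrs-based shape should be checked against your client version:
```
# Sketch of a custom run facet with the openlineage-python client.
# The facet name, fields, and schema URL below are illustrative only.
import attr
from openlineage.client.facet import BaseFacet

@attr.s
class MyOrgStateRunFacet(BaseFacet):
    state: str = attr.ib()       # e.g. "PAUSED" or "RESUMED"
    createdAt: str = attr.ib()   # ISO-8601 timestamp, the extra field discussed above

    @staticmethod
    def _get_schema() -> str:
        # wherever your org hosts the facet's JSON schema (hypothetical URL)
        return "https://myorg.example/schemas/MyOrgStateRunFacet.json"

# Attached to a run under a prefixed key to avoid clashing with standard facets:
# Run(runId=..., facets={"myorg_state": MyOrgStateRunFacet("PAUSED", "2023-09-06T18:28:17Z")})
```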
Bernat Gabor (gaborjbernat@gmail.com)
2023-09-06 18:31:12
*Thread Reply:* I see, so in general one is best off copying a standard facet and maintaining it under a different name. That way it can be made mandatory 🙂 and one does not need to be blocked for a long time until there's community agreement 🤔

Julien Le Dem (julien@apache.org)
2023-09-06 18:35:43
*Thread Reply:* Yes. The goal of custom facets is to allow you to experiment and extend the spec however you want without having to wait for approval.
If the custom facet is very specific to a third-party project/product, then it makes sense for it to stay a custom facet.
If it is more generic, then it makes sense to add it to the core facets as part of the spec.
Hopefully community agreement can be achieved relatively quickly. Unless someone is strongly against something, it can be added without too much red tape - typically with support in at least one of the integrations to validate the model.
Michael Robinson (michael.robinson@astronomer.io)
2023-09-07 15:12:20
@channel
This month's TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:
• announcements
• recent releases
• demo: Spark integration tests in Databricks runtime
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.
[Link preview: openlineage.io]

👍 Maciej Obuchowski

Michael Robinson (michael.robinson@astronomer.io)
2023-09-11 10:07:41
@channel
The first Toronto OpenLineage Meetup, featuring a presentation by recent adopter Metaphor, is just one week away. On the agenda:
  1. Evolution of spec presentation/discussion (project background/history)
  2. State of the community
  3. Integrating OpenLineage with Metaphor (by special guests Ye & Ivan)
  4. Spark/column lineage update
  5. Airflow Provider update
  6. Roadmap discussion
Find more details and RSVP here: https://www.meetup.com/openlineage/events/295488014/
[Link previews: metaphor.io, Meetup]

🙌 Mars Lan, Jarek Potiuk, Harel Shein, Maciej Obuchowski, Peter Hicks, Paweł Leszczyński, Dongjin Seo
John Lukenoff (john@jlukenoff.com)
2023-09-11 17:07:26
I'm seeing some odd behavior with my http transport when upgrading airflow/openlineage-airflow from 2.3.2 -> 2.6.3 and 0.24.0 -> 0.28.0. Previously I had a config like this that let me provide my own auth tokens. However, after upgrading I'm getting a 401 from the endpoint, and further debugging seems to reveal that we're not using the token provided in my TokenProvider. Does anyone know if something changed between these versions that could be causing this? (more details in 🧵)
```
transport:
  type: http
  url: https://my.fake-marquez-endpoint.com
  auth:
    type: some.fully.qualified.classpath
```

John Lukenoff (john@jlukenoff.com)
2023-09-11 17:09:40
*Thread Reply:* If I log this line I can tell the TokenProvider is the class instance I would expect: https://github.com/OpenLineage/OpenLineage/blob/45d94fb73b5488d34b8ca544b58317382ceb3980/client/python/openlineage/client/transport/http.py#L55

John Lukenoff (john@jlukenoff.com)
2023-09-11 17:11:14
*Thread Reply:* However, if I log the token_provider here I get the original TokenProvider: https://github.com/OpenLineage/OpenLineage/blob/45d94fb73b5488d34b8ca544b58317382ceb3980/client/python/openlineage/client/transport/http.py#L154

John Lukenoff (john@jlukenoff.com)
2023-09-11 17:18:56
*Thread Reply:* Ah, I think I see the issue. Looks like this was introduced here; we are instantiating with the base token provider when we should be using the subclass: https://github.com/OpenLineage/OpenLineage/pull/1869/files#diff-2f8ea6f9a22b5567de8ab56c6a63da8e7adf40cb436ee5e7e6b16e70a82afe05R57

John Lukenoff (john@jlukenoff.com)
2023-09-11 17:37:42
*Thread Reply:* Opened a PR for this here: https://github.com/OpenLineage/OpenLineage/pull/2100
Labels: client/python
Comments: 1

❤️ Julien Le Dem
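For reference, a custom token provider along those lines is a small subclass wired in via the auth.type classpath in the config above. A sketch: the class and method shape mirror openlineage.client.transport.http.TokenProvider, but verify against the installed client version, and the "token" config key is hypothetical:
```
# Sketch of a custom auth token provider for the openlineage-python HTTP
# transport, referenced from openlineage.yml via:
#   auth:
#     type: my_package.my_module.StaticTokenProvider
from openlineage.client.transport.http import TokenProvider


class StaticTokenProvider(TokenProvider):
    def __init__(self, config: dict):
        super().__init__(config)
        self.token = config.get("token")  # hypothetical config key

    def get_bearer(self) -> str:
        # the returned value ends up in the Authorization header
        return f"Bearer {self.token}"
```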
Sarwat Fatima (sarwatfatimam@gmail.com)
2023-09-12 08:14:06
This particular code in docker-compose exits with code 1 because it is unable to find the wait-for-it.sh file in the container. I have checked the mounting path from the local machine - it is correct, and the path on the container for Marquez is also correct, i.e. /usr/src/app, but it is unable to mount wait-for-it.sh. Does anyone know why this is? This code exists in the OpenLineage repository as well: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/docker-compose.yml
```
# Marquez as an OpenLineage Client
api:
  image: marquezproject/marquez
  container_name: marquez-api
  ports:
    - "5000:5000"
    - "5001:5001"
  volumes:
    - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh
  links:
    - "db:postgres"
  depends_on:
    - db
  entrypoint: [ "./wait-for-it.sh", "db:5432", "--", "./entrypoint.sh" ]
```

Sarwat Fatima (sarwatfatimam@gmail.com)
2023-09-12 08:15:19
*Thread Reply:* This is the error message:
[screenshot]

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-12 10:38:41
*Thread Reply:* no permissions?
Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-12 15:11:45
I am trying to run Google Cloud Composer, where I have added the openlineage-airflow PyPI package as a dependency and have set the env var OPENLINEAGE_EXTRACTORS to point to my custom extractor. I have added a folder by the name dependencies, and inside that I have placed my extractor file; the path given to OPENLINEAGE_EXTRACTORS is dependencies.<filename>.<extractor_class_name>... still it fails with an exception saying No module named 'dependencies'. Can anyone kindly help me out in correcting my mistake?

Harel Shein (harel.shein@gmail.com)
2023-09-12 17:15:36
*Thread Reply:* Hey @Guntaka Jeevan Paul, can you share some details on which versions of airflow and openlineage you're using?

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-12 17:16:26
*Thread Reply:* airflow ---> 2.5.3, openlineage-airflow ---> 1.1.0
Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-12 17:45:08
*Thread Reply:*
```
import traceback
import uuid
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.airflow.utils import get_job_name


class BigQueryInsertJobExtractor(BaseExtractor):
    def __init__(self, operator):
        super().__init__(operator)

    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ['BigQueryInsertJobOperator']

    def extract(self) -> Optional[TaskMetadata]:
        return None

    def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:
        self.log.debug(f"JEEVAN ---> extract_on_complete({task_instance})")
        random_uuid = str(uuid.uuid4())
        self.log.debug(f"JEEVAN ---> Randomly Generated UUID --> {random_uuid}")

        self.operator.job_id = random_uuid

        return TaskMetadata(
            name=get_job_name(task=self.operator)
        )
```

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-12 17:45:24
*Thread Reply:* this is the custom extractor code that I'm trying with
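For orientation, the setup being attempted in this thread looks roughly like the sketch below (Composer-style dags/ bucket; the file and class names are the ones used above). The quick check at the bottom verifies the import path resolves the same way OL would resolve it:
```
# Rough sketch of the layout discussed in this thread (Composer mounts the
# bucket's dags/ folder onto the workers; paths are illustrative):
#
#   dags/
#     my_dag.py
#     dependencies/
#       __init__.py                        # makes "dependencies" a package
#       big_query_insert_job_extractor.py  # contains BigQueryInsertJobExtractor
#
# The env var then names the class by its import path:
#   OPENLINEAGE_EXTRACTORS=dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor
#
# Quick check that the path is importable from worker code:
import importlib

module = importlib.import_module("dependencies.big_query_insert_job_extractor")
extractor = getattr(module, "BigQueryInsertJobExtractor")
print(extractor)
```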
Harel Shein (harel.shein@gmail.com)
2023-09-12 21:10:02
*Thread Reply:* thanks @Guntaka Jeevan Paul, will try to take a deeper look tomorrow

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 07:54:26
*Thread Reply:* No module named 'dependencies'.
This sounds like a general Python problem

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 07:55:12
*Thread Reply:* https://stackoverflow.com/questions/69991553/how-to-import-custom-modules-in-cloud-composer

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 07:56:28
*Thread Reply:* basically, if you're able to import the file from your dag code, OL should be able to as well

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:01:12
*Thread Reply:* The problem is that in GCS Composer there is a component called Triggerer, which they say is used for deferrable operators. I have logged into that pod and I can see that the GCS bucket is not mounted on it, but I am unable to understand why the initialisation is happening inside the triggerer pod

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:01:32
*Thread Reply:* [screenshot]

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:01:47
*Thread Reply:* > The problem is that in GCS Composer there is a component called Triggerer ... the GCS bucket is not mounted on it
OL integration is not running on triggerer, only on worker and scheduler pods

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:01:53
*Thread Reply:* [screenshot]

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:03:26
*Thread Reply:* As you can see in this screenshot, I am seeing the logs of the triggerer and it clearly says unable to import plugin openlineage
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:10:32
*Thread Reply:* I see. There are a few possible things to do here - Composer could mount the user files, Airflow could not start plugins on the triggerer, or we could detect we're on the triggerer and not import anything there. However, does it impact OL or Airflow operation in any way other than this log?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:12:06
*Thread Reply:* Probably we'd have to do something if that really bothers you, as there won't be further changes to Airflow 2.5

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:18:14
*Thread Reply:* The problem is that it is actually not registering this custom extractor written by me; hence I am just getting the DefaultExtractor behavior, and my extractor code is not even getting triggered

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:22:49
*Thread Reply:* any suggestions to try @Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:27:48
*Thread Reply:* Could you share worker logs?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:27:56
*Thread Reply:* and check if the module is importable from your dag code?

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:31:25
*Thread Reply:* these are the worker pod logs… where there is no log of OpenLineagePlugin

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:31:52
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1694608076879469?thread_ts=1694545905.974339&cid=C01CK9T7HKR --> sure, will check now on this one
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:38:32
*Thread Reply:*
```
{
  "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 427, in import_from_string module = importlib.import_module(module_path) File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load File \"<frozen importlib._bootstrap>\", line 961, in _find_and_load_unlocked File \"<frozen importlib._bootstrap>\", line 219, in _call_with_frames_removed ... ModuleNotFoundError: No module named 'airflow.gcs'",
  "insertId": "pt2eu6fl9z5vw",
  "resource": {
    "type": "cloud_composer_environment",
    "labels": {
      "environment_name": "openlineage",
      "location": "us-west1",
      "project_id": "acceldata-acm"
    }
  },
  "timestamp": "2023-09-13T06:20:44.131577764Z",
  "severity": "ERROR",
  "labels": {
    "worker_id": "airflow-worker-xttt8"
  },
  "logName": "projects/acceldata-acm/logs/airflow-worker",
  "receiveTimestamp": "2023-09-13T06:20:48.847319607Z"
}
```
it doesn't find 'airflow.gcs', which is part of your extractor path airflow.gcs.dags.big_query_insert_job_extractor.BigQueryInsertJobExtractor
however, is it necessary? I generally see people using imports directly from the dags folder
Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:44:11
*Thread Reply:* this is one of the experiments that I did, but then I reverted it back to dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor… where dependencies is a module I have created inside my dags folder

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:45:46
*Thread Reply:* these are the logs of the triggerer pod specifically

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:46:31
*Thread Reply:* yeah, it would be expected to have this in the triggerer where it's not mounted, but will it behave the same for the worker where it's mounted?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:47:09
*Thread Reply:* maybe __init__.py is missing for the top-level dag path?

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:49:01
*Thread Reply:* these are the logs of the worker pod at startup, where it does not complain about the plugin like the triggerer does, but when tasks are run on this worker… somehow it is not picking up the extractor for the operator that I have written it for

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 08:49:54
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1694609229577469?thread_ts=1694545905.974339&cid=C01CK9T7HKR --> you mean to make the dags folder a module as well, by adding __init__.py?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:55:24
*Thread Reply:* yes, I would put the whole custom code directly in the dags folder, to check whether import paths are the problem

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 08:55:48
*Thread Reply:* and it would be nice if you could set
AIRFLOW__LOGGING__LOGGING_LEVEL="DEBUG"
Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 09:14:58
*Thread Reply:*
```
Starting the process, got command: triggerer
Initializing airflow.cfg.
airflow.cfg initialization is done.
[2023-09-13T13:11:46.620+0000] {settings.py:267} DEBUG - Setting up DB connection pool (PID 8)
[2023-09-13T13:11:46.622+0000] {settings.py:372} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=570, pid=8
[2023-09-13T13:11:46.742+0000] {cli_action_loggers.py:39} DEBUG - Adding <function default_action_log at 0x7ff39ca1d3a0> to pre execution callback
[2023-09-13T13:11:47.638+0000] {cli_action_loggers.py:65} DEBUG - Calling callbacks: [<function default_action_log at 0x7ff39ca1d3a0>]
[Airflow ASCII-art banner]
[2023-09-13T13:11:50.527+0000] {plugins_manager.py:300} DEBUG - Loading plugins
[2023-09-13T13:11:50.580+0000] {plugins_manager.py:244} DEBUG - Loading plugins from directory: /home/airflow/gcs/plugins
[2023-09-13T13:11:50.581+0000] {plugins_manager.py:224} DEBUG - Loading plugins from entrypoints
[2023-09-13T13:11:50.587+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin OpenLineagePlugin
[2023-09-13T13:11:50.740+0000] {utils.py:430} WARNING - No module named 'boto3'
[2023-09-13T13:11:50.743+0000] {utils.py:430} WARNING - No module named 'botocore'
[2023-09-13T13:11:50.833+0000] {utils.py:430} WARNING - No module named 'airflow.providers.sftp'
[2023-09-13T13:11:51.144+0000] {utils.py:430} WARNING - No module named 'big_query_insert_job_extractor'
[2023-09-13T13:11:51.145+0000] {plugins_manager.py:237} ERROR - Failed to import plugin OpenLineagePlugin
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py", line 427, in import_from_string
    module = importlib.import_module(module_path)
  File "/opt/python3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'big_query_insert_job_extractor'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/plugins_manager.py", line 229, in load_entrypoint_plugins
    plugin_class = entry_point.load()
  File "/opt/python3.8/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/__init__.py", line 194, in load
    module = import_module(match.group('module'))
  File "/opt/python3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/plugin.py", line 32, in <module>
    from openlineage.airflow import listener
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 75, in <module>
    extractor_manager = ExtractorManager()
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py", line 16, in __init__
    self.task_to_extractor = Extractors()
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py", line 122, in __init__
    extractor = import_from_string(extractor.strip())
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py", line 431, in import_from_string
    raise ImportError(f"Failed to import {path}") from e
ImportError: Failed to import big_query_insert_job_extractor.BigQueryInsertJobExtractor
[2023-09-13T13:11:51.235+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin composer_menu_plugin
[2023-09-13T13:11:51.719+0000] {plugins_manager.py:316} DEBUG - Loading 1 plugin(s) took 1.14 seconds
[2023-09-13T13:11:51.733+0000] {triggerer_job.py:101} INFO - Starting the triggerer
[2023-09-13T13:11:51.734+0000] {selector_events.py:59} DEBUG - Using selector: EpollSelector
[2023-09-13T13:11:56.118+0000] {base_job.py:240} DEBUG - [heartbeat]
[2023-09-13T13:12:01.359+0000] {base_job.py:240} DEBUG - [heartbeat]
... (heartbeat lines repeat every ~5s) ...
[2023-09-13T13:14:44.247+0000] {base_job.py:240} DEBUG - [heartbeat]
```
Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 09:15:10
*Thread Reply:* still the same error in the triggerer pod

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 09:16:23
*Thread Reply:* I have changed the dags folder, where I have added the __init__ file as you suggested, and then updated OPENLINEAGE_EXTRACTORS to big_query_insert_job_extractor.BigQueryInsertJobExtractor… still the same thing

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 09:36:27
*Thread Reply:* > still the same error in the triggerer pod
it won't change - we're not trying to fix the triggerer import but the worker one, and we should look only at the worker pod at this point
Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 09:43:34
*Thread Reply:*
```
extractor for <class 'airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator'> is <class 'big_query_insert_job_extractor.BigQueryInsertJobExtractor'>

Using extractor BigQueryInsertJobExtractor task_type=BigQueryInsertJobOperator airflow_dag_id=data_analytics_dag task_id=join_bq_datasets.bq_join_holidays_weather_data_2021 airflow_run_id=manual__2023-09-13T13:24:08.946947+00:00

fatal: not a git repository (or any parent up to mount point /home/airflow)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /home/airflow)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
```

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 09:44:44
*Thread Reply:* I am able to see these logs in the worker pod… so what you said is right: it is able to get the extractor, but I get the above error immediately, where it says not a git repository

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 09:45:24
*Thread Reply:* seems like we are almost there… am I missing something obvious?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 10:06:35
*Thread Reply:* > fatal: not a git repository (or any parent up to mount point /home/airflow)
> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
hm, this could be the actual bug?
Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:06:51
*Thread Reply:* that's a normal log in Composer

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:12:16
*Thread Reply:* extractor for <class 'airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator'> is <class 'big_query_insert_job_extractor.BigQueryInsertJobExtractor'>
that's actually the class from your custom module, right?

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:14:03
*Thread Reply:* I've done an experiment - that's how GCS looks like
[screenshot]

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:14:09
*Thread Reply:* and env vars
[screenshot]

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:14:19
*Thread Reply:* I have this extractor detected as expected

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:15:06
*Thread Reply:* seen as <class 'dependencies.bq.BigQueryInsertJobExtractor'>

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:16:02
*Thread Reply:* no __init__.py in the base dags folder

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:17:02
*Thread Reply:* I also checked that the triggerer pod indeed has no gcsfuse set up; tbh, no idea why, maybe some kind of optimization
the only effect is that when loading plugins in the triggerer it throws some errors in logs - we don't do anything there at the moment
Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 10:19:26
*Thread Reply:* okk… got it @Jakub Dardziński… so the __init__ at the top level of dags is not required either, got it. Just one more doubt: there is a requirement where I want to change an operator's property in the extractor, inside the extract function - will that be taken into account, and will the operator's execute be called with the property that I have populated in my extractor?

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 10:21:28
*Thread Reply:* for example, I want to add a custom job_id to the BigQueryInsertJobOperator, so whenever someone uses the BigQueryInsertJobOperator I want to intercept that and add this job_id property to the operator… will that work?

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:24:46
*Thread Reply:* I'm not sure if using OL for such a thing is the best choice. Wouldn't it be better to subclass the operator?

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:25:37
*Thread Reply:* but the answer is: it depends on the Airflow version; in 2.3+ I'm pretty sure the changed property stays in the execute method

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-13 10:27:49
*Thread Reply:* yeah, ideally that is how we should have done this, but the problem is our client has around 1000+ DAGs in different Google Cloud projects, owned by multiple teams… so they are not willing to change anything in their DAGs. Thankfully they are using Airflow 2.4.3

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 10:31:15
*Thread Reply:* task_policy might be a better tool for that: https://airflow.apache.org/docs/apache-airflow/2.6.0/administration-and-deployment/cluster-policies.html

➕ Jakub Dardziński
Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-09-13 10:35:30
*Thread Reply:* btw, I double-checked - the execute method runs in a different process, so this would not change the task's attribute there

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-16 03:32:49
*Thread Reply:* @Jakub Dardziński any idea how we can achieve this one ---> https://openlineage.slack.com/archives/C01CK9T7HKR/p1694849427228709

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-12 17:26:01
@here has anyone succeeded in getting a custom extractor to work in GCP Cloud Composer or AWS MWAA? seems like there is no way
Mars Lan (mars@metaphor.io)
2023-09-12 17:34:29
*Thread Reply:* I'm getting quite close with MWAA. See https://openlineage.slack.com/archives/C01CK9T7HKR/p1692743745585879.

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 01:44:27
I am exploring the Spark - OpenLineage integration (using the latest PySpark and OL versions). I tested a simple pipeline which:
• Reads JSON data into a PySpark DataFrame
• Applies data transformations
• Writes transformed data to a MySQL database
I observed that we receive 4 events (2 START and 2 COMPLETE) for the same job name. The events are almost identical, with a small diff in the facets. All the events share the same runId, and we don't get any parentRunId.
Team, can you please confirm if this behaviour is expected? It seems to be different from the Airflow integration, where we relate jobs to parent jobs.
Damien Hawes (damien.hawes@booking.com)
2023-09-13 02:54:37
*Thread Reply:* The Spark integration requires that two parameters are passed to it, namely:

spark.openlineage.parentJobName
spark.openlineage.parentRunId

You can find the list of parameters here:

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/README.md
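To make that concrete, a PySpark session wired with the parent linkage might look like this sketch; the parentRunId UUID and the Marquez URL are placeholders (in Airflow the run ID would come from the lineage_run_id macro discussed elsewhere in this channel):
```
# Sketch: tying Spark-emitted OpenLineage events to a parent job/run.
# The UUID below is illustrative; a real one would be the orchestrator's run id.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ol-parent-example")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.1.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")  # assumed Marquez endpoint
    .config("spark.openlineage.parentJobName", "my_dag.my_spark_task")
    .config("spark.openlineage.parentRunId", "f4f8cc62-0c0f-4b6f-9b0a-2f8c9e1d4a11")
    .getOrCreate()
)
```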
Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 02:55:51
*Thread Reply:* Thanks, will check this out

Damien Hawes (damien.hawes@booking.com)
2023-09-13 02:57:43
*Thread Reply:* As for the double accounting of events - that's a bit harder to diagnose.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-09-13 04:33:03
*Thread Reply:* Can you share the job and events?
Also @Paweł Leszczyński

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 06:03:49
*Thread Reply:* Sure, sharing the job and events.
[attachments]

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 06:06:21
*Thread Reply:* [attachment]

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-13 06:39:02
*Thread Reply:* Hi @Suraj Gupta,

Thanks for providing such a detailed description of the problem.

It is not expected behaviour; it's an issue. The events correspond to the same logical plan, which for some reason leads to sending two OL events. Is it reproducible, aka does it occur each time? If yes, please feel free to raise an issue for that.

We have added several tests in recent months to verify the amount of OL events being generated, but we haven't tested it that way with JDBC. BTW, will the same happen if you write your data df_transformed to a file (like a parquet file)?

:gratitude_thank_you: Suraj Gupta

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 07:28:03
*Thread Reply:* Thanks @Paweł Leszczyński, will confirm about writing to a file and get back.

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 07:33:35
*Thread Reply:* And yes, the issue is reproducible. Will raise an issue for this.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-13 07:33:54
*Thread Reply:* even if you write onto a file?

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 07:37:21
*Thread Reply:* Yes, even when I write to a parquet file.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-13 07:49:28
*Thread Reply:* ok. i think i was able to reproduce it locally with https://github.com/OpenLineage/OpenLineage/pull/2103/files

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-13 07:56:11
*Thread Reply:* Opened an issue: https://github.com/OpenLineage/OpenLineage/issues/2104

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-25 16:32:09
*Thread Reply:* @Paweł Leszczyński I see that the PR is a work in progress. Any rough estimate on when we can expect this fix to be released?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-26 03:32:03
*Thread Reply:* @Suraj Gupta I put a comment within your issue. It's a bug we need to solve, but I cannot give any estimates today.

Suraj Gupta (suraj.gupta@atlan.com)
2023-09-26 04:33:03
*Thread Reply:* Thanks for the update @Paweł Leszczyński. Also, please look into this comment - it might be related, and I'm not sure if it's expected behaviour.

Michael Robinson (michael.robinson@astronomer.io)
2023-09-13 14:20:32
@channel
This month's TSC meeting, open to all, is tomorrow: https://openlineage.slack.com/archives/C01CK9T7HKR/p1694113940400549

✅ Sheeri Cabral (Collibra)
Damien Hawes (damien.hawes@booking.com)
2023-09-14 06:20:15
Context:

We use Spark with YARN, running on Hadoop 2.x (I can't remember the exact minor version) with Hive support.

Problem:

I've noticed that CreateDataSourceAsSelectCommand objects are always transformed to an OutputDataset with a namespace value set to file - which is curious, because the inputs always have a (correct) namespace of hdfs://<name-node> - is this a known issue? A flaw with Apache Spark? A bug in the resolution logic?

For reference:

```
public class CreateDataSourceTableCommandVisitor
    extends QueryPlanVisitor<CreateDataSourceTableCommand, OpenLineage.OutputDataset> {

  public CreateDataSourceTableCommandVisitor(OpenLineageContext context) {
    super(context);
  }

  @Override
  public List<OpenLineage.OutputDataset> apply(LogicalPlan x) {
    CreateDataSourceTableCommand command = (CreateDataSourceTableCommand) x;
    CatalogTable catalogTable = command.table();

    return Collections.singletonList(
        outputDataset()
            .getDataset(
                PathUtils.fromCatalogTable(catalogTable),
                catalogTable.schema(),
                OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE));
  }
}
```

Running this: cat events.log | jq '{eventTime: .eventTime, eventType: .eventType, runId: .run.runId, jobNamespace: .job.namespace, jobName: .job.name, outputs: .outputs[] | {namespace: .namespace, name: .name}, inputs: .inputs[] | {namespace: .namespace, name: .name}}'

This is an output:
```
{
  "eventTime": "2023-09-13T16:01:27.059Z",
  "eventType": "START",
  "runId": "bbbb5763-3615-46c0-95ca-1fc398c91d5d",
  "jobNamespace": "spark.cluster-1",
  "jobName": "ol_hadoop_test.execute_create_data_source_table_as_select_command.dhawes_db_ol_test_hadoop_tgt",
  "outputs": {
    "namespace": "file",
    "name": "/user/hive/warehouse/dhawes.db/ol_test_hadoop_tgt"
  },
  "inputs": {
    "namespace": "hdfs://nn1",
    "name": "/user/hive/warehouse/dhawes.db/ol_test_hadoop_src"
  }
}
```

👀 Paweł Leszczyński
Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-09-14 07:32:25
*Thread Reply:* Seems like an issue on our side. Do you know how the source is read? What LogicalPlan leaf is used to read src? Would love to find out how this is done differently

Damien Hawes (damien.hawes@booking.com)
2023-09-14 09:16:58
*Thread Reply:* Hmm, I'll have to do an explain plan to see what exactly it is.

However, my sample job uses spark.sql("SELECT * FROM dhawes.ol_test_hadoop_src")

which itself is created using

spark.sql("SELECT 1 AS id").write.format("orc").mode("overwrite").saveAsTable("dhawes.ol_test_hadoop_src")

Damien Hawes (damien.hawes@booking.com)
2023-09-14 09:23:59
*Thread Reply:*
```
>>> spark.sql("SELECT * FROM dhawes.ol_test_hadoop_src").explain(True)
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `dhawes`.`ol_test_hadoop_src`

== Analyzed Logical Plan ==
id: int
Project [id#3]
+- SubqueryAlias dhawes.ol_test_hadoop_src
   +- Relation[id#3] orc

== Optimized Logical Plan ==
Relation[id#3] orc

== Physical Plan ==
*(1) FileScan orc dhawes.ol_test_hadoop_src[id#3] Batched: true, Format: ORC, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>
```
tati (tatiana.alchueyr@astronomer.io)
2023-09-14 10:03:41
Hey everyone,
Any chance we could have an openlineage-integration-common 1.1.1 release with the following changes?
• https://github.com/OpenLineage/OpenLineage/pull/2106
• https://github.com/OpenLineage/OpenLineage/pull/2108

➕ Michael Robinson, Harel Shein, Maciej Obuchowski, Jakub Dardziński, Paweł Leszczyński, Julien Le Dem

tati (tatiana.alchueyr@astronomer.io)
2023-09-14 10:05:19
*Thread Reply:* Especially the first PR is affecting users of the astronomer-cosmos library: https://github.com/astronomer/astronomer-cosmos/issues/533

Michael Robinson (michael.robinson@astronomer.io)
2023-09-14 10:05:24
*Thread Reply:* Thanks @tati for requesting your first OpenLineage release! Three +1s from committers will authorize

:gratitude_thank_you: tati

Michael Robinson (michael.robinson@astronomer.io)
2023-09-14 11:59:55
*Thread Reply:* The release is authorized and will be initiated within two business days.

🎉 tati

tati (tatiana.alchueyr@astronomer.io)
2023-09-15 04:40:12
*Thread Reply:* Thanks a lot, @Michael Robinson!
Julien Le Dem (julien@apache.org)
2023-09-14 20:23:01
Per discussion in the OpenLineage sync today, here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.
Feedback or alternate proposals welcome:
https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit
Once this is sufficiently fleshed out, I'll create an actual proposal on GitHub.

👍 Maciej Obuchowski

Julien Le Dem (julien@apache.org)
2023-10-03 20:33:35
*Thread Reply:* I have cleaned up the registry proposal.
https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit
In particular:
• I clarified that option 2 is preferred at this point.
• I moved discussion notes to the bottom. They will go away at some point.
• Once it is stable, I'll create a proposal with the preferred option.
• We need a good proposal for the core facets prefix. My suggestion is to move core facets to "core" in the registry. The drawback is that the prefix would be inconsistent.

Julien Le Dem (julien@apache.org)
2023-10-05 17:34:12
*Thread Reply:* I have created a ticket to make this easier to find. Once I get more feedback I'll turn it into an md file in the repo: https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit#heading=h.enpbmvu7n8gu
https://github.com/OpenLineage/OpenLineage/issues/2161
Labels: proposal
Michael Robinson (michael.robinson@astronomer.io)
2023-09-15 12:03:27
@channel
Friendly reminder: the next OpenLineage meetup, our first in Toronto, is happening this coming Monday at 5 PM ET: https://openlineage.slack.com/archives/C01CK9T7HKR/p1694441261486759

👍 Maciej Obuchowski

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-09-16 03:30:27
@here we have a dataproc operator getting called from a DAG which submits a spark job. We wanted to maintain the continuity of the parent job in the spark job, and according to the documentation we can achieve that by using a macro called lineage_run_id that requires task and task_instance as parameters. The problem we are facing is that our clients have 1000s of DAGs, so asking them to change this everywhere it is used is not feasible. We thought of using the task_policy feature in Airflow, but the problem is that task_policy gives you access only to the task/operator, and we don't have access to the task_instance that is required as a parameter to the lineage_run_id function. Can anyone kindly help us on how we should go about this one?
```
t1 = DataProcPySparkOperator(
    task_id=job_name,
    # required pyspark configuration,
    job_name=job_name,
    dataproc_pyspark_properties={
        'spark.driver.extraJavaOptions':
            f"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}"
    },
    dag=dag)
```

➕ Abdallah
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-16 04:22:47
-
-

*Thread Reply:* you don't need actual task instance to do that. you only should set additional argument as jinja template, same as above

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-16 04:25:28
-
-

*Thread Reply:* task_instance in this case is just part of string which is evaluated when jinja render happens

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-16 04:27:10
-
-

*Thread Reply:* ohh…then we could use the same example as above inside the task_policy to intercept the Operator and add the openlineage specific additions properties?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-16 04:30:59
-
-

*Thread Reply:* correct, just remember not to override all properties, just add ol specific

- - - -
-
-
-
- - - - - -
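A rough sketch of what such a policy could look like, put together from the thread above; the jar path is a placeholder, the property-merging details are assumptions to adapt, and the Jinja expression is left unrendered exactly as in the earlier example:

# airflow_local_settings.py - a hedged sketch of the task_policy idea above,
# assuming the DataProcPySparkOperator from the earlier example.
import os

def task_policy(task):
    if task.task_type != "DataProcPySparkOperator":
        return
    jar = "/opt/jars/openlineage-spark.jar"  # placeholder path
    props = dict(task.dataproc_pyspark_properties or {})
    # task and task_instance stay inside the Jinja string; Airflow resolves
    # them when the templated field is rendered, so no TaskInstance is
    # needed here in the policy itself
    props["spark.driver.extraJavaOptions"] = (
        f"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}"
        f"/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}"
        f"/jobs/{task.task_id}/runs/"
        "{{ macros.OpenLineagePlugin.lineage_run_id(task, task_instance) }}"
        f"?api_key={os.environ.get('OPENLINEAGE_API_KEY')}"
    )
    task.dataproc_pyspark_properties = props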
Guntaka Jeevan Paul (jeevan@acceldata.io) - 2023-09-16 04:32:02
*Thread Reply:* yeah sure…thank you so much @Jakub Dardziński, will try this out and keep you posted
👍 Jakub Dardziński

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-09-16 05:00:24
*Thread Reply:* We want to automate setting those options at some point inside the operator itself
➕ Guntaka Jeevan Paul
Guntaka Jeevan Paul (jeevan@acceldata.io) - 2023-09-16 19:40:27
@here is there a way to add custom headers to the OpenLineage client in Airflow? I see that provision is there for the Spark integration via properties like spark.openlineage.transport.headers.xyz --> abcdef

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-19 16:40:55
*Thread Reply:* there’s no out-of-the-box possibility to do that yet, you’re very welcome to create an issue in GitHub and maybe contribute as well! 🙂
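For reference, the Spark-side configuration mentioned above takes roughly this shape; the header name and values are illustrative, and the transport keys should be checked against the docs for your integration version:

spark.openlineage.transport.type http
spark.openlineage.transport.url https://backend.example.com
spark.openlineage.transport.headers.X-Custom-Header abcdef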
Mars Lan (mars@metaphor.io) - 2023-09-17 09:07:41
It doesn't seem like there's a way to override the OL endpoint from the default (/api/v1/lineage) in Airflow? I tried setting the OPENLINEAGE_ENDPOINT environment variable to no avail. Based on this statement, it seems that only OPENLINEAGE_URL is used to construct HttpConfig?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-18 16:25:11
*Thread Reply:* That’s correct. For now there’s no way to configure the endpoint via env var. You can do that by using a config file
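A minimal sketch of such a config file (openlineage.yml), assuming the Python client's HTTP transport options; field names should be double-checked against your client version:

transport:
  type: http
  url: http://backend:5000
  endpoint: api/v1/custom-lineage   # overrides the default api/v1/lineage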
Mars Lan (mars@metaphor.io) - 2023-09-18 16:30:39
*Thread Reply:* How do you do that in Airflow? Any particular reason for excluding endpoint override via env var? Happy to create a PR to fix that.

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-18 16:52:48
*Thread Reply:* historical I guess? go for the PR, of course 🚀

Mars Lan (mars@metaphor.io) - 2023-10-03 08:52:16
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2151
Terese Larsson (terese@jclab.se) - 2023-09-18 08:22:34
Hi! I'm in need of help with wrapping my head around OpenLineage. My team has the goal of collecting metadata from the Airflow operators GreatExpectationsOperator, PythonOperator, MsSqlOperator and BashOperator (for dbt). Where can I see the source code for what is collected for each operator, and is there support for these in the new provider apache-airflow-providers-openlineage? I am super confused and feel lost in the docs. 🤯 We are using MSSQL/ODBC to connect to our db, and this data does not seem to appear as datasets in Marquez. Do I need to configure this? If so, HOW and WHERE? 🥲

Happy for any help, big or small! 🙏

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-18 16:26:07
*Thread Reply:* there’s no actual single source of what integrations are currently implemented in the OpenLineage Airflow provider. That’s something we should work on so it’s more visible

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-18 16:26:46
*Thread Reply:* answering this quickly - GE & MS SQL are not currently implemented yet in the provider

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-18 16:26:58
*Thread Reply:* but I also invite you to contribute if you’re interested! 🙂
sarathch (sarathch@hpe.com) - 2023-09-19 02:47:47
Hi, I need help in extracting OpenLineage for PostgresOperator in JSON format. Any suggestions or comments would be greatly appreciated.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-09-19 16:40:06
*Thread Reply:* If you're using Airflow 2.7, take a look at https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html
❤️ sarathch

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-09-19 16:40:54
*Thread Reply:* If you use one of the lower versions, take a look here https://openlineage.io/docs/integrations/airflow/usage

sarathch (sarathch@hpe.com) - 2023-09-20 06:26:56
*Thread Reply:* Maciej,
Thanks for sharing the link https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html
this should address the issue
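If the goal is simply to see the raw JSON events, one hedged option with the 2.7+ provider (which reads its transport from the [openlineage] section of airflow.cfg) is the console transport, which prints each event as JSON into the task logs:

[openlineage]
transport = {"type": "console"}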
Juan Luis Cano Rodríguez (juan_luis_cano@mckinsey.com) - 2023-09-20 09:36:54
congrats folks 🥳 https://lfaidata.foundation/blog/2023/09/20/lf-ai-data-foundation-announces-graduation-of-openlineage-project
🎉 Jakub Dardziński, Mars Lan, Ross Turk, Guntaka Jeevan Paul, Peter Hicks, Maciej Obuchowski, Athitya Kumar, John Lukenoff, Harel Shein, Francis McGregor-Macdonald, Laurent Paris
👍 Athitya Kumar
❤️ Harel Shein
Michael Robinson (michael.robinson@astronomer.io) - 2023-09-20 17:08:58
@channel
We released OpenLineage 1.2.2!
Added
• Spark: publish the ProcessingEngineRunFacet as part of the normal operation of the OpenLineageSparkEventListener #2089 @d-m-h
• Spark: capture and emit spark.databricks.clusterUsageTags.clusterAllTags variable from databricks environment #2099 @Anirudh181001
Fixed
• Common: support parsing dbt_project.yml without target-path #2106 @tatiana
• Proxy: fix Proxy chart #2091 @harels
• Python: fix serde filtering #2044 @xli-1026
• Python: use non-deprecated apiKey if loading it from env variables #2029 @mobuchowski
• Spark: improve RDDs on S3 integration #2039 @pawel-big-lebowski
• Flink: prevent sending running events after job completes #2075 @pawel-big-lebowski
• Spark & Flink: unify dataset naming from URI objects #2083 @pawel-big-lebowski
• Spark: Databricks improvements #2076 @pawel-big-lebowski
Removed
• SQL: remove sqlparser dependency from iface-java and iface-py #2090 @JDarDagran
Thanks to all the contributors, including new contributors @tati, @xli-1026, and @d-m-h!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.2.2
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.1.0...1.2.2
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🔥 Maciej Obuchowski, Harel Shein, Anirudh Shrinivason
👍 Guntaka Jeevan Paul, John Rosenbaum, Sangeeta Mishra

Yevhenii Soboliev (esoboliev@griddynamics.com) - 2023-09-22 21:05:20
*Thread Reply:* Hi @Michael Robinson Thank you! I love the job that you’ve done. If you have a few seconds, please hint at how I can push lineage gathered from Airflow and Spark jobs into DataHub for visualization? I didn’t find any solutions or official support either at OpenLineage or at DataHub, but I still want to continue using OpenLineage

Michael Robinson (michael.robinson@astronomer.io) - 2023-09-22 21:30:22
*Thread Reply:* Hi Yevhenii, thank you for using OpenLineage. The DataHub integration is new to us, but perhaps the experts on Spark and Airflow know more. @Paweł Leszczyński @Maciej Obuchowski @Jakub Dardziński

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-09-23 08:11:17
*Thread Reply:* @Yevhenii Soboliev at Airflow Summit, Shirshanka Das from DataHub mentioned this as an upcoming feature.
👍 🎯 Yevhenii Soboliev
Suraj Gupta (suraj.gupta@atlan.com) - 2023-09-21 02:11:10
Hi, we're using custom operators in Airflow (2.5) and are planning to expose lineage via default extractors: https://openlineage.io/docs/integrations/airflow/default-extractors/
Question: if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible? Since OpenLineage has now moved inside Airflow, and I think there is no concept of extractors in the latest version.

Suraj Gupta (suraj.gupta@atlan.com) - 2023-09-21 02:15:00
*Thread Reply:* Also, do we have any docs on how OL works with the latest Airflow version? A few questions:
• How is it replacing the concept of custom extractors and manually annotated lineage in the latest version?
• Do we have any examples of setting up the integration to emit input/output datasets for non-supported operators like PythonOperator?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-27 10:04:09
*Thread Reply:* > Question: if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?
It will be compatible; “default extractors” is generally the same concept as we’re using in the 2.7 integration.
One thing that might be good to update is import paths, from openlineage.airflow to airflow.providers.openlineage, but it should work both ways.

> • Do we have any code samples/docs of setting up the integration to emit input/output datasets for non-supported operators like PythonOperator?
Our experience with that is currently lacking; it works like in bare Airflow: if you annotate your PythonOperator tasks with old Airflow lineage, like in this doc.

We want to make this experience better by doing a few things:
• instrumenting hooks, then collecting lineage from them
• integration with AIP-48 datasets
• allowing lineage collected inside an Airflow task to be emitted by other means, by providing a core Airflow API for that
All those things require changing core Airflow in a couple of ways:
• tracking which hooks were used during PythonOperator execution
• being able to emit datasets (Airflow inlets/outlets) from inside a task; they are now a static thing, so if you try that it does not work
• providing a better API for emitting that lineage, preferably based on OpenLineage itself rather than us having to convert it later.
As this requires core Airflow changes, it won’t be live until Airflow 2.8 at the earliest.

thanks to @Maciej Obuchowski for this response
Jason Yip (jasonyip@gmail.com) - 2023-09-21 18:36:17
I am using this accelerator that leverages OpenLineage on Databricks to publish lineage info to Purview, but it's using a rather old version of OpenLineage, aka 0.18. Has anybody tried it on a newer version of OpenLineage? I am facing some issues where the inputs and outputs for the same object have different JSON.
https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/
✅ Harel Shein
Jason Yip (jasonyip@gmail.com) - 2023-09-21 21:51:41
I installed 1.2.2 on Databricks, followed the below init script: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/databricks/open-lineage-init-script.sh

my cluster config looks like this:

spark.openlineage.version v1
spark.openlineage.namespace adb-5445974573286168.8#default
spark.openlineage.endpoint v1/lineage
spark.openlineage.url.param.code 8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==
spark.openlineage.url https://f77b-50-35-69-138.ngrok-free.app

But it is not calling the API; it works fine with the 0.18 version.
✅ Harel Shein
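One thing worth checking in a case like this: newer releases of the Spark integration document transport-style configuration keys alongside the older spark.openlineage.url style shown above. A speculative equivalent (untested; verify the key names against the docs for your version) would be:

spark.openlineage.transport.type http
spark.openlineage.transport.url https://f77b-50-35-69-138.ngrok-free.app
spark.openlineage.transport.endpoint /api/v1/lineage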
Jason Yip (jasonyip@gmail.com) - 2023-09-21 23:16:10
I am attaching the log4j, there is no openlineagecontext
✅ Harel Shein

Jason Yip (jasonyip@gmail.com) - 2023-09-21 23:47:22
*Thread Reply:* this issue is resolved, solution can be found here: https://openlineage.slack.com/archives/C01CK9T7HKR/p1691592987038929

Harel Shein (harel.shein@gmail.com) - 2023-09-25 08:59:10
*Thread Reply:* We were all out at Airflow Summit last week, so apologies for the delayed response. Glad you were able to resolve the issue!
Sangeeta Mishra (sangeeta@acceldata.io) - 2023-09-25 05:11:50
@here I'm presently addressing a particular scenario that pertains to OpenLineage authentication, specifically involving the use of an access key and secret.

I've implemented a custom token provider called AccessKeySecretKeyTokenProvider, which extends the TokenProvider class. This token provider communicates with another service, obtaining a token and an expiration time based on the provided access key, secret, and client ID.

My goal is to retain this token in a cache prior to its expiration, thereby eliminating the need for network calls to the third-party service. Is it possible without relying on an external caching system?

Harel Shein (harel.shein@gmail.com) - 2023-09-25 08:56:53
*Thread Reply:* Hey @Sangeeta Mishra, I’m not sure that I fully understand your question here. What do you mean by OpenLineage authentication? What are you using to generate OL events? What’s your OL receiving backend?

Sangeeta Mishra (sangeeta@acceldata.io) - 2023-09-25 09:04:33
*Thread Reply:* Hey @Harel Shein, I wanted to clarify the previous message. I apologize for any confusion. When I mentioned "OpenLineage authentication," I was actually referring to the authentication process for the OpenLineage backend, specifically using HTTP transport. This involves using my custom token provider, which utilizes access keys and secrets for authentication. The OL backend is an HTTP-based backend. I hope this clears things up!

Harel Shein (harel.shein@gmail.com) - 2023-09-25 09:05:12
*Thread Reply:* Are you using Marquez?

Sangeeta Mishra (sangeeta@acceldata.io) - 2023-09-25 09:05:55
*Thread Reply:* We are trying to leverage our own backend here.

Harel Shein (harel.shein@gmail.com) - 2023-09-25 09:07:03
*Thread Reply:* I see.. I’m not sure the OpenLineage community could help here. Which webserver framework are you using?

Sangeeta Mishra (sangeeta@acceldata.io) - 2023-09-25 09:08:56
*Thread Reply:* KTOR framework

Sangeeta Mishra (sangeeta@acceldata.io) - 2023-09-25 09:15:33
*Thread Reply:* Our backend authentication operates based on either a pair of keys or a single bearer token with a limited time of expiry. Hence, I wanted to cache this information inside the token provider.

Harel Shein (harel.shein@gmail.com) - 2023-09-25 09:26:57
*Thread Reply:* I see, I would ask this question here https://ktor.io/support/

Sangeeta Mishra (sangeeta@acceldata.io) - 2023-09-25 10:12:52
*Thread Reply:* Thank you

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-26 04:13:20
*Thread Reply:* @Sangeeta Mishra which openlineage client are you using: java or python?

Sangeeta Mishra (sangeeta@acceldata.io) - 2023-09-26 04:19:53
*Thread Reply:* @Paweł Leszczyński I am using the python client
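An in-process cache along the lines discussed above seems feasible without an external store. A hedged Python sketch, assuming the client's TokenProvider interface (get_bearer); fetch_token is a placeholder for the call to the third-party token service:

# A rough sketch of the caching idea, not a definitive implementation.
import time
from openlineage.client.transport.http import TokenProvider

def fetch_token(access_key, secret, client_id):
    # placeholder: exchange the key pair for (token, ttl_seconds)
    return "opaque-token", 3600

class AccessKeySecretKeyTokenProvider(TokenProvider):
    def __init__(self, config):
        super().__init__(config)
        self._config = config
        self._token = None
        self._expires_at = 0.0

    def get_bearer(self):
        # refresh slightly early so a token is never used right at expiry
        if self._token is None or time.monotonic() > self._expires_at - 60:
            token, ttl = fetch_token(
                self._config.get("access_key"),
                self._config.get("secret"),
                self._config.get("client_id"),
            )
            self._token, self._expires_at = token, time.monotonic() + ttl
        return f"Bearer {self._token}"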
Suraj Gupta (suraj.gupta@atlan.com) - 2023-09-25 13:36:25
I'm using the Spark OpenLineage integration. In the outputStatistics output dataset facet we receive rowCount and size. The job performs a SQL insert into a MySQL table and I'm receiving the size as 0.
{
    "outputStatistics":
    {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet",
        "rowCount": 1,
        "size": 0
    }
}
I'm not sure what the size means here. Does this mean number of bytes inserted/updated? Also, do we have any documentation for Spark-specific job and run facets?

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-27 09:56:00
*Thread Reply:* I am not sure it's stated in the doc. Here's the list of spark facets schemas: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/shared/facets/spark/v1
Guntaka Jeevan Paul (jeevan@acceldata.io) - 2023-09-26 00:51:30
@here In the Airflow integration we send across a lineage event for DAG start and complete, but that is not the case with the Spark integration…we don’t receive any event for the application start and complete in Spark…is this expected behaviour or am I missing something?
➕ Suraj Gupta

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-27 09:47:39
*Thread Reply:* For Spark we do send start and complete for each Spark action being run (a single operation that causes Spark processing to be run). However, it is difficult for us to know if we're dealing with the last action within a Spark job or a Spark script.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-27 09:49:35
*Thread Reply:* I think we need to look deeper into that as there is a recurring need to capture such information

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-27 09:49:57
*Thread Reply:* and the Spark listener event has methods like onApplicationStart and onApplicationEnd

Guntaka Jeevan Paul (jeevan@acceldata.io) - 2023-09-27 09:50:13
*Thread Reply:* We are using the SparkListener, which has a function called onApplicationStart which gets called whenever a Spark application starts, so I was thinking why can't we send one at start and similarly at end as well

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-27 09:50:33
*Thread Reply:* additionally, we would like to have a concept of a parent run for a Spark job which aggregates all actions run within a single Spark job context

Guntaka Jeevan Paul (jeevan@acceldata.io) - 2023-09-27 09:51:11
*Thread Reply:* yeah exactly, the way that it works with the Airflow integration

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-27 09:51:26
*Thread Reply:* we do have an issue for that https://github.com/OpenLineage/OpenLineage/issues/2105

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-27 09:52:08
*Thread Reply:* what you can do is: come to our monthly OpenLineage open meetings and raise that issue and convince the community of its importance

Guntaka Jeevan Paul (jeevan@acceldata.io) - 2023-09-27 09:53:32
*Thread Reply:* yeah sure, would love to do that…how can I join them, will that be posted here in this slack channel?
Michael Robinson (michael.robinson@astronomer.io) - 2023-09-27 09:54:08
*Thread Reply:* Hi, you can see the schedule and RSVP here: https://openlineage.io/community
🙌 Paweł Leszczyński
:gratitude_thank_you: Guntaka Jeevan Paul
Michael Robinson (michael.robinson@astronomer.io) - 2023-09-27 11:19:16
Meetup recap: Toronto Meetup @ Airflow Summit, September 18, 2023
It was great to see so many members of our community at this event! I counted 32 total attendees, with all but a handful being first-timers.
Topics included:
• Presentation on the history, architecture and roadmap of the project by @Julien Le Dem and @Harel Shein
• Discussion of OpenLineage support in Marquez by @Willy Lulciuc
• Presentation by Ye Liu and Ivan Perepelitca from Metaphor, the social platform for data, about their integration
• Presentation by @Paweł Leszczyński about the Spark integration
• Presentation by @Maciej Obuchowski about the Apache Airflow Provider
Thanks to all the presenters and attendees, with a shout out to @Harel Shein for the help with organizing and day-of logistics, @Jakub Dardziński for the help with set up/clean up, and @Sheeri Cabral (Collibra) for the crucial assist with the signup sheet.
This was our first meetup in Toronto, and we learned some valuable lessons about planning events in new cities, the first and foremost being to ask for a pic of the building! 🙂 But it seemed like folks were undeterred, and the space itself lived up to expectations.
For a recording and clips from the meetup, head over to our YouTube channel.
Upcoming events:
• October 5th in San Francisco: Marquez Meetup @ Astronomer (sign up here: https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/)
• November: Warsaw meetup (details, date TBA)
• January: London meetup (details, date TBA)
Are you interested in hosting or co-hosting an OpenLineage or Marquez meetup? DM me!
🙌 Mars Lan, Harel Shein, Paweł Leszczyński
❤️ Jakub Dardziński, Harel Shein, Rodrigo Maia, Paweł Leszczyński, Julien Le Dem, Willy Lulciuc
🚀 Jakub Dardziński, Kevin Languasco
😅 Harel Shein
✅ Sheeri Cabral (Collibra)

Michael Robinson (michael.robinson@astronomer.io) - 2023-09-27 11:55:47
*Thread Reply:* A few more pics: [photos attached]
Damien Hawes (damien.hawes@booking.com) - 2023-09-27 12:23:05
Hi folks, am I correct in my observations that the Spark integration does not generate inputs and outputs for Kafka-to-Kafka pipelines?

EDIT: Removed the crazy wall of text. Relevant GitHub issue is here.
👀 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-28 02:42:18
*Thread Reply:* responded within the issue
Erik Alfthan (slack@alfthan.eu) - 2023-09-28 02:40:40
Hello community
First time poster - bear with me :)

I am looking to make a minor PR on the airflow integration (fixing github #2130), and the code change is easy enough, but I fail to install the python environment. I have tried the simple ones
OpenLineage/integration/airflow > pip install -e .
or
OpenLineage/integration/airflow > pip install -r dev-requirements.txt
but they both fail on
ERROR: No matching distribution found for openlineage-sql==1.3.0

(which I think is an unreleased version in the git project)

How would I go about installing the requirements?

//Erik

PS. Sorry for posting this in general if there is a specific integration or contribution channel - I didn't find a better channel

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-28 03:04:48
*Thread Reply:* Hi @Erik Alfthan, the channel is totally OK. I am not an airflow integration expert, but it looks to me like you're missing the openlineage-sql library, which is a rust library used to extract lineage from sql queries. This is how we do that in circle ci:
https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8080/workflows/aba53369-836c-48f5-a2dd-51bc0740a31c/jobs/140113

and the subproject page with build instructions: https://github.com/OpenLineage/OpenLineage/tree/main/integration/sql

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 03:07:23
*Thread Reply:* Ok, so I go and "manually" build the internal dependency so that it becomes available in the pip cache?

I was hoping for something more automagical, but that should work

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-09-28 03:08:06
*Thread Reply:* I think so. @Jakub Dardziński am I right?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 03:18:27
*Thread Reply:* https://openlineage.io/docs/development/developing/python/setup
there’s a guide how to setup the dev environment

> Typically, you first need to build openlineage-sql locally (see README). After each release you have to repeat this step in order to bump local version of the package.
This might be somewhat exposed more in the GitHub repository README as well
Erik Alfthan (slack@alfthan.eu) - 2023-09-28 03:27:20
*Thread Reply:* It didn't find the wheel in the cache, but if I used the line in the sql/README.md
pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall
it is installed and thus skipped/passed when pip later checks if it needs to be installed.

Now I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary
Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127
and
Command 'pkg-config --exists mariadb' returned non-zero exit status 127
I am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 03:31:52
*Thread Reply:* > It didn't find the wheel in the cache, but if I used the line in the sql/README.md it is installed and thus skipped/passed when pip later checks if it needs to be installed.
That’s actually expected. You should build the new wheel locally and then install it.

> Now I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary. I am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?
We’ve left some system-specific configuration, e.g. mysqlclient, to users as it’s a bit aside from OpenLineage and more of a general development task.

probably
sudo apt-get install python3-dev default-libmysqlclient-dev build-essential
should work

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 03:32:04
*Thread Reply:* I just realized that I should probably skip setting up my wsl and just run the tests in the docker setup you prepared

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 03:35:46
*Thread Reply:* You could do that as well, but if you want to test your changes vs many Airflow versions that wouldn’t be possible I think (run them with tox btw)

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 04:54:39
*Thread Reply:* This is starting to feel like a rabbit hole 😞

When I run tox, I get a lot of build errors
• client needs to be built
• sql needs to be built to a different target than its readme says
• a lot of builds fail on cython_sources
Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 05:19:34
*Thread Reply:* would you like to share some exact log lines? I’ve never seen such errors, they probably are system specific
Erik Alfthan (slack@alfthan.eu) - 2023-09-28 06:45:48
*Thread Reply:*
Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [62 lines of output]
/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
!!

        ********************************************************************************
        The license_file parameter is deprecated, use license_files instead.

        By 2023-Oct-30, you need to update your project and remove deprecated calls
        or your builds will no longer be supported.

        See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
        ********************************************************************************

!!
  parsed = self.parsers.get(option_name, lambda x: x)(value)
running egg_info
writing lib3/PyYAML.egg-info/PKG-INFO
writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt
writing top-level names to lib3/PyYAML.egg-info/top_level.txt
Traceback (most recent call last):
  File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 355, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in _get_build_requires
    self.run_setup()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in run_setup
    exec(code, locals())
  File "<string>", line 271, in <module>
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 318, in run
    self.find_sources()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 326, in find_sources
    mm.run()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 548, in run
    self.add_defaults()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 586, in add_defaults
    sdist.add_defaults(self)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/sdist.py", line 113, in add_defaults
    super().add_defaults()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 251, in add_defaults
    self._add_defaults_ext()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
    self.filelist.extend(build_ext.get_source_files())
  File "<string>", line 201, in get_source_files
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 107, in __getattr__
    raise AttributeError(attr)
AttributeError: cython_sources
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
py3-airflow-2.1.4: exit 1 (7.85 seconds) /home/obr_erikal/projects/OpenLineage/integration/airflow> python -m pip install --find-links target/wheels/ --find-links ../sql/iface-py/target/wheels --use-deprecated=legacy-resolver --constraint=https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.8.txt apache-airflow==2.1.4 'mypy>=0.9.6' pytest pytest-mock -r dev-requirements.txt pid=368621
py3-airflow-2.1.4: FAIL ✖ in 7.92 seconds
Erik Alfthan (slack@alfthan.eu) - 2023-09-28 06:53:54
*Thread Reply:* Then, for the actual error in my PR: evidently you are not using isort, so what linter/fixer should I use for imports?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 06:58:15
*Thread Reply:* for the error - I think there’s a mistake in the docs. Could you please run maturin build --out target/wheels as a temp solution?
👀 Erik Alfthan

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 06:58:57
*Thread Reply:* we’re using ruff, tox runs it as one of the commands

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:00:37
*Thread Reply:* Not in the airflow folder?
OpenLineage/integration/airflow$ maturin build --out target/wheels
💥 maturin failed
  Caused by: pyproject.toml at /home/obr_erikal/projects/OpenLineage/integration/airflow/pyproject.toml is invalid
  Caused by: TOML parse error at line 1, column 1
    |
  1 | [tool.ruff]
    | ^
  missing field `build-system`

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:02:32
*Thread Reply:* I meant the change here https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/README.md

so
cd iface-py
python -m pip install maturin
maturin build --out ../target/wheels
becomes
cd iface-py
python -m pip install maturin
maturin build --out target/wheels
tox runs
install_command = python -m pip install {opts} --find-links target/wheels/ \
    --find-links ../sql/iface-py/target/wheels
but it should actually be
install_command = python -m pip install {opts} --find-links target/wheels/ \
    --find-links ../sql/target/wheels
and I’m posting a PR to fix that
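Putting the thread's workaround together as one possible shell sequence (paths follow the messages above and may need adjusting once the README/tox fix lands):

cd OpenLineage/integration/sql/iface-py
python -m pip install maturin
maturin build --out target/wheels          # per the temp fix: target/, not ../target/
cd ../../airflow
pip install openlineage-sql --no-index --find-links ../sql/iface-py/target/wheels --force-reinstall
pip install -r dev-requirements.txt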
Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:05:12
*Thread Reply:* yes, that part I actually worked out myself, but I fail to understand the cause of the cython_sources error. I have python3-dev installed on WSL Ubuntu with python version 3.10.12 in a virtualenv. Anything in that that could cause issues?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:12:20
*Thread Reply:* looks like it has something to do with the latest release of Cython?
pip install "Cython<3" maybe solves the issue?

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:15:06
*Thread Reply:* I didn't have any cython before the install. Also no change. Could it be some update to setuptools itself? It seems like the deprecation notice and the error are coming from inside setuptools

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:16:59
*Thread Reply:* (i.e. I tried the pip install "Cython<3" command without any change in the output)

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:20:30
*Thread Reply:* Applying ruff lint on the converter.py file fixed the issue on the PR though, so unless you have any feedback on the change itself, I will set it up on my own computer later instead (right now doing changes on behalf of a client on the client's computer)

If the issue persists on my own computer, I'll dig a bit further

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:21:03
*Thread Reply:* It’s a bit hard for me to find the root cause as I cannot reproduce this locally and CI works fine as well

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:22:41
*Thread Reply:* Yeah, I am thinking that if I run into the same problem "at home", I might find it worthwhile to understand the issue. Right now, the client only wants the fix.
👍 Jakub Dardziński
Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:25:10
*Thread Reply:* Is there an official release cycle?

or more specifically, given that the PRs are approved, how soon can they reach openlineage-dbt and apache-airflow-providers-openlineage?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:28:58
*Thread Reply:* we need to differentiate some things:
1. OpenLineage repository:
a. dbt integration - this is the only place where it is maintained
b. Airflow integration - here we only keep backwards compatibility, but generally speaking, starting from Airflow 2.7+ we would like to do all the work in the Airflow repo as the OL Airflow provider
2. Airflow repository - there’s only the Airflow OpenLineage provider, compatible (and works best) with Airflow 2.7+

we have control over releases (obviously) in the OL repo - it’s a monthly cycle, so beginning next week that should happen. There’s also a possibility to ask for an ad-hoc release in the #general slack channel, and with approvals of committers the new version is also released

For Airflow providers - the cycle is monthly as well

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:31:30
*Thread Reply:* it’s a bit complex with this split but needed temporarily

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:31:47
*Thread Reply:* oh, I did the fix in the wrong place! The client is on Airflow 2.7 and is using the provider. Is it syncing?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:32:28
*Thread Reply:* it’s not, they are two separate places ~and we haven’t even added the whole thing with converting old lineage objects to OL-specific~

editing, that’s not true

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:34:40
*Thread Reply:* the code’s here:
https://github.com/apache/airflow/blob/main/airflow/providers/openlineage/extractors/manager.py#L154

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:35:17
*Thread Reply:* sorry I did not mention this earlier. we definitely need to add some guidance on how to proceed with contributions to OL and the Airflow OL provider

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:36:10
*Thread Reply:* anyway, the dbt fix is the blocking issue, so if that part comes next week, there is no real urgency in getting the columns. It is a nice-to-have for our ingest parquet files.
Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:37:12
*Thread Reply:* may I ask if you use some custom operator / python operator there?

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:37:33
*Thread Reply:* yeah, taskflow with inlets/outlets

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:38:38
*Thread Reply:* so we extract from sources and use pyarrow to create parquet files in storage that an mssql-server can use as external tables

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-09-28 07:39:54
*Thread Reply:* awesome 👍
we have plans to integrate more with the Python operator as well, but not earlier than in Airflow 2.8

Erik Alfthan (slack@alfthan.eu) - 2023-09-28 07:43:41
*Thread Reply:* I guess writing a generic extractor for the python operator is quite hard, but if you could support some inlet/outlet type for tabular file formats / their python libraries like pyarrow or maybe even pandas, and document it, I think a lot of people would understand how to use them
➕ Harel Shein
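For readers landing here, the taskflow-with-inlets/outlets pattern discussed above looks roughly like this; a hedged sketch in which the entity values are illustrative, with the provider converting these old-style lineage entities into OpenLineage datasets:

from airflow.decorators import task
from airflow.lineage.entities import File, Table

@task(
    inlets=[File(url="s3://raw/source.csv")],
    outlets=[Table(database="warehouse", cluster="mssql", name="dbo.ingested")],
)
def extract_to_parquet():
    # pyarrow would write the parquet files the external table points at
    ...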
Michael Robinson (michael.robinson@astronomer.io) - 2023-09-28 16:16:24
Are you located in the Brussels area or within commutable distance? Interested in attending a meetup between October 16-20? If so, please DM @Sheeri Cabral (Collibra) or myself. TIA
❤️ Sheeri Cabral (Collibra)
Michael Robinson (michael.robinson@astronomer.io) - 2023-10-02 11:58:32
@channel
Hello all, I’d like to open a vote to release OpenLineage 1.3.0, including:
• support for Spark 3.5 in the Spark integration
• scheme preservation bug fix in the Spark integration
• find-links path in tox bug fix in the Airflow integration
• more graceful logging when no OL provider is installed in the Airflow integration
• columns as schema facet for airflow.lineage.Table addition
• SQLSERVER to supported dbt profile types addition
Three +1s from committers will authorize. Thanks in advance.
🙌 Harel Shein, Paweł Leszczyński, Rodrigo Maia
👍 Jason Yip, Paweł Leszczyński
➕ Willy Lulciuc, Jakub Dardziński, Erik Alfthan, Julien Le Dem

Michael Robinson (michael.robinson@astronomer.io) - 2023-10-02 17:00:08
*Thread Reply:* Thanks, all. The release is authorized and will be initiated within 2 business days.
Jason Yip (jasonyip@gmail.com) - 2023-10-02 17:11:46
*Thread Reply:* looking forward to that. I am seeing inconsistent results in Databricks for Spark 3.4+; sometimes there are no inputs / outputs. Hope that is fixed?

Harel Shein (harel.shein@gmail.com) - 2023-10-03 09:59:24
*Thread Reply:* @Jason Yip if it isn’t fixed for you, we would love it if you could open up an issue that will allow us to reproduce and fix it
👍 Jason Yip

Jason Yip (jasonyip@gmail.com) - 2023-10-03 20:23:40
*Thread Reply:* @Harel Shein the issue still exists: on Spark 3.4 and above, including 3.5, saveAsTable and create table won't have inputs and outputs in Databricks

Jason Yip (jasonyip@gmail.com) - 2023-10-03 20:30:15
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2124
Jason Yip (jasonyip@gmail.com) - 2023-10-03 20:30:21
*Thread Reply:* and of course this issue still exists

Harel Shein (harel.shein@gmail.com) - 2023-10-03 21:45:09
*Thread Reply:* thanks for posting, we’ll continue looking into this.. if you find any clues that might help, please let us know.

Jason Yip (jasonyip@gmail.com) - 2023-10-03 21:46:27
*Thread Reply:* are there any instructions on how to hook up a debugger to OL?

Harel Shein (harel.shein@gmail.com) - 2023-10-04 09:04:16
*Thread Reply:* @Paweł Leszczyński has been working on adding a debug facet, but more suggestions are more than welcome!

Harel Shein (harel.shein@gmail.com) - 2023-10-04 09:05:58
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2147
👀 Paweł Leszczyński
👍 Jason Yip

Jason Yip (jasonyip@gmail.com) - 2023-10-05 03:20:11
*Thread Reply:* @Paweł Leszczyński do you have a build for the PR? Appreciated!

Harel Shein (harel.shein@gmail.com) - 2023-10-05 15:05:08
*Thread Reply:* we’ll ask for a release once it’s reviewed and merged
Michael Robinson (michael.robinson@astronomer.io) - 2023-10-02 12:28:28
@channel
The September issue of OpenLineage News is here! This issue covers the big news about OpenLineage coming out of Airflow Summit, progress on the Airflow Provider, highlights from our meetup in Toronto, and much more.
To get the newsletter directly in your inbox each month, sign up here.
🦆 Harel Shein, Paweł Leszczyński
🔥 Willy Lulciuc, Jakub Dardziński, Paweł Leszczyński
Damien Hawes (damien.hawes@booking.com) - 2023-10-03 03:44:36
Hi folks - I'm wondering if it's just me, but does io.openlineage:openlineage_sql_java:1.2.2 ship with the arm64.dylib binary? When I try and run code that uses the Java package on an Apple M1, the binary isn't found. The workaround is to check out 1.2.2 and then build and publish it locally.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-10-03 09:01:38
*Thread Reply:* Not sure if I follow your question. Whenever OL is released, there is a script new-version.sh - https://github.com/OpenLineage/OpenLineage/blob/main/new-version.sh - being run that modifies the codebase.

So, if you pull the code, it contains an OL version that has not been released yet, and in the case of dependencies, one needs to build them on their own.

For example, here https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#preparation the Preparation section describes how to build openlineage-java and openlineage-sql in order to build openlineage-spark.

Damien Hawes (damien.hawes@booking.com) - 2023-10-04 05:27:26
*Thread Reply:* Hmm. Let's elaborate my use case a bit.

We run Apache Hive on-premise. Hive provides query execution hooks for pre-query, post-query, and I think failed query.

Anyway, as part of the hook, you're given the query string.

So I, naturally, tried to pass the query string into OpenLineageSql.parse(Collections.singletonList(hookContext.getQueryPlan().getQueryStr()), "hive") in order to test this out.

I was using openlineage-sql-java:1.2.2 at that time, and no matter what query string I gave it, nothing was returned.

I then stepped through the code and noticed that it was looking for the arm64 lib, and I noticed that that package (downloaded from maven central) lacked that particular native binary.

Damien Hawes (damien.hawes@booking.com) - 2023-10-04 05:27:36
*Thread Reply:* I hope that helps.
👍 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-10-04 09:03:02
*Thread Reply:* I get it now. In Circle CI we do have 3 build steps:
- build-integration-sql-x86
- build-integration-sql-arm
- build-integration-sql-macos
but no mac m1. I think at that time Circle CI did not have a proper resource class in the free plan. Additionally, @Maciej Obuchowski would prefer to migrate this to GitHub Actions as he claims this can be achieved there in a cleaner way (https://github.com/OpenLineage/OpenLineage/issues/1624).

Feel free to create an issue for this. Others would be able to upvote it in case they have similar experience.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-10-23 11:56:12
*Thread Reply:* It doesn't have the free resource class still 😞
We're blocked on that, unfortunately. The other solution would be to migrate to GH Actions, where most of our solution could be replaced by something like https://github.com/PyO3/maturin-action
Michael Robinson (michael.robinson@astronomer.io) - 2023-10-03 10:56:03
@channel
We released OpenLineage 1.3.1!
Added:
• Airflow: add some basic stats to the Airflow integration #1845 @harels
• Airflow: add columns as schema facet for airflow.lineage.Table (if defined) #2138 @erikalfthan
• DBT: add SQLSERVER to supported dbt profile types #2136 @erikalfthan
• Spark: support for latest 3.5 #2118 @pawel-big-lebowski
Fixed:
• Airflow: fix find-links path in tox #2139 @JDarDagran
• Airflow: add more graceful logging when no OpenLineage provider installed #2141 @JDarDagran
• Spark: fix bug in PathUtils’ prepareDatasetIdentifierFromDefaultTablePath (CatalogTable) to correctly preserve scheme from CatalogTable’s location #2142 @d-m-h
Thanks to all the contributors, including new contributor @Erik Alfthan!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.3.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.2.2...1.3.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍 Jason Yip, Peter Hicks, Peter Huang, Mars Lan
🎉 Sheeri Cabral (Collibra)
Mars Lan (mars@metaphor.io) - 2023-10-04 07:42:59
*Thread Reply:* Any chance we can do a 1.3.2 soonish to include https://github.com/OpenLineage/OpenLineage/pull/2151 instead of waiting for the next monthly release?
Matthew Paras (matthewparas2020@u.northwestern.edu) - 2023-10-03 12:34:57
Hey everyone - does anyone have a good mechanism for alerting on issues with OpenLineage? For example, alerting when an event times out, perhaps to Prometheus or some other kind of generic endpoint? Not sure of the best approach here (or if the meta inf extension would be able to achieve it)

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-10-04 03:01:02
*Thread Reply:* That's a great use case for OpenLineage. Unfortunately, we don't have any doc or recommendation on that.

I would try using the FluentD proxy we have (https://github.com/OpenLineage/OpenLineage/tree/main/proxy/fluentd) to copy the event stream (alerting is just one of the use cases for lineage events) and write a FluentD plugin to send it asynchronously further to an alerting service like PagerDuty.

It looks cool to me but I never had enough time to test this approach.
👍 Matthew Paras
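To make the fan-out idea concrete, a speculative FluentD fragment could look like the following; the ports, tags and endpoints are illustrative, not a tested configuration:

# receive OpenLineage events over HTTP
<source>
  @type http
  port 9880
</source>
# duplicate the stream: one copy to the lineage backend, one to alerting
<match openlineage.**>
  @type copy
  <store>
    @type http
    endpoint http://marquez:5000/api/v1/lineage
  </store>
  <store>
    @type http
    endpoint http://alerts.example.com/v1/events
  </store>
</match>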
Michael Robinson (michael.robinson@astronomer.io) - 2023-10-05 14:44:14
@channel
This month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:
• announcements
• recent releases
• Airflow Summit recap
• tutorial: migrating to the Airflow Provider
• discussion topic: observability for OpenLineage/Marquez
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.
👀 Sheeri Cabral (Collibra), Julian LaNeve, Peter Hicks
-
- - - - -
- -
Julien Le Dem - (julien@apache.org)
2023-10-05 20:40:40
The Marquez meetup in San Francisco is happening right now!
https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/
🎉 Paweł Leszczyński, Rodrigo Maia
Mars Lan - (mars@metaphor.io)
2023-10-06 07:19:01
@Michael Robinson can we cut a new release to include this change?
• https://github.com/OpenLineage/OpenLineage/pull/2151
➕ Harel Shein, Jakub Dardziński, Julien Le Dem, Michael Robinson, Maciej Obuchowski
Michael Robinson - (michael.robinson@astronomer.io)
2023-10-06 19:16:02
*Thread Reply:* Thanks for requesting a release, @Mars Lan. It has been approved and will be initiated within 2 business days of next Monday.
🙏 Mars Lan
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-08 23:59:36
@here I am trying out the openlineage integration of spark on databricks. There is no event getting emitted from OpenLineage; I see logs saying "OpenLineage Event Skipped". I am attaching the notebook that I am trying to run and the cluster logs. Kindly can someone help me on this?
Jason Yip - (jasonyip@gmail.com)
2023-10-09 00:02:10
*Thread Reply:* from my experience, it will only work on Spark 3.3.x or below, aka Runtime 12.2 or below. Anything above, the events will show up once in a blue moon
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 00:04:38
*Thread Reply:* ohh, thanks for the information @Jason Yip, I am trying out with 13.3 Databricks Version and Spark 3.4.1, will try using a lower version as you suggested. Is there any issue tracking this bug, @Jason Yip?
Jason Yip - (jasonyip@gmail.com)
2023-10-09 00:06:06
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2124
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 00:11:54
*Thread Reply:* tried with databricks 12.2 --> spark 3.3.2, still the same behaviour, no event getting emitted
Jason Yip - (jasonyip@gmail.com)
2023-10-09 00:12:35
*Thread Reply:* you can do 11.3, it's the most stable one I know
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 00:12:46
*Thread Reply:* sure, let me try that out
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 00:31:51
*Thread Reply:* still the same problem…the jar that I am using is the latest openlineage-spark-1.3.1.jar, do you think that can be the problem?
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 00:43:59
*Thread Reply:* tried with openlineage-spark-1.2.2.jar, still the same issue, seems like they are skipping some events
Jason Yip - (jasonyip@gmail.com)
2023-10-09 01:47:20
*Thread Reply:* Probably not all events will be captured, I have only tested create tables and jobs
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-09 04:31:12
*Thread Reply:* Hi @Guntaka Jeevan Paul, how did you configure openlineage and what is your job doing?

We do have a bunch of integration tests on the Databricks platform available here and they're passing on databricks runtime 13.0.x-scala2.12.

Could you also try running code same as our test does (this one)? If you run it and see OL events, this will make us sure your config is OK and we can continue further debug.

Looking at your spark script: could you save your dataset and see if you still don't see any events?
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 05:06:41
*Thread Reply:*
```
babynames = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/FileStore/babynames.csv")
babynames.createOrReplaceTempView("babynames_table")
years = spark.sql("select distinct(Year) from babynames_table").rdd.map(lambda row: row[0]).collect()
years.sort()
dbutils.widgets.dropdown("year", "2014", [str(x) for x in years])
display(babynames.filter(babynames.Year == dbutils.widgets.get("year")))
```
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 05:08:09
*Thread Reply:* this is the script that I am running @Paweł Leszczyński…kindly let me know if I'm doing any mistake. I have added the init script at the cluster level, and from the logs I could see that openlineage is configured, as I see a log statement
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-09 05:10:30
*Thread Reply:* there's nothing wrong in that script. It's just we decided to limit the amount of OL events for jobs that don't write their data anywhere and just do a collect operation
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-09 05:11:02
*Thread Reply:* this is also a potential reason why you can't see any events
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-09 05:14:33
*Thread Reply:* ohh…okk, will try out the test script that you have mentioned above. Kindly correct me if my understanding is correct: so if there are a few transformations and finally writing somewhere, that is where the OL events are expected to be emitted?
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-09 05:16:54
*Thread Reply:* yes. The main purpose of the lineage is to track dependencies between the datasets, when a job reads from dataset A and writes to dataset B. In the case of a databricks notebook that does show or collect and prints some query result on the screen, there may be no reason to track it in the sense of lineage.
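For instance, a minimal notebook cell that should produce meaningful OL events looks like the sketch below: it reads one dataset and writes another, which is exactly the read/write dependency the integration tracks. The dbfs paths are placeholders, not from this thread:

```python
# Sketch: read dataset A, transform, write dataset B - the write triggers
# lineage (inputs + outputs) to be emitted. Paths are invented placeholders.
df = spark.read.format("csv").option("header", "true").load("dbfs:/FileStore/input_a.csv")

filtered = df.filter(df.Year == "2014")  # some transformation in between

filtered.write.mode("overwrite").format("delta").save("dbfs:/FileStore/output_b")
```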
Michael Robinson - (michael.robinson@astronomer.io)
2023-10-09 15:25:14
@channel
We released OpenLineage 1.4.1!
Additions:
• Client: allow setting client’s endpoint via environment variable 2151 @Mars Lan
• Flink: expand Iceberg source types 2149 @Peter Huang
• Spark: add debug facet 2147 @Paweł Leszczyński
• Spark: enable Nessie REST catalog 2165 @julwin
Thanks to all the contributors, especially new contributors @Peter Huang and @julwin!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.4.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.3.1...1.4.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍 Jason Yip, Ross Turk, Mars Lan, Harel Shein, Rodrigo Maia
Drew Bittenbender - (drew@salt.io)
2023-10-09 16:55:35
Hello. I am getting started with OL and Marquez with dbt. I am using dbt-ol. The namespace of the dataset showing up in Marquez is not the namespace I provide using OPENLINEAGE_NAMESPACE. It happens to be the same as the source in Marquez, which is the snowflake account URI. It's obviously picking up the other env variable OPENLINEAGE_URL, so I am pretty sure it's not the environment. Is this expected?
Michael Robinson - (michael.robinson@astronomer.io)
2023-10-09 18:56:13
*Thread Reply:* Hi Drew, thank you for using OpenLineage! I don’t know the details of your use case, but I believe this is expected, yes. In general, the dataset namespace is different. Jobs are namespaced separately from datasets, which are namespaced by their containing datasources. This is the case so datasets have the same name regardless of the job writing to them, as datasets are sometimes shared by jobs in different namespaces.
👍 Drew Bittenbender
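To make that split concrete, here is a hedged sketch using the openlineage-python client (API roughly as of the 1.x line; the job name, Snowflake account and producer URL are all invented): the job carries the namespace you configure, while each dataset's namespace comes from its datasource.

```python
# Sketch: job namespace vs dataset namespace in an OL event. Names invented.
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")

event = RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    # The job namespace is what OPENLINEAGE_NAMESPACE (or config) controls...
    job=Job(namespace="my_dbt_project", name="orders_model"),
    producer="https://example.com/my-producer",
    # ...while dataset namespaces come from the datasource, e.g. the account URI.
    inputs=[Dataset(namespace="snowflake://xy12345.us-east-1", name="db.schema.raw_orders")],
    outputs=[Dataset(namespace="snowflake://xy12345.us-east-1", name="db.schema.orders")],
)
client.emit(event)
```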
Jason Yip - (jasonyip@gmail.com)
2023-10-10 01:05:11
Any idea why "environment-properties" is gone in Spark 3.4+ in StartEvent?
Jason Yip - (jasonyip@gmail.com)
2023-10-10 20:53:59
*Thread Reply:* example:

```
{"environment_properties":{"spark.databricks.clusterUsageTags.clusterName":"jason.yip@tredence.com's Cluster","spark.databricks.job.runId":"","spark.databricks.job.type":"","spark.databricks.clusterUsageTags.azureSubscriptionId":"a4f54399_8db8_4849_adcc_a42aed1fb97f","spark.databricks.notebook.path":"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep","spark.databricks.clusterUsageTags.clusterOwnerOrgId":"4679476628690204","MountPoints":[{"MountPoint":"/databricks-datasets","Source":"databricks_datasets"},{"MountPoint":"/Volumes","Source":"UnityCatalogVolumes"},{"MountPoint":"/databricks/mlflow-tracking","Source":"databricks/mlflow-tracking"},{"MountPoint":"/databricks-results","Source":"databricks_results"},{"MountPoint":"/databricks/mlflow-registry","Source":"databricks/mlflow-registry"},{"MountPoint":"/Volume","Source":"DbfsReserved"},{"MountPoint":"/volumes","Source":"DbfsReserved"},{"MountPoint":"/","Source":"DatabricksRoot"},{"MountPoint":"/volume","Source":"DbfsReserved"}],"User":"jason.yip@tredence.com","UserId":"4768657035718622","OrgId":"4679476628690204"}}
```
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-11 03:46:13
*Thread Reply:* Is this related to any OL version? In OL 1.2.2 we've added an extra variable spark.databricks.clusterUsageTags.clusterAllTags to be captured, but this should not break things.

I think we're facing some issues on recent databricks runtime versions. Here is an issue for this: https://github.com/OpenLineage/OpenLineage/issues/2131

Is the problem you describe specific to some databricks runtime versions?
Jason Yip - (jasonyip@gmail.com)
2023-10-11 11:17:06
*Thread Reply:* yes, exactly Spark 3.4+
Jason Yip - (jasonyip@gmail.com)
2023-10-11 21:12:27
*Thread Reply:* Btw I don't understand the code flow entirely. If we are talking about a different classpath only: I see there's a Unity Catalog handler in the code and it says it works the same as Delta, but I am not seeing it subclassing Delta. I suppose it will work the same.

I am happy to jump on a call to show you if needed
Jason Yip - (jasonyip@gmail.com)
2023-10-16 02:58:56
*Thread Reply:* @Paweł Leszczyński do you think in Spark 3.4+ only one event would happen?

```
/**
 * We get exact copies of OL events for org.apache.spark.scheduler.SparkListenerJobStart and
 * org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart. The same happens for end
 * events.
 *
 * @return
 */
private boolean isOnJobStartOrEnd(SparkListenerEvent event) {
  return event instanceof SparkListenerJobStart || event instanceof SparkListenerJobEnd;
}
```
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-10 23:43:39
@here I am trying out the databricks spark integration, and in one of the events I am getting an openlineage event where the output dataset has a facet called symlinks. The statement that generated this event is this sql:

```
CREATE TABLE IF NOT EXISTS covid_research.covid_data
USING CSV
LOCATION 'abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv'
OPTIONS (header "true", inferSchema "true");
```

Can someone kindly let me know what this symlinks facet is? I tried reading the spec but did not get it completely.
Jason Yip - (jasonyip@gmail.com)
2023-10-10 23:44:53
*Thread Reply:* I use it to get the table with database name
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-10 23:47:15
*Thread Reply:* so can I think of it like: if there is a symlink, then that table is kind of a reference to the original dataset
Jason Yip - (jasonyip@gmail.com)
2023-10-11 01:25:44
*Thread Reply:* yes
🙌 Paweł Leszczyński
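As a hedged illustration of how the facet gets used in practice (the `event` variable is assumed to be one OL run event already parsed into a plain dict, like the JSON payload later in this thread):

```python
# Sketch: pull the logical table identifier(s) out of the symlinks facet.
def table_names(event: dict) -> list[str]:
    names = []
    for ds in event.get("outputs", []):
        facet = ds.get("facets", {}).get("symlinks", {})
        for ident in facet.get("identifiers", []):
            if ident.get("type") == "TABLE":
                names.append(ident["name"])  # e.g. "covid_research.uscoviddata"
    return names
```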
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-11 06:55:58
@here When I am running this sql as part of a databricks notebook, I am receiving an OL event where I see only an output dataset; there is no input dataset or a symlink facet inside the dataset to map it to the underlying azure storage object. Can anyone kindly help on this?

```
spark.sql(f"CREATE TABLE IF NOT EXISTS covid_research.uscoviddata USING delta LOCATION 'abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/modified-delta'")
```

```
{
  "eventTime": "2023-10-11T10:47:36.296Z",
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
  "eventType": "COMPLETE",
  "run": {
    "runId": "d0f40be9-b921-4c84-ac9f-f14a86c29ff7",
    "facets": {
      "spark.logicalPlan": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
        "plan": [
          {
            "class": "org.apache.spark.sql.catalyst.plans.logical.CreateTable",
            "num-children": 1,
            "name": 0,
            "tableSchema": [],
            "partitioning": [],
            "tableSpec": null,
            "ignoreIfExists": true
          },
          {
            "class": "org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier",
            "num-children": 0,
            "catalog": null,
            "identifier": null
          }
        ]
      },
      "spark_version": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
        "spark-version": "3.3.0",
        "openlineage-spark-version": "1.2.2"
      },
      "processing_engine": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet",
        "version": "3.3.0",
        "name": "spark",
        "openlineageAdapterVersion": "1.2.2"
      }
    }
  },
  "job": {
    "namespace": "default",
    "name": "adb-3942203504488904.4.azuredatabricks.net.create_table.covid_research_db_uscoviddata",
    "facets": {}
  },
  "inputs": [],
  "outputs": [
    {
      "namespace": "dbfs",
      "name": "/user/hive/warehouse/covid_research.db/uscoviddata",
      "facets": {
        "dataSource": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet",
          "name": "dbfs",
          "uri": "dbfs"
        },
        "schema": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet",
          "fields": []
        },
        "storage": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet",
          "storageLayer": "unity",
          "fileFormat": "parquet"
        },
        "symlinks": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet",
          "identifiers": [
            {
              "namespace": "/user/hive/warehouse/covid_research.db",
              "name": "covid_research.uscoviddata",
              "type": "TABLE"
            }
          ]
        },
        "lifecycleStateChange": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet",
          "lifecycleStateChange": "CREATE"
        }
      },
      "outputFacets": {}
    }
  ]
}
```
Damien Hawes - (damien.hawes@booking.com)
2023-10-11 06:57:46
*Thread Reply:* Hey Guntaka - can I ask you a favour? Can you please stop using @here or @channel - please keep in mind, you're pinging over 1000 people when you use that mention. It's incredibly distracting to have Slack notify me of a message that isn't pertinent to me.
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-11 06:58:50
*Thread Reply:* sure, noted @Damien Hawes
Damien Hawes - (damien.hawes@booking.com)
2023-10-11 06:59:34
*Thread Reply:* Thank you!
Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-10-11 12:04:24
Hi there, I am trying to make an API call to get column-lineage information. Could you please let me know the URL construct to retrieve the same? As per the API documentation I am passing the following URL to GET column-lineage: http://localhost:5000/api/v1/column-lineage but getting error code 400. Thanks
Willy Lulciuc - (willy@datakin.com)
2023-10-12 13:55:26
*Thread Reply:* Make sure to provide a dataset field nodeId as a query param in your request. If you’ve seeded Marquez with test metadata, you can use:

```
curl -XGET "http://localhost:5002/api/v1/column-lineage?nodeId=datasetField%3Afood_delivery%3Apublic.delivery_7_days%3Acustomer_email"
```

You can view the API docs for column lineage here!
Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-10-17 05:57:36
*Thread Reply:* Thanks Willy. The documentation says 'namespace' so I constructed the API like this:
'http://marquez-web:3000/api/v1/column-lineage/nodeId=datasetField:file:/home/jovyan/Downloads/event_attribute.csv:eventType'
but it is still not working 😞
Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-10-17 06:07:06
*Thread Reply:* nodeId is constructed like this: datasetField:<namespace>:<dataset>:<field name>
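A hedged sketch of the same request from Python, which sidesteps the two likely problems above: the nodeId must be a query parameter (`?nodeId=...`, not a path segment), and it needs URL encoding. The host/port follow Madhav's first attempt; adjust to wherever the Marquez API actually runs:

```python
# Sketch: query Marquez column lineage with a properly encoded nodeId.
# Host/port and the example nodeId are assumptions, not from this thread.
import requests

node_id = "datasetField:food_delivery:public.delivery_7_days:customer_email"
resp = requests.get(
    "http://localhost:5000/api/v1/column-lineage",
    params={"nodeId": node_id},  # requests percent-encodes this for us
)
resp.raise_for_status()
print(resp.json())
```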
Michael Robinson - (michael.robinson@astronomer.io)
2023-10-11 13:00:01
@channel
Friendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1696531454431629
Michael Robinson - (michael.robinson@astronomer.io)
2023-10-11 14:26:45
*Thread Reply:* Newly added discussion topics:
• a proposal to add a Registry of Consumers and Producers
• a dbt issue to add OpenLineage Dataset names to the Manifest
• a proposal to add Dataset support in Spark LogicalPlan Nodes
• a proposal to institute a certification process for new integrations
Jason Yip - (jasonyip@gmail.com)
2023-10-12 15:08:34
This might be a dumb question: I guess I need to set up local Spark in order for the Spark tests to run successfully?
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-13 01:56:19
*Thread Reply:* just follow these instructions: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#build
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-13 06:41:56
*Thread Reply:* when trying to install openlineage-java locally via this command --> cd ../../client/java/ && ./gradlew publishToMavenLocal, I am receiving this error:

```
> Task :signMavenJavaPublication FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':signMavenJavaPublication'.
> Cannot perform signing task ':signMavenJavaPublication' because it has no configured signatory
```
Jason Yip - (jasonyip@gmail.com)
2023-10-13 13:35:06
*Thread Reply:* @Paweł Leszczyński this is what I am getting
Jason Yip - (jasonyip@gmail.com)
2023-10-13 13:36:00
*Thread Reply:* attaching the html
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-16 03:02:13
*Thread Reply:* which java are you using? what is your operating system (is it windows?)?
Jason Yip - (jasonyip@gmail.com)
2023-10-16 03:35:18
*Thread Reply:* yes it is Windows, I downloaded java 8 but I can try to build it with the Linux subsystem or Mac
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-10-16 03:35:51
*Thread Reply:* In my case it is Mac
Jason Yip - (jasonyip@gmail.com)
2023-10-16 03:56:09
*Thread Reply:*
```
* Where:
Build file '/mnt/c/Users/jason/Downloads/github/OpenLineage/integration/spark/build.gradle' line: 9

* What went wrong:
An exception occurred applying plugin request [id: 'com.adarshr.test-logger', version: '3.2.0']
> Failed to apply plugin [id 'com.adarshr.test-logger']
   > Could not generate a proxy class for class com.adarshr.gradle.testlogger.TestLoggerExtension.

* Try:
```
Jason Yip - (jasonyip@gmail.com)
2023-10-16 03:56:23
*Thread Reply:* tried with Linux subsystem
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-16 04:04:29
*Thread Reply:* we don't have any restrictions for windows builds, however it is something we don't test regularly. 2h ago we did have a successful build on circle CI: https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8271/workflows/0ec521ae-cd21-444a-bfec-554d101770ea
Jason Yip - (jasonyip@gmail.com)
2023-10-16 04:13:04
*Thread Reply:*
```
... 111 more
Caused by: java.lang.ClassNotFoundException: org.gradle.api.provider.HasMultipleValues
  ... 117 more
```
Jason Yip - (jasonyip@gmail.com)
2023-10-17 00:26:07
*Thread Reply:* @Paweł Leszczyński now I am doing gradlew instead of gradle on Windows because the Linux one doesn't work. The doc didn't mention setting up Spark / Hadoop, and that's my original question -- do I need to set up local Spark? Now it's throwing an error on Hadoop: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
Jason Yip - (jasonyip@gmail.com)
2023-10-21 23:33:48
*Thread Reply:* Got it working with Mac, couldn't get it working with Windows / Linux subsystem
Jason Yip - (jasonyip@gmail.com)
2023-10-22 13:08:40
*Thread Reply:* Now getting class not found despite build and test succeeding
Jason Yip - (jasonyip@gmail.com)
2023-10-22 21:46:23
*Thread Reply:* I uploaded the wrong jar.. there are so many jars, only the jar in the spark folder works, not the subfolder
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-10-13 02:48:40
Hi team, I am running the following pyspark code in a cell (table names and paths were redacted in the original message):

```
print("SELECTING 100 RECORDS FROM METADATA TABLE")
df = spark.sql("""select * from  limit 100""")

print("WRITING (1) 100 RECORDS FROM METADATA TABLE")
df.write.mode("overwrite").format('delta').save("")
df.createOrReplaceTempView("temp_metadata")

print("WRITING (2) 100 RECORDS FROM METADATA TABLE")
df.write.mode("overwrite").format("delta").save("")

print("READING (1) 100 RECORDS FROM METADATA TABLE")
df_read = spark.read.format('delta').load("")
df_read.createOrReplaceTempView("metadata_1")

print("DOING THE MERGE INTO SQL STEP!")
df_new = spark.sql("""
    MERGE INTO metadata_1
    USING temp_metadata
    ON metadata_1.id = temp_metadata.id
    WHEN MATCHED THEN UPDATE SET
        metadata_1.id = temp_metadata.id,
        metadata_1.aspect = temp_metadata.aspect
    WHEN NOT MATCHED THEN INSERT (id, aspect)
    VALUES (temp_metadata.id, temp_metadata.aspect)
""")
```

I am running with debug log levels. I actually don't see any of the events being logged for `SaveIntoDataSourceCommand` or the `MergeIntoCommand`, but OL is in fact emitting events to the backend. It seems like the events are just not being logged... I actually observe this for all delta table related spark sql queries...
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-10-16 00:01:42
*Thread Reply:* Hi @Paweł Leszczyński is this expected? CMIIW, but we should expect to see the events being logged when running with debug log level right?
Damien Hawes - (damien.hawes@booking.com)
2023-10-16 04:17:30
*Thread Reply:* It's impossible to know without seeing how you've configured the listener.

Can you show this configuration?
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-10-17 03:15:20
*Thread Reply:*
```
spark.openlineage.transport.url <url>
spark.openlineage.transport.endpoint /<endpoint>
spark.openlineage.transport.type http
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.facets.custom_environment_variables [BUNCH_OF_VARIABLES;]
spark.openlineage.facets.disabled [spark_unknown\;spark.logicalPlan]
```
These are my spark configs... I'm setting the log level to debug with sc.setLogLevel("DEBUG")
Damien Hawes - (damien.hawes@booking.com)
2023-10-17 04:40:03
*Thread Reply:* Two things:

1. If you want debug logs, you're going to have to provide a log4j.properties file or log4j2.properties file, depending on the version of spark you're running. In that file, you will need to configure the logging levels. If I am not mistaken, sc.setLogLevel controls ONLY the log levels of Spark namespaced components (i.e., org.apache.spark)
2. You're telling the listener to emit to a URL. If you want to see the events emitted to the console, then set spark.openlineage.transport.type=console, and remove the other spark.openlineage.transport.* configurations.

Do either (1) or (2); a minimal sketch of option (2) follows below.
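A hedged PySpark sketch of the console-transport option, using only config keys already quoted in this thread (the app name is an arbitrary placeholder, and it assumes the openlineage-spark jar is already on the cluster classpath):

```python
# Sketch: print OpenLineage events to the console instead of POSTing them,
# useful for inspecting exactly what is emitted for a given job.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ol-console-debug")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .getOrCreate()
)
```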
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-10-20 00:49:45
*Thread Reply:* @Damien Hawes Hi, sorry for the late reply.

1. So enabling sc.setLogLevel does actually enable debug logs from OpenLineage. I can see the events and everything being logged if I save it as a parquet format instead of delta.
2. I do want to emit events to the url. But I would like to just see what exactly the events being emitted are for some specific jobs, since I see that the lineage is incorrect for some MergeInto cases
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-10-26 04:56:50
*Thread Reply:* Hi @Damien Hawes would like to check again on whether you'd have any thoughts about this... Thanks! 🙂
Rodrigo Maia - (rodrigo.maia@manta.io)
2023-10-17 03:17:57
Hello All 👋!
We are currently trying to get the spark integration for OpenLineage working in our Databricks instance. The general setup is done and working, with a few hiccups here and there.
But one thing we are still struggling with is how to link all spark job events with a Databricks job or a notebook run.
We've recently noticed that some of the events produced by OL have the "environment-properties" attribute with information (for our context) regarding the notebook path (if it is a notebook run), or the job run ID (if it's a databricks job run). But the thing is that these attributes are not always present.
I ran some samples yesterday for a job with 4 notebook tasks. Of all 20 json payloads sent by the OL listener, only 3 presented the "environment-properties" attribute. It's not only happening with Databricks jobs. When I run single notebooks and each cell has its own set of spark jobs, not all json events presented that property either.

So my question is: what is the criteria for having these attributes present or not in the event json file? Or maybe this is an issue? @Jason Yip did you find out anything about this?

⚙️ Spark 3.4 / OL-Spark 1.4.1
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-17 06:55:47
*Thread Reply:* In general, we assume that OL events per run are cumulative. So, if you have 20 events with the same runId, then even if only a single event contains some facet, we consider this OK and let the backend combine it together. That's what we do in the Marquez project (a reference backend architecture for OL) and that's why it is worth using Marquez as a REST API.

Are you able to use the job namespace to aggregate all the Spark actions run within the databricks notebook? This is something that should serve this purpose.
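A hedged sketch of that "cumulative" consumption model, working on plain dicts with nothing Marquez-specific: merge run facets across all events sharing a runId, so it does not matter which single event happened to carry environment-properties.

```python
# Sketch: combine run facets from many OL events that share a runId.
from collections import defaultdict


def combined_run_facets(events: list[dict]) -> dict[str, dict]:
    merged: dict[str, dict] = defaultdict(dict)
    for event in events:
        run = event.get("run", {})
        merged[run.get("runId")].update(run.get("facets", {}))
    return merged  # runId -> union of facets seen across its events
```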
Jason Yip - (jasonyip@gmail.com)
2023-10-17 12:48:33
*Thread Reply:* @Rodrigo Maia for Spark 3.4 I don't see the environment-properties showing up at all, but if you run the code as is, register a listener on SparkListenerJobStart and get the properties, all of those properties will show up. There's an event filter that filters out the SparkListenerJobStart; I suspect that filtered out the "unnecessary" events.. was trying to do a custom build to test that, but still trying to set up Hadoop and Spark on my local
Rodrigo Maia - (rodrigo.maia@manta.io)
2023-10-18 05:23:16
*Thread Reply:* @Paweł Leszczyński you are right. This is what we are doing as well, combining events with the same runId to process the information on our backend. But even so, there are several runIds without this information. I went through these events to have a better view of what was happening. As you can see, of 7 runIds, only 3 were showing the "environment-properties" attribute. Some condition is not being met here, or maybe it is what @Jason Yip suspects and there's some sort of filtering of unnecessary events
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-19 02:28:03
*Thread Reply:* @Rodrigo Maia, if you are able to provide a small Spark script such that none of the OL events contain the environment-properties, but at least one should, please raise an issue for this.
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-19 02:29:11
*Thread Reply:* It's extremely helpful when the community opens issues that are not only described well, but also contain the small piece of code needed to reproduce this.
Rodrigo Maia - (rodrigo.maia@manta.io)
2023-10-19 02:59:39
*Thread Reply:* I know, that's the goal. That is why I wanted to understand in the first place if there was any condition preventing this from happening, but now I get that this is not expected behaviour.
👍 Paweł Leszczyński
Jason Yip - (jasonyip@gmail.com)
2023-10-19 13:44:00
*Thread Reply:* @Paweł Leszczyński @Rodrigo Maia I am referring to this: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/filters/DeltaEventFilter.java#L51
Jason Yip - (jasonyip@gmail.com)
2023-10-19 14:49:03
*Thread Reply:* Please note that I am getting the same behavior; no code is needed, Spark 3.4+ won't be generating it no matter what. I have been testing the same code for 2 months from this issue: https://github.com/OpenLineage/OpenLineage/issues/2124

I tried the code without OL and it worked perfectly, so it is OL filtering out the event for sure. I will try posting the code I use to collect the properties.
Jason Yip - (jasonyip@gmail.com)
2023-10-19 23:46:17
*Thread Reply:* this code proves that the properties are still there; somehow they got filtered out by OL:

```
%scala
import org.apache.spark.scheduler._

class JobStartListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Extract properties here
    val jobId = jobStart.jobId
    val stageInfos = jobStart.stageInfos
    val properties = jobStart.properties

    // You can print properties or save them somewhere
    println(s"JobId: $jobId, Stages: ${stageInfos.size}, Properties: $properties")
  }
}

val listener = new JobStartListener()
spark.sparkContext.addSparkListener(listener)

val df = spark.range(1000).repartition(10)
df.count()
```
Jason Yip - (jasonyip@gmail.com)
2023-10-19 23:55:05
*Thread Reply:* of course feel free to test this logic as well, it still works -- if not the filtering:

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java
Rodrigo Maia - (rodrigo.maia@manta.io)
2023-10-30 04:46:16
*Thread Reply:* Any ideas on how I could test it?
ankit jain - (ankit.goods10@gmail.com)
2023-10-17 22:57:03
Hello All, I am completely new to OpenLineage. I have to set up a lab to conduct a POC on various aspects like lineage, metadata management, etc. As per the openlineage site, I tried downloading Ubuntu, docker and the binary files for Marquez. But I am lost somewhere and unable to configure the whole setup. Can someone please assist with steps to start from scratch so that I can delve into the OpenLineage capabilities? Many thanks
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-18 01:32:01
*Thread Reply:* hey, did you try to follow one of these guides?
https://openlineage.io/docs/guides/about
Michael Robinson - (michael.robinson@astronomer.io)
2023-10-18 09:14:08
*Thread Reply:* Which guide were you using, and what errors/issues are you encountering?
ankit jain - (ankit.goods10@gmail.com)
2023-10-21 15:43:14
*Thread Reply:* Thanks Jakub for the response.
ankit jain - (ankit.goods10@gmail.com)
2023-10-21 15:45:42
*Thread Reply:* In docker, the marquez-api image is not running and is exiting with exit code 127.
Michael Robinson - (michael.robinson@astronomer.io)
2023-10-22 09:34:53
*Thread Reply:* @ankit jain thanks. I don't recognize 127, but 9 times out of 10 if the API or DB container fails the reason is a port conflict. Have you checked if port 5000 is available?
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-22 09:54:10
*Thread Reply:* could you please check what’s the output of `git config --get core.autocrlf` or `git config --global --get core.autocrlf`?
ankit jain - (ankit.goods10@gmail.com)
2023-10-24 08:09:14
*Thread Reply:* @Michael Robinson thanks, I checked and port 5000 is not available.
I tried deleting the docker images and recreating them, but the same issue persists, stating:
/usr/bin/env bash\r: not found
The Gradle build is successful.
ankit jain - (ankit.goods10@gmail.com)
2023-10-24 08:09:54
*Thread Reply:* @Jakub Dardziński thanks, the first command resulted in true and the second command had no response
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-24 08:15:57
*Thread Reply:* are you running docker and git in Windows or Mac OS before 10.0?
Matthew Paras - (matthewparas2020@u.northwestern.edu)
2023-10-19 15:00:42
Hey all - we've been noticing that some events go unreported by openlineage (spark) when the AsyncEventQueue fills up and starts dropping events. Wondering if anyone has experienced this before, and knows why it is happening? We've expanded the event queue capacity and thrown more hardware at the problem but no dice.

Also as a note, the query plans from this job are pretty big - could the listener just be choking up? Happy to open a github issue as well if we suspect that it could be the listener itself having issues
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-10-20 02:57:50
*Thread Reply:* Hi, just checking, are you excluding the sparkPlan from the events? Or is it sending the spark plan too
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-23 11:59:40
*Thread Reply:* yeah - setting spark.openlineage.facets.disabled to [spark_unknown;spark.logicalPlan] should help
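For reference, a hedged sketch of wiring that in at session build time. The config keys appear earlier in this log; the escaped semicolon separating the facet names follows the form quoted there:

```python
# Sketch: disable the heavyweight logical-plan facet to shrink event size.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.facets.disabled", "[spark_unknown\\;spark.logicalPlan]")
    .getOrCreate()
)
```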
Matthew Paras - (matthewparas2020@u.northwestern.edu)
2023-10-24 17:50:26
*Thread Reply:* sorry for the late reply - turns out this job is just whack 😄 we were going in circles trying to figure it out; we end up dropping events without open lineage enabled at all. But good to know that disabling the logical plan should speed us up if we run into this again
praveen kanamarlapudi - (kpraveen420@gmail.com)
2023-10-20 18:18:37
Hi,

We are using the openlineage spark connector. We have used spark 3.2 and scala 2.12 so far. We have triggered a new job with Spark 3.4 and scala 2.13 and faced the below exception.

```
java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.map(scala.Function1)'
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildInputDatasets$6(OpenLineageRunEventBuilder.java:341)
  at java.base/java.util.Optional.map(Optional.java:265)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildInputDatasets(OpenLineageRunEventBuilder.java:339)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:295)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)
  at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:72)
  at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)
```
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-23 04:56:25
*Thread Reply:* Hmm, that is interesting. Did it occur on the databricks runtime? Could you give it a try with Scala 2.12? I think we don't test scala 2.13.
praveen kanamarlapudi - (kpraveen420@gmail.com)
2023-10-23 12:02:13
*Thread Reply:* I believe our Scala 2.12 jobs are working fine. It's not the databricks runtime. We run Spark on Kube.
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-24 06:47:14
*Thread Reply:* OK. I think you can raise an issue to support Scala 2.13 for the latest Spark versions.
priya narayana - (n.priya88@gmail.com)
2023-10-26 06:13:40
Hi, I want to customise the events which come from OpenLineage spark. Can someone give me some information?
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-26 07:45:41
*Thread Reply:* Hi @priya narayana, please get familiar with the Extending section in our docs: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending
priya narayana - (n.priya88@gmail.com)
2023-10-26 09:53:07
*Thread Reply:* Okay, thank you. Just checking if there are any other docs or git code which could also help me
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:11:17
Hello Team
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:12:38
I'm upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but I'm seeing the following error; any help is appreciated
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:14:02
*Thread Reply:* @Jakub Dardziński any thoughts?
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 13:14:24
*Thread Reply:* what version of Airflow are you using?
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:14:52
*Thread Reply:* 2.6.3, which satisfies the requirement
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 13:16:38
*Thread Reply:* is it possible you have some custom operator?
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:17:15
*Thread Reply:* I think it's the base operator causing the issue
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:17:36
*Thread Reply:* so no, I believe
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 13:18:43
*Thread Reply:* BaseOperator is the parent class for any other operators; it defines how to do deepcopy
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:19:11
*Thread Reply:* yeah, so it's controlled by Airflow itself, I didn't customize it
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 13:19:49
*Thread Reply:* uhm, maybe it's possible you could share the dag code? you may hide sensitive data
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:21:23
*Thread Reply:* let me try with lower versions of openlineage, what say
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:21:39
*Thread Reply:* it's a big jump from 0.24.0 to 1.4.1
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:22:25
*Thread Reply:* but I will help here to investigate this issue
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 13:24:03
*Thread Reply:* for me it seems that within the dag or task you're defining some object that is not easy to copy
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:26:05
*Thread Reply:* possible, but with 0.24.0 that issue is not occurring, so the worry is that the version upgrade could potentially break things
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 13:39:34
*Thread Reply:* 0.24.0 is not that old 🤔
harsh loomba - (hloomba@upgrade.com)
2023-10-26 13:45:07
*Thread Reply:* I see the issue with 0.24.0, I see it as a warning:

```
[airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - self.run()
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/threading.py", line 870, in run
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - self._target(*self._args, **self._kwargs)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/home/upgrade/.local/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 89, in on_running
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - task_instance_copy = copy.deepcopy(task_instance)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/dag.py", line 2162, in __deepcopy__
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 1224, in __deepcopy__
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 1224, in __deepcopy__
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 161, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - rv = reductor(4)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - TypeError: cannot pickle 'module' object
```

but with 1.4.1 it stopped processing any further and threw an error
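For anyone hitting the same wall: the tail of that traceback (TypeError: cannot pickle 'module' object) means some attribute reachable from the task instance holds a module object, which deepcopy refuses to copy. A minimal, hedged reproduction with an invented class (not from harsh's DAG):

```python
# Sketch: why deepcopy(task_instance) can raise
# "TypeError: cannot pickle 'module' object".
import copy
import json  # stands in for any module stored on a task


class MyTask:
    def __init__(self):
        self.serializer = json  # a module object stashed on the "task"


copy.deepcopy(MyTask())  # raises TypeError: cannot pickle 'module' object
```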
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:18:08
*Thread Reply:* I see the difference in calling between these 2 versions: the current version checks if Airflow is >2.6 and then directly runs on_running, but the earlier version was running it on a separate thread. Is this what's raising this exception?
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:24:49
*Thread Reply:* this is the issue - https://github.com/OpenLineage/OpenLineage/blob/c343835c1664eda94d5c315897ae6702854c81bd/integration/airflow/openlineage/airflow/listener.py#L89 - while copying the task
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:25:21
*Thread Reply:* since we are directly running it if version>2.6.0, it's throwing the error in the main processing
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:28:02
*Thread Reply:* may I know which Airflow versions we tested this process on?
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:28:39
*Thread Reply:* I'm on 2.6.3
Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 14:30:53
*Thread Reply:* 2.1.4, 2.2.4, 2.3.4, 2.4.3, 2.5.2, 2.6.1
usually there are not too many changes between minor versions

I still believe it might be some code you might improve; it is probably also an antipattern in airflow
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:34:26
*Thread Reply:* hmm...that's a valid observation, but I don't write DAGs, other teams do, so imagine if many people wrote such DAGs: I can't ask everyone to change their patterns, right? If something is running on the current openlineage version with a warning, it should still be running on the upgraded version, shouldn't it?
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:38:04
*Thread Reply:* however I see your point
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:49:52
*Thread Reply:* So that specific task has a 570-line query, a pretty bulky query; let me split it into smaller units
harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:50:15
*Thread Reply:* that should help, right? @Jakub Dardziński
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-10-26 14:51:27
-
-

*Thread Reply:* query length shouldn’t be the issue, rather any python code

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 14:51:50
*Thread Reply:* I get your point too, we might figure out some mechanism to skip irrelevant parts of task instance so that it doesn’t fail then

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:52:12
*Thread Reply:* actually its failing on that task itself

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:52:33
*Thread Reply:* let me try it will be pretty quick

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:58:58
*Thread Reply:* @Jakub Dardziński but ur right we have to fix this at Openlineage side as well. Because ideally Openlineage shouldn't be causing any issue to the main DAG processing

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 17:51:05
*Thread Reply:* it doesn’t break any airflow functionality, execution is wrapped into try/except block, only exception traceback is logged as you can see

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-27 05:25:54
*Thread Reply:* Can you migrate to Airflow 2.7 and use apache-airflow-providers-openlineage? Ideally we wouldn't make meaningful changes to openlineage-airflow
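A sketch of the dependency swap being suggested, assuming a plain requirements file (the version pins are illustrative):
```
# before: standalone integration
# apache-airflow==2.6.3
# openlineage-airflow

# after: Airflow 2.7 with the provider package
apache-airflow==2.7.2
apache-airflow-providers-openlineage
```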

harsh loomba - (hloomba@upgrade.com)
2023-10-27 11:35:44
*Thread Reply:* yup thats what im planning to do

harsh loomba - (hloomba@upgrade.com)
2023-10-27 13:59:03
*Thread Reply:* referencing this conversation (https://openlineage.slack.com/archives/C01CK9T7HKR/p1698398754823079?threadts=1698340358.557159&cid=C01CK9T7HKR) - what it takes to move to the openlineage provider package from openlineage-airflow. I'm updating Airflow to 2.7.2 but moving off of openlineage-airflow to the provider package. I'm trying to estimate the amount of work it takes, any thoughts? Reading changelogs I don't think it's too much of a change, but please share your thoughts, and if it's drafted somewhere please do share that as well

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-30 08:21:10
*Thread Reply:* Generally not much - I would maybe think of operator coverage. For example, for BigQuery the old openlineage-airflow supports BigQueryExecuteQueryOperator. However, the new apache-airflow-providers-openlineage supports BigQueryInsertJobOperator - because it's the intended replacement for BigQueryExecuteQueryOperator and the Airflow community does not want to accept contributions to deprecated operators.
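A hedged illustration of the operator swap mentioned above (the task_id and query are made up):
```
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# replacement for the deprecated BigQueryExecuteQueryOperator; the provider
# only emits lineage for this newer operator
load_task = BigQueryInsertJobOperator(
    task_id="load_table",
    configuration={
        "query": {
            "query": "SELECT 1",
            "useLegacySql": False,
        }
    },
)
```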

🙏 harsh loomba

harsh loomba - (hloomba@upgrade.com)
2023-10-31 15:00:38
*Thread Reply:* one question if someone is around - when im keeping both openlineage-airflow and apache-airflow-providers-openlineage in my requirement file, i see the following error -
```
from openlineage.airflow.extractors import Extractors
ModuleNotFoundError: No module named 'openlineage.airflow'
```
any thoughts?

John Lukenoff - (john@jlukenoff.com)
2023-10-31 15:37:07
*Thread Reply:* I would usually do a pip freeze | grep openlineage as a sanity check to validate that the module is actually installed. Not sure how the provider and the module play together though

harsh loomba - (hloomba@upgrade.com)
2023-10-31 17:07:41
*Thread Reply:* yeah so @John Lukenoff im not getting how i can use the specific extractor when i run my operator. Say for example, I have a custom DataWarehouseOperator and i want to override get_openlineage_facets_on_start and get_openlineage_facets_on_complete using the redshift extractor - then how would i do that?
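A minimal sketch of what such an operator can look like with the provider package, assuming its OperatorLineage contract; the operator name, namespace, and table names are made up:
```
from airflow.models.baseoperator import BaseOperator
from airflow.providers.openlineage.extractors import OperatorLineage
from openlineage.client.run import Dataset


class DataWarehouseOperator(BaseOperator):  # hypothetical custom operator
    def execute(self, context):
        ...  # run the warehouse query

    def get_openlineage_facets_on_complete(self, task_instance) -> OperatorLineage:
        # datasets returned here are picked up by the OpenLineage provider listener
        return OperatorLineage(
            inputs=[Dataset(namespace="redshift://cluster:5439", name="schema.source_table")],
            outputs=[Dataset(namespace="redshift://cluster:5439", name="schema.target_table")],
        )
```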

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-10-27 05:49:25
Spark Integration Logs
Hey There
Are these events skipped because it's not supported or it's configured somewhere?
```
23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd
```

Hitesh - (splicer9904@gmail.com)
2023-10-27 08:12:32
Hi People, actually I want to intercept the OpenLineage spark events right after the job ends and before they are emitted, so that I can add some extra information to the events or remove some information that I don't want.
Is there any way of doing this? Can someone please help me

Michael Robinson - (michael.robinson@astronomer.io)
2023-10-30 09:03:57
*Thread Reply:* In general, I think this kind of use case is probably best served by facets, but what do you think @Paweł Leszczyński?
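For the "remove some information" half of the question, a sketch using the disabled-facets setting that also appears later in this channel (listener jar and transport URL config omitted for brevity):
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    # drop facets you don't want before events are emitted
    .config('spark.openlineage.facets.disabled', '[spark_unknown;spark.logicalPlan]')
    .getOrCreate())
```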

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:01:12
Hello, has anyone run into a similar error as posted in this github open issue [https://github.com/MarquezProject/marquez/issues/2468] while setting up marquez on an EC2 Instance? Would appreciate any help to get past the errors

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:04:30
*Thread Reply:* Hmm, have you looked over our Running on AWS docs?

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:06:08
*Thread Reply:* More specifically, the AWS RDS section. How are you deploying Marquez on Ec2?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:08:05
*Thread Reply:* we were primarily referencing this document on git - https://github.com/MarquezProject/marquez

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:09:05
*Thread Reply:* leveraged docker and docker-compose

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:13:10
*Thread Reply:* hmm so you’re running docker-compose up on an Ec2 instance you’ve ssh’d into? (just trying to understand your setup better)

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:13:26
*Thread Reply:* yes, thats correct

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:16:39
*Thread Reply:* I’ve only used docker compose for local dev or integration tests. but, ok you’re probably in the PoC phase. Can you run the docker cmd on your local machine successfully? What OS is installed on the Ec2 instance?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:18:00
*Thread Reply:* yes, i can run and the OS is Ubuntu 20.04.6 LTS

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:19:27
*Thread Reply:* we initially ran into a permission denied error related to the postgresql.conf file and we had to update file permissions to 777, after which we started to see the below errors

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:19:36
*Thread Reply:*
```
marquez-db | 2023-10-27 20:35:52.512 GMT [35] FATAL: no pg_hba.conf entry for host "172.18.0.5", user "marquez", database "marquez", no encryption
marquez-db | 2023-10-27 20:35:52.529 GMT [36] FATAL: no pg_hba.conf entry for host "172.18.0.5", user "marquez", database "marquez", no encryption
```

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:20:12
*Thread Reply:* we then manually updated pg_hba.conf file to include host user and db details
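The shape of the entry being described, assuming the compose network is 172.18.0.0/16 (the pg_hba.conf format is: connection type, database, user, address, auth method):
```
host  marquez  marquez  172.18.0.0/16  md5
```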

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:20:42
*Thread Reply:* Did you also update the marquez.yml with the db user / password?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:20:48
*Thread Reply:* after which we started to see the errors posted in the github open issues page

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:21:33
*Thread Reply:* hmm are you using an external database or are you spinning up the entire Marquez stack with docker compose?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:21:56
*Thread Reply:* we are spinning up the entire Marquez stack with docker compose

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:23:24
*Thread Reply:* we did not change anything in the marquez.yml, i think we did not find that file in the github repo that we cloned into our local instance

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:26:31
*Thread Reply:* It’s important that the init-db.sh script runs, but I don’t think it is

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:26:56
*Thread Reply:* can you grab all the docker compose logs and share them? it’s hard to debug otherwise

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:29:59
*Thread Reply:*

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:33:15
*Thread Reply:* I would first suggest to remove the --build flag since you are specifying a version of Marquez to use via --tag

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:33:49
*Thread Reply:* not the issue per se, but it will help clear up some of the logs

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:35:06
*Thread Reply:* for sure thanks. we could get the logs without the --build portion, we tried with that option just once

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:35:40
*Thread Reply:* the errors were the same with/without --build option

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:36:02
*Thread Reply:*
```
marquez-api | ERROR [2023-10-27 21:34:58,019] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.
marquez-api | ! org.postgresql.util.PSQLException: FATAL: password authentication failed for user "marquez"
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
marquez-api | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
marquez-api | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:253)
marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:434)
marquez-api | ! at org.postgresql.Driver.connect(Driver.java:291)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)
marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)
marquez-api | ! at org.flywaydb.core.Flyway.info(Flyway.java:190)
marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:73)
marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:27)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:105)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:48)
marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)
marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)
marquez-api | ! at io.dropwizard.Application.run(Application.java:94)
marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:60)
marquez-api | INFO [2023-10-27 21:34:58,024] marquez.MarquezApp: Stopping app...
```

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:38:52
*Thread Reply:* debugging docker issues like this is so difficult

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:40:44
*Thread Reply:* it could be a number of things, but you are connected to the database it’s just that the marquez user hasn’t been created

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:41:59
*Thread Reply:* the /init-db.sh is what manages user creation
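Roughly the SQL that script needs to have applied for the authentication error above to go away (an illustrative sketch, not the actual script contents):
```
CREATE USER marquez WITH PASSWORD 'marquez';
CREATE DATABASE marquez OWNER marquez;
```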

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:42:17
*Thread Reply:* so it’s possible that the script isn’t running for whatever reason on your Ec2 instance

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:44:20
*Thread Reply:* do you have other services running on that Ec2 instance? Like, other than Marquez

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:44:52
*Thread Reply:* is there a postgres process running outside of docker?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 20:34:50
*Thread Reply:* no other services except marquez on this EC2 instance

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 20:35:49
*Thread Reply:* this was a new Ec2 instance that was spun up to install and use marquez

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 20:36:09
*Thread Reply:* and we can confirm that no postgres process runs outside of docker

Jason Yip - (jasonyip@gmail.com)
2023-10-29 03:06:28
I realize in Spark 3.4+, some job ids don't have a start event. What part of the code is responsible for triggering the START and COMPLETE event

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-30 09:59:53
*Thread Reply:* hi @Jason Yip could you provide an example of such a job?

Jason Yip - (jasonyip@gmail.com)
2023-10-30 16:51:55
*Thread Reply:* @Paweł Leszczyński same old:
```
# delete the old table if needed
_ = spark.sql('DROP TABLE IF EXISTS transactions')

# expected structure of the file
transactions_schema = StructType([
  StructField('household_id', IntegerType()),
  StructField('basket_id', LongType()),
  StructField('day', IntegerType()),
  StructField('product_id', IntegerType()),
  StructField('quantity', IntegerType()),
  StructField('sales_amount', FloatType()),
  StructField('store_id', IntegerType()),
  StructField('discount_amount', FloatType()),
  StructField('transaction_time', IntegerType()),
  StructField('week_no', IntegerType()),
  StructField('coupon_discount', FloatType()),
  StructField('coupon_discount_match', FloatType())
  ])

# read data to dataframe
df = (spark
  .read
  .csv(
    adlsRootPath + '/examples/data/csv/completejourney/transaction_data.csv',
    header=True,
    schema=transactions_schema))

df.write\
  .format('delta')\
  .mode('overwrite')\
  .option('overwriteSchema', 'true')\
  .option('path', adlsRootPath + '/examples/data/csv/completejourney/silver/transactions')\
  .saveAsTable('transactions')

df.count()

# create table object to make delta lake queryable
_ = spark.sql(f'''
  CREATE TABLE transactions
  USING DELTA
  LOCATION '{adlsRootPath}/examples/data/csv/completejourney/silver/transactions'
  ''')

# show data
display(
  spark.table('transactions')
  )
```

John Lukenoff - (john@jlukenoff.com)
2023-10-30 18:51:43
👋 Hi team, cross-posting from the Marquez Channel in case anyone here has a better idea of the spec


> For most of our lineage extractors in airflow, we are using the rust sql parser from openlineage-sql to extract table lineage via sql statements. When errors occur we are adding an extractionError run facet similar to what is being done here. I’m finding in the case that multiple statements were extracted but one failed to parse while many others were successful, the lineage for these runs doesn’t appear as expected in Marquez. Is there any logic around the extractionError run facet that could be causing this? It seems reasonable to assume that we might take this to mean the entire run event is invalid if we have any extraction errors.
>
> I would still expect to see the other lineage we sent for the run but am instead just seeing the extractionError in the marquez UI; in the database, runs with an extractionError facet don’t seem to make it to the job_versions_io_mapping table

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-31 06:34:05
*Thread Reply:* Can you show the actual event? Should be in the events tab in Marquez

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 11:59:07
*Thread Reply:* @John Lukenoff, would you mind posting the link to Marquez teams slack channel?

John Lukenoff - (john@jlukenoff.com)
2023-10-31 12:15:37
*Thread Reply:* yep here is the link: https://marquezproject.slack.com/archives/C01E8MQGJP7/p1698702140709439


This is the full event, sanitized of internal info:
```
{
  "job": {
    "name": "some_dag.some_task",
    "facets": {},
    "namespace": "default"
  },
  "run": {
    "runId": "a9565df2-f1a1-3ee3-b202-7626f8c4b92d",
    "facets": {
      "extractionError": {
        "errors": [
          {
            "task": "ALTER SESSION UNSET QUERY_TAG;",
            "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python",
            "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet",
            "taskNumber": 0,
            "errorMessage": "Expected one of TABLE or INDEX, found: SESSION"
          }
        ],
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python",
        "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ExtractionErrorRunFacet",
        "totalTasks": 1,
        "failedTasks": 1
      }
    }
  },
  "inputs": [
    { "name": "foo.bar", "facets": {}, "namespace": "snowflake" },
    { "name": "fizz.buzz", "facets": {}, "namespace": "snowflake" }
  ],
  "outputs": [
    { "name": "foo1.bar2", "facets": {}, "namespace": "snowflake" },
    { "name": "fizz1.buzz2", "facets": {}, "namespace": "snowflake" }
  ],
  "producer": "https://github.com/MyCompany/repo/blob/next-master/company/data/pipelines/airflow_utils/openlineage_utils/client.py",
  "eventTime": "2023-10-30T02:46:13.367274Z",
  "eventType": "COMPLETE"
}
```

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 12:43:07
*Thread Reply:* thank you!

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 13:14:29
*Thread Reply:* @John Lukenoff, sorry to trouble again, is the slack channel still active? for whatever reason i cant get to this workspace

John Lukenoff - (john@jlukenoff.com)
2023-10-31 13:15:26
*Thread Reply:* yep it’s still active, maybe you need to join the workspace first? https://join.slack.com/t/marquezproject/shared_invite/zt-266fdhg9g-TE7e0p~EHK50GJMMqNH4tg

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 13:25:51
*Thread Reply:* that was a good call. the link you just shared worked! thank you!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-31 13:27:55
*Thread Reply:* yeah from OL perspective this looks good - the inputs and outputs are there, the extraction error facet looks like it should

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-31 13:28:05
*Thread Reply:* must be some Marquez hiccup 🙂

👍 John Lukenoff

John Lukenoff - (john@jlukenoff.com)
2023-10-31 13:28:45
*Thread Reply:* Makes sense, I’ll tail my marquez logs today to see if I can find anything

John Lukenoff - (john@jlukenoff.com)
2023-11-01 19:37:06
*Thread Reply:* Somehow this started working after we switched from our beta to prod infrastructure. I suspect something was failing due to constraints on the size of our db and the load of poor quality data it was under after months of testing against it

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-01 11:34:43
@channel
I’m opening a vote to release OpenLineage 1.5.0, including:
• support for Cassandra Connectors lineage in the Flink integration
• support for Databricks Runtime 13.3 in the Spark integration
• support for rdd and toDF operations from the Spark Scala API in Spark
• lowered requirements for attrs and requests packages in the Airflow integration
• lazy rendering of yaml configs in the dbt integration
• bug fixes, tests, infra fixes, doc changes, and more.
Three +1s from committers will authorize an immediate release.

➕ Jakub Dardziński, William Angel, Abdallah, Willy Lulciuc, Paweł Leszczyński, Julien Le Dem
👍 Jason Yip
🚀 Luca Soato, tati

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-02 05:11:58
*Thread Reply:* Thanks, all. The release is authorized and will be initiated within 2 business days.

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-01 13:29:09
@channel
The October 2023 issue of OpenLineage News is available now! Sign up to get it directly in your inbox each month.

👍 Mars Lan, harsh loomba
🎉 tati

John Lukenoff - (john@jlukenoff.com)
2023-11-01 19:40:39
Hi team 👋 , we’re finding that for our Spark jobs we are almost always getting some junk characters in our dataset names. We’ve pushed the regex filter to its limits and would like to extend the logic of deriving the dataset name in openlineage-spark (currently on 1.4.1). I seem to recall hearing we could do this by implementing our own LogicalPlanVisitor or something along those lines? Is that still the recommended approach and if so would this be possible to implement in Scala vs. Java (scala noob here 🙂)

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-02 03:34:15
*Thread Reply:* Hi John, we're always happy to help with the contribution.


One of the possible solutions to this would be to do that just in the openlineage-java client:
• introduce a config entry like normalizeDatasetNameToAscii: enabled/disabled
• modify the DatasetIdentifier class to contain a static member boolean normalizeDatasetNameToAscii and normalize the dataset name according to this setting
• additionally, you would need to add a config entry in io.openlineage.client.OpenLineageYaml and make sure both loadOpenLineageYaml methods set DatasetIdentifier.normalizeDatasetNameToAscii based on the config
• document this in the doc
So, no Scala nor custom logical plan visitors required.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-02 03:34:47
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/main/client/java/src/main/java/io/openlineage/client/utils/DatasetIdentifier.java

🙌 John Lukenoff

Mike Fang - (fangmik@amazon.com)
2023-11-01 20:30:38
I am looking to send OpenLineage events to an AWS API Gateway endpoint from an AWS MWAA instance. The problem is that all requests to AWS services need to be signed with SigV4, and using API Gateway with IAM authentication would require requests to API Gateway be signed with SigV4. Would the best way to do so be to just modify the python client HTTP transport to include a new config option for signing emitted OpenLineage events with SigV4? Are there any alternatives?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 02:41:50
*Thread Reply:* there’s actually an issue for that:
https://github.com/OpenLineage/OpenLineage/issues/2189


but the way to do this is imho to create new custom transport (it might inherit from HTTP transport) and register it in transport factory

Mike Fang - (fangmik@amazon.com)
2023-11-02 13:05:05
*Thread Reply:* I am thinking of just modifying the HTTP transport and using requests.auth.AuthBase to create different auth methods instead of a TokenProvider class


Classes which subclass requests.auth.AuthBase can also just directly be given to the requests call in the auth parameter
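A sketch of that idea; the class is hypothetical and the actual SigV4 computation is elided (it could be delegated to botocore, for example):
```
import requests.auth


class SigV4Auth(requests.auth.AuthBase):  # hypothetical auth class
    def __init__(self, credentials, region, service="execute-api"):
        self.credentials = credentials
        self.region = region
        self.service = service

    def __call__(self, request):
        # compute SigV4 signature headers for `request` here (e.g. by building an
        # equivalent signed request with botocore) and copy them onto `request`
        return request


# such an object can then be passed straight through:
#   requests.post(url, json=event, auth=SigV4Auth(creds, "us-east-1"))
```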

👍 Jakub Dardziński

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 14:40:24
*Thread Reply:* would you like to contribute? 🙂

Mike Fang - (fangmik@amazon.com)
2023-11-02 14:43:05
*Thread Reply:* I was about to contribute, but I actually just realized that there is an existing way to provide a custom transport that would solve form y use case. My only question is how do I register this custom transport in my MWAA environment? Can I provide the custom transport as an Airflow plugin and then specify the class in the Openlineage.yml config? Will it automatically pick it up?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 15:45:56
*Thread Reply:* although I did not test this in MWAA but locally only: I’ve created an Airflow plugin that in __init__.py has defined (or imported) the following code:
```
from openlineage.client.transport import register_transport, Transport, Config


@register_transport
class FakeTransport(Transport):
    kind = "fake"
    config = Config

    def __init__(self, config: Config) -> None:
        print(config)

    def emit(self, event) -> None:
        print(event)
```
setting AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "fake"}' does take effect and I can see output in Airflow logs

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 15:47:45
*Thread Reply:* in setup.py it’s:
```
...,
    entry_points={
        'airflow.plugins': [
            'custom_transport = custom_transport:CustomTransportPlugin',
        ],
    },
    install_requires=["openlineage-python"]
)
```

Mike Fang - (fangmik@amazon.com)
2023-11-03 12:52:55
*Thread Reply:* ok great thanks for following up on this, super helpful

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-02 12:00:00
@channel
We released OpenLineage 1.5.0, including:
• support for Cassandra Connectors lineage in the Flink integration by @Peter Huang
• support for Databricks Runtime 13.3 in the Spark integration by @Paweł Leszczyński
• support for rdd and toDF operations from the Spark Scala API in Spark by @Paweł Leszczyński
• lowered requirements for attrs and requests packages in the Airflow integration by @Jakub Dardziński
• lazy rendering of yaml configs in the dbt integration by @Jakub Dardziński
• bug fixes, tests, infra fixes, doc changes, and more.
Thanks to all the contributors, including new contributor @Sophie LY!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.5.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.4.1...1.5.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

👍 Jason Yip, Sophie LY, Tristan GUEZENNEC -CROIX-, Mars Lan, Sangeeta Mishra
🚀 tati

Jason Yip - (jasonyip@gmail.com)
2023-11-02 14:49:18
@Paweł Leszczyński I tested 1.5.0, it works great now, but the environment facets is gone in START... which I very much want it.. any thoughts?

Jason Yip - (jasonyip@gmail.com)
2023-11-03 04:18:11
actually, it shows up in one of the RUNNING now... behavior is consistent between 11.3 and 13.3, thanks for fixing this issue

👍 Paweł Leszczyński

Jason Yip - (jasonyip@gmail.com)
2023-11-04 15:44:22
*Thread Reply:* @Paweł Leszczyński looks like I need to bring bad news.. 13.3 is fixed for specific scenarios, but 11.3 is still reading output as dbfs.. there are scenarios that it's not producing input and output like:


```
create table table using delta as
location 'abfss://....'
Select * from parquet.`abfss://....`
```

Jason Yip - (jasonyip@gmail.com)
2023-11-04 15:44:31
*Thread Reply:* Will test more and open issues

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-06 05:34:33
*Thread Reply:* @Jason Yip how did you manage to get the environment attribute? It's not showing up for me at all. I've tried Databricks but also tried a local instance of Spark.

Jason Yip - (jasonyip@gmail.com)
2023-11-07 18:32:02
*Thread Reply:* @Rodrigo Maia its showing up in one of the RUNNING events, not in the START event anymore

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-08 03:04:32
*Thread Reply:* I never had a running event 🫠 Am I filtering something?

Jason Yip - (jasonyip@gmail.com)
2023-11-08 13:03:26
*Thread Reply:* Umm.. ok show me your code, will try on my end

Jason Yip - (jasonyip@gmail.com)
2023-11-08 14:26:06
*Thread Reply:* @Paweł Leszczyński @Rodrigo Maia actually if you are using UC-enabled cluster, you won't get any RUNNING events

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-03 12:00:07
@channel
This month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:
• announcements
• recent releases
• recent additions to the Flink integration by @Peter Huang
• recent additions to the Spark integration by @Paweł Leszczyński
• updates on proposals by @Julien Le Dem
• discussion topics
• open discussion
More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.

👍 harsh loomba

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:08:10
Hi Team, we are trying to customize the events by writing a custom lineage listener extending OpenLineageSparkListener, but would need some direction on how to capture the events

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-04 07:11:46
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1698315220142929
Do you need some more guidance than that?

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:13:47
*Thread Reply:* yes

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-04 07:15:21
*Thread Reply:* It seems pretty extensively described, what kind of help do you need?

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:16:13
*Thread Reply:* io.openlineage.spark.api.OpenLineageEventHandlerFactory - if i use this, how will i pass the custom listener to my spark-submit?

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:17:25
*Thread Reply:* I would like to know how i will customize my events using this. For example: in the "input" facet i want only the symlinks name, i am not interested in anything else

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:17:32
*Thread Reply:* can you please provide some guidance

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:18:36
*Thread Reply:* @Jakub Dardziński this is the doubt i have

priya narayana - (n.priya88@gmail.com)
2023-11-04 08:17:25
*Thread Reply:* Someone who did the spark integration, please throw some light

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-04 08:21:22
*Thread Reply:* it's weekend for most of us so you probably need to wait until Monday for precise answers

David Goss - (david.goss@matillion.com)
2023-11-06 04:03:42
👋 I raised a PR https://github.com/OpenLineage/OpenLineage/pull/2223 off the back of some Marquez conversations a while back to try and clarify how names of Snowflake objects should be expressed in OL events. I used Snowflake’s OL view as a guide, but also I appreciate there are other OL producers that involve Snowflake too (Airflow? dbt?). Any feedback on this would be appreciated!

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - -
-
Stars
- 11 -
- -
-
Last updated
- 3 months ago -
- - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Goss - (david.goss@matillion.com) -
-
2023-11-08 10:42:35
-
-

*Thread Reply:* Thanks for merging this @Maciej Obuchowski!

👍 Maciej Obuchowski

Athitya Kumar - (athityakumar@gmail.com)
2023-11-06 05:22:03
Hey team! 👋

We're trying to use openlineage-flink, and would like to provide the openlineage.transport.type=http and configure other transport configs, but we're not able to find sufficient docs (tried this doc) on where/how these configs can be provided.

For example, in spark, the changes mostly were delegated to the spark-submit command like
```
spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
    --packages "io.openlineage:openlineage_spark:<spark-openlineage-version>" \
    --conf "spark.openlineage.transport.url=http://{openlineage.client.host}/api/v1/namespaces/spark_integration/" \
    --class com.mycompany.MySparkApp my_application.jar
```
And the OpenLineageSparkListener has a method to retrieve the provided spark confs as an object in the ArgumentParser. Similarly, looking for some pointers on how the openlineage.transport configs can be provided to OpenLineageFlinkJobListener & how the flink listener parses/uses these configs

TIA! 😄

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-07 05:56:09
*Thread Reply:* similarly to spark config, you can use flink config
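A sketch of what that can look like, assuming the job's flink configuration accepts the same openlineage.transport.** keys shown for Spark above (the exact key set and placement may differ by integration version):
```
openlineage.transport.type: http
openlineage.transport.url: http://{openlineage.client.host}
```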

Athitya Kumar - (athityakumar@gmail.com)
2023-11-07 22:36:53
*Thread Reply:* @Maciej Obuchowski - Got it. Our use-case is that we're trying to build a wrapper on top of openlineage-flink for productionising our flink jobs.


We're trying to have a wrapper class that extends OpenLineageFlinkJobListener class, and overwrites the HTTP transport endpoint/url to a constant value (say, example.com and /api/v1/flink). But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private. If it was just a default scope, can we contribute a PR to make it public, to make it friendly for teams trying to adopt & extend openlineage?


And also, we wanted to understand better on where we're reading the HTTP transport endpoint/url configs in OpenLineageFlinkJobListener and what'd be the best place to override it to the constant endpoint/url for our use-case

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-08 05:55:43
*Thread Reply:* We parse flink conf to get that information: https://github.com/OpenLineage/OpenLineage/blob/26494b596e9669d2ada164066a73c44e04[…]ink/src/main/java/io/openlineage/flink/client/EventEmitter.java


> But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private.
The way to construct it is a public builder in the same class


I think an easier way than a wrapper class would be to use the existing flink configuration, or to set up the OPENLINEAGE_URL env variable, or have an openlineage.yml config file - not sure why this is the way you've chosen?

Athitya Kumar - (athityakumar@gmail.com)
2023-11-09 12:41:02
*Thread Reply:* > I think easier way than wrapper class would be use existing flink configuration, or to set up OPENLINEAGE_URL env variable, or have openlineage.yml config file - not sure why this is the way you've chosen?
@Maciej Obuchowski - The reasoning behind going with a wrapper class is that we can abstract out the nitty-gritty like how/where we're publishing openlineage events etc - especially for companies that have a lot of teams that may be adopting openlineage.

For example, if we wanna move away from http transport to kafka transport - we'd be changing only this wrapper class and ask folks to update their wrapper class dependency version. If we went without the wrapper class, then the exact config changes would need to be synced and done by many different teams, who may not have enough context.

Similarly, if we wanna enable some other default best-practise configs, or inject any company-specific configs etc, the wrapper would be useful in abstracting out the details and be the 1 place that handles all openlineage related integrations for any future changes.

That's why we wanna extend openlineage's listener class & leverage most of the OSS code as-is; and at the same time, have the ability to extend & inject customisations. I think that's where some things like having getters for the class object attributes, or having public constructors would be really helpful 😄

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-09 13:03:56
*Thread Reply:* @Athitya Kumar that makes sense. Feel free to provide PR adding getters and stuff.

🎉 Athitya Kumar

Yannick Libert - (yannick.libert.partner@decathlon.com)
2023-11-07 06:03:49
Hi all, we (I work with @Sophie LY and @Abdallah) have a quick question regarding the spark integration:
if a spark app contains several jobs, they will be named "my_spark_app_name.job1" and "my_spark_app_name.job2",
e.g.:
spark_job.collect_limit
spark_job.map_partitions_parallel_collection

If I understood correctly, the spark integration maps one Spark job to a single OpenLineage Job, and the application itself should be assigned a Run id at startup, and each job that executes will report the application's Run id as its parent job run (taken from: https://openlineage.io/docs/integrations/spark/).

In our case, the app Run Id is never created, and the job runs don't contain any parent facets. We tested it with a recent integration version in 1.4.1 and also an older one (0.26.0).
Did we miss something in the OL spark integration config?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-07 06:07:51
*Thread Reply:* hey, a name of the output dataset should be put at the end of the job name. This was introduced to help with jobs that call multiple spark actions

Yannick Libert - (yannick.libert.partner@decathlon.com)
2023-11-07 07:05:52
*Thread Reply:* Hi Paweł,
Thanks for your answer, yes indeed with the newer version of OL, we automatically have the name of the output dataset at the end of the job name, but no App run id, nor any parent run facet.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-07 08:16:44
*Thread Reply:* yes, you're right. I mean you can set in config spark.openlineage.parentJobName which will be shared through whole app run, but this needs to be set manually
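A sketch of setting that config, assuming a PySpark session builder (the job name value is made up):
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    # shared by every action the app runs, as described above
    .config('spark.openlineage.parentJobName', 'my_spark_app_name')
    .getOrCreate())
```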

Yannick Libert - (yannick.libert.partner@decathlon.com)
2023-11-07 08:36:58
*Thread Reply:* I see, thanks a lot for your reply we'll try that

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-11-07 10:49:25
if I have a dataset on adls gen2 which synapse connects to as an external delta table, is that the use case of a symlink dataset? the delta table is connected to by PBI and by Synapse, but the underlying data is exactly the same

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-08 10:49:04
*Thread Reply:* Sounds like it, yes - if the logical dataset names are different but physical one is the same
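What that can look like on the dataset, assuming the standard symlinks facet (the identifier values here are made up):
```
"facets": {
  "symlinks": {
    "identifiers": [
      {"namespace": "abfss://container@account.dfs.core.windows.net", "name": "warehouse/transactions", "type": "TABLE"}
    ]
  }
}
```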

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-08 12:38:52
Has anyone here tried OpenLineage with Spark on Amazon EMR?

Jason Yip - (jasonyip@gmail.com)
2023-11-08 13:01:16
*Thread Reply:* No, but it should work the same. I tried on AWS and Google Colab and Azure

👍 Jakub Dardziński

Tristan GUEZENNEC -CROIX- - (tristan.guezennec@decathlon.com)
2023-11-09 03:10:54
*Thread Reply:* Yes. @Abdallah could provide some details if needed.

👍 Abdallah
🔥 Maciej Obuchowski

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-20 11:29:26
*Thread Reply:* Thanks @Tristan GUEZENNEC -CROIX-
Hi @Abdallah, I was able to set up a spark cluster on AWS EMR but I'm struggling to configure the OL Listener. I've tried with steps and bootstrap actions for the jar and it didn't work out. How did you manage to include the jar? Besides, what about the spark configuration? Could you send me a sample of these configs?

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-08 12:44:54
@channel
Friendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1699027207361229

👍 Jakub Dardziński

Jason Yip - (jasonyip@gmail.com)
2023-11-10 15:25:45
@Paweł Leszczyński regarding https://github.com/OpenLineage/OpenLineage/issues/2124, OL is parsing out the table location in the Hive metastore - it is the location of the table in the catalog and not the physical location of the data. It is both right and wrong, because it is a table, it's just an external table.


https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html

Jason Yip - (jasonyip@gmail.com)
2023-11-10 15:32:28
*Thread Reply:* Here's for more reference: https://dilorom.medium.com/finding-the-path-to-a-table-in-databricks-2c74c6009dbb

Jason Yip - (jasonyip@gmail.com)
2023-11-11 03:29:33
@Paweł Leszczyński this is why, if you create a table with an adls location, it won't show input and output:


https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/spark35/src[…]k35/agent/lifecycle/plan/CreateReplaceOutputDatasetBuilder.java


Because the catalog object is not there.

Jason Yip - (jasonyip@gmail.com)
2023-11-11 03:30:44
It seems like the integration needs to be re-written in a way that supports Databricks

Jason Yip - (jasonyip@gmail.com)
2023-11-13 03:00:42
@Paweł Leszczyński I went back to 1.4.1, output does show adls location. But environment facet is gone in 1.4.1. It shows up in 1.5.0 but namespace is back to dbfs....

Jason Yip - (jasonyip@gmail.com)
2023-11-13 03:18:37
@Paweł Leszczyński I diff CreateReplaceDatasetBuilder.java and CreateReplaceOutputDatasetBuilder.java and they are the same except for the class name, so I am not sure what is causing the change. I also realize you don't have a test case for ADLS

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-13 04:52:07
*Thread Reply:* Thanks @Jason Yip for your engagement in finding the cause and solution to this issue.


Among the technical problems, another problem here is that our databricks integration tests are run on AWS and the issue you describe occurs in Azure. I would consider this a primary issue as it is difficult for me to verify the behaviour you describe and fix it with a failing integration test at the start.


Are you able to reproduce the issue on an AWS Databricks environment so that we could include it in our integration tests and make sure the behaviour will not change later on in future?

Jason Yip - (jasonyip@gmail.com)
2023-11-13 18:06:44
*Thread Reply:* I didn't know Azure and AWS Databricks are different. Let me try it on AWS as well

Naresh reddy - (naresh.naresh36@gmail.com)
2023-11-15 07:17:24
Hi
Can anyone point me to the deck on how Airflow can be integrated using Openlineage?

Naresh reddy - (naresh.naresh36@gmail.com)
2023-11-15 07:27:55
*Thread Reply:* thank you @Maciej Obuchowski

Naresh reddy - (naresh.naresh36@gmail.com)
2023-11-15 11:09:24
Can anyone tell me why OL is better than other competitors? If you can provide an analysis that would be great

Harel Shein - (harel.shein@gmail.com)
2023-11-16 11:46:16
*Thread Reply:* Hey @Naresh reddy can you help me understand what you mean by competitors?
OL is a specification that can be used to solve various problems, so if you have a clear problem statement, maybe I can help with pros/cons for that problem

Naresh reddy - (naresh.naresh36@gmail.com)
2023-11-15 11:10:58
what are the pros and cons of OL? we often talk about positives to market it, but what are the pain points of using OL, and how is it addressing user issues?

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-16 13:38:42
*Thread Reply:* Hi @Naresh reddy, thanks for your question. We’ve heard that OpenLineage is attractive because of its desirable integrations, including a best-in-class Spark integration, its extensibility, the fact that it’s not destructive, and the fact that it’s open source. I’m not aware of pain points per se, but there are certainly features and integrations that we wish we could focus on but can’t at the moment — like the Dagster integration, which needs a new maintainer. OpenLineage is like any other open standard in that ecosystem coverage is a constant process rather than a journey, and it requires contributions in order to get close to 100%. Thankfully, we are gaining users and contributors all the time, and integrations are being added or improved upon daily. See the Ecosystem page on the website for a list of consumers and producers and links to more resources, and check out the GitHub repo for the codebase, commit history, contributors, governance procedures, and more. We’re quick to respond to messages here and issues on GitHub — usually within one day.

karthik nandagiri - (karthik.nandagiri@gmail.com)
2023-11-19 23:57:38
Hi, so we can use openlineage to identify column level lineage with Airflow, Spark? Will it also allow us to connect to Power BI and derive the downstream column lineage?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-20 06:07:36
*Thread Reply:* Yes, it works with Airflow and Spark - there is a caveat that the number of operators that support it on the Airflow side is fairly small and limited generally to the most popular SQL operators.
> will it also allow to connect to Power BI and derive the downstream column lineage ?
No, there is no such feature yet 🙂
However, there's nothing preventing this - if you wish to work on such an implementation, we'd be happy to help.

karthik nandagiri - (karthik.nandagiri@gmail.com)
2023-11-21 00:20:11
*Thread Reply:* Thank you Maciej Obuchowski for the update. Currently we are looking out for a tool which can support connecting to Power BI and pull column level lineage information for reports and dashboards. How can this be achieved with OL? Can you give some idea?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-21 07:59:10
*Thread Reply:* I don't think I can help you with that now, unless you want to work on your own integration with PowerBI 🙁

Rafał Wójcik - (rwojcik@griddynamics.com)
2023-11-21 07:02:08
Hi Everyone, first of all - big shout out to all contributors - you do an amazing job here.
I want to use OpenLineage in our project - to do so I want to set up a POC and experiment with the possibilities the library provides. I started working on the sample from the conference talk: https://github.com/getindata/openlineage-bbuzz2023-column-lineage but when I go into the spark transformation after starting the context with openlineage, I have issues with SessionHiveMetaStoreClient on section 3 - does anyone have another plain sample to play with, so I don't have to set up everything from scratch?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-21 07:37:00
*Thread Reply:* Can you provide details about those issues? Like exceptions, logs, details of the jobs and how do you run them?

Rafał Wójcik - (rwojcik@griddynamics.com)
2023-11-21 07:45:37
*Thread Reply:* Hi @Maciej Obuchowski - I reran the docker container after deleting the metadata_db folder possibly created by another local test, and fixed that one, but got a problem with OpenLineageListener during initialization of spark:
while I execute:
```
spark = (SparkSession.builder.master('local')
    .appName('Food Delivery')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.jars', '<local-path>/openlineage-spark-0.27.2.jar,<local-path>/postgresql-42.6.0.jar')
    .config('spark.openlineage.transport.type', 'http')
    .config('spark.openlineage.transport.url', 'http://api:5000')
    .config('spark.openlineage.facets.disabled', '[spark_unknown;spark.logicalPlan]')
    .config('spark.openlineage.namespace', 'food-delivery')
    .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')
    .config("spark.sql.repl.eagerEval.enabled", True)
    .enableHiveSupport()
    .getOrCreate())
```
I got
```
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2563)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:643)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:467)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:218)
	at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2921)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2919)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1(SparkContext.scala:2552)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1$adapted(SparkContext.scala:2551)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2551)
	... 15 more
```
looks like for some reason the jars are not loaded - need to look into it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-21 07:58:09
-
-

*Thread Reply:* 🤔 Jars are added during image building: https://github.com/getindata/openlineage-bbuzz2023-column-lineage/blob/main/Dockerfile#L12C1-L12C29

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-21 07:58:28
-
-

*Thread Reply:* are you sure <local-path> is right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rafał Wójcik - (rwojcik@griddynamics.com) -
-
2023-11-21 08:00:49
-
-

*Thread Reply:* yes, it's the same as in the sample - wondering why it doesn't get added:
```from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
    .appName('Food Delivery')
    .config('spark.jars', '/home/jovyan/jars/openlineage-spark-0.27.2.jar,/home/jovyan/jars/postgresql-42.6.0.jar')
    .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')
    .config("spark.sql.repl.eagerEval.enabled", True)
    .enableHiveSupport()
    .getOrCreate())

print(spark.sparkContext._jsc.sc().listJars())

Vector()```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-21 08:04:31
-
-

*Thread Reply:* can you make sure the jars are in this directory? Just try: docker run --entrypoint /usr/local/bin/bash IMAGE_NAME -c "ls /home/jovyan/jars"

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-11-21 08:06:27
-
-

*Thread Reply:* another option to try is to replace spark.jars with spark.jars.packages set to io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0
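A minimal sketch of that alternative, assuming the container has internet access so the packages can be resolved from Maven Central (the OpenLineage transport settings are carried over unchanged from the snippet above):
```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
    .appName('Food Delivery')
    # Resolve the artifacts at session start instead of pointing at local jar
    # files, so the listener class is guaranteed to end up on the classpath.
    .config('spark.jars.packages',
            'io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.openlineage.transport.type', 'http')
    .config('spark.openlineage.transport.url', 'http://api:5000')
    .config('spark.openlineage.namespace', 'food-delivery')
    .getOrCreate())
```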

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-11-21 08:16:54
-
-

*Thread Reply:* I think this was done for presentation purposes, to make sure the demo would work without internet access. That can be the reason the jar was added manually to the docker image. openlineage-spark can be added to Spark via spark.jars.packages, like we do here: https://openlineage.io/docs/integrations/spark/quickstart_local

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rafał Wójcik - (rwojcik@griddynamics.com) -
-
2023-11-21 09:21:59
-
-

*Thread Reply:* got it guys - thanks a lot for the help - it turns out that the Spark contexts from notebooks 2 and 3 had some kind of metadata conflict - when I combined those two and recreated the image to clean up the old metadata, it works.
One more note: sometimes the kernels return weird results, but that may be caused by some local nuances - anyways, thx!

- - - -
-
-
-
- - - - - - - - - - - \ No newline at end of file diff --git a/html_output/channel/gx-integration/index.html b/html_output/channel/gx-integration/index.html deleted file mode 100644 index 2b0e130..0000000 --- a/html_output/channel/gx-integration/index.html +++ /dev/null @@ -1,598 +0,0 @@ - - - - - - Slack Export - #gx-integration - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-27 13:38:03
-
-

@Michael Robinson has joined the channel

- - - -
- 🎉 Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Don Heppner - (don@greatexpectations.io) -
-
2023-09-27 13:38:23
-
-

@Don Heppner has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-09-27 13:38:23
-
-

@Harel Shein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-27 13:38:23
-
-

@Jakub Dardziński has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-27 13:41:17
-
-

@Maciej Obuchowski has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-09-27 14:44:18
-
-

@Don Heppner it was great meeting earlier, looking forward to collaborating on this!

- - - -
- ➕ Don Heppner, Jakub Dardziński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Bill Dirks - (bill@greatexpectations.io) -
-
2023-09-28 11:59:31
-
-

@Bill Dirks has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-10-09 07:46:52
-
-

Hello guys! I’ve been looking recently into changes in GX.
https://greatexpectations.io/blog/the-fluent-way-to-connect-to-data-sources-in-gx/
Is this the major change you’d like to introduce in OL <-> GX?

-
-
greatexpectations.io
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-10-09 10:14:39
-
-

@Don Heppner @Bill Dirks ^^

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bill Dirks - (bill@greatexpectations.io) -
-
2023-10-10 19:12:02
-
-

Just seeing this, we had a company holiday yesterday. Yes, fluent data sources are our new way of connecting to data and the older "block-style" is deprecated and will be removed when we cut 0.18.0. I'm not sure of the timing of that but likely in the next couple months.

- - - -
- 👍 Jakub Dardziński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-10-13 08:47:55
-
-

@Bill Dirks would be great if we could get your eyes on this PR: https://github.com/OpenLineage/OpenLineage/pull/2134

-
- - - - - - - -
-
Labels
- integration/great-expectations, common -
- -
-
Comments
- 6 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bill Dirks - (bill@greatexpectations.io) -
-
2023-10-13 15:00:37
-
-

*Thread Reply:* I'm a bit slammed today but can look on Tuesday.

- - - -
- ✅ Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (sicotte.jason@gmail.com) -
-
2023-10-19 13:02:24
-
-

@Jason has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/london-meetup/index.html b/html_output/channel/london-meetup/index.html deleted file mode 100644 index 455ea1e..0000000 --- a/html_output/channel/london-meetup/index.html +++ /dev/null @@ -1,661 +0,0 @@ - - - - - - Slack Export - #london-meetup - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 11:52:05
-
-

@Michael Robinson has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:52:15
-
-

@George Polychronopoulos has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-08-25 11:52:15
-
-

@Harel Shein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-08-25 11:52:40
-
-

@Madhav Kakumani has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:52:49
-
-

Hi Michael

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:53:36
-
-

thanks so much !

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 11:53:54
-
-

Hi George, nice to meet you. Thanks for asking about future meetups. Would November be too soon, or what’s a good timeframe for you all?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:54:12
-
-

that's perfect!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 11:54:31
-
-

Great! Do you happen to have space we could use?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:55:58
-
-

I will have to confirm but 99% yes

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:57:47
-
-

I am pretty sure you can use our 6point6 offices or at least part of it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:58:00
-
-

and if that's not the case I can provide personal space

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 11:58:46
-
-

OK! Would you please let me know when you know, and we’ll go from there?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) -
-
2023-08-25 11:59:29
-
-

yes absolutely will give you an answer by Monday

- - - -
- 👍 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-08-25 12:51:08
-
-

Thanks Michael for starting this channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Madhav Kakumani - (madhav.kakumani@6point6.co.uk) -
-
2023-08-25 12:51:21
-
-

hopefully meet you soon in London

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 13:19:58
-
-

Yes, hope so! Thank you for your interest in joining a meetup!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-25 13:34:21
-
-

@Maciej Obuchowski has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mike O'Connor - (mike@entos.ai) -
-
2023-08-31 16:04:17
-
-

@Mike O'Connor has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/mark-grover/index.html b/html_output/channel/mark-grover/index.html deleted file mode 100644 index 2cc8c15..0000000 --- a/html_output/channel/mark-grover/index.html +++ /dev/null @@ -1,944 +0,0 @@ - - - - - - Slack Export - #mark-grover - - - - - -
- - - -
- - - -
-
- -
- - -
- -
- -
-
2021-01-26 13:47:55
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
slackbot - -
-
2021-01-26 23:53:17
-
-

Stemma has joined this channel by invitation from Subsurface.

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 00:18:31
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 01:13:58
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:48:27
-
-

Can you share a screenshot of what you see?

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:53:05
-
-

@U01K2E1DX8T, @U016EJ380H0, @U015ZRHGVAB can you help? On the phone with Mark

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:53:09
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:54:26
-
-

@U016BBEQRRV are you able to click on the link in the bottom right?

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:55:26
-
-

@U016BBEQRRV Everything okay? Did it boot you out when you shared your screen?

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:55:29
-
-

Did you try the link in your calendar invite first?

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:56:35
-
-

Hi @UGHF1SHT9 Melissa is sending you a link right now

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:56:41
-
-

@here everything is running

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:56:50
-
-

Thank you for your patience Mark

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 13:57:52
-
-

Hi @U016BBEQRRV Here is the link - you do need to register https://hopin.com/events/subsurfacelivewinter2021?code=oRF56uHFWjmZORii8ddI1HqMi

-
-
hopin.com
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:00:03
-
-

@U016BBEQRRV did that link work for you?

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:01:22
-
-

*Thread Reply:* He is presenting right now

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:08:23
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:08:34
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:11:05
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:18:26
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
slackbot - -
-
2021-01-27 14:28:48
-
-

This message was deleted.

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:35:49
-
-

*Thread Reply:* Anytime!

- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
slackbot - -
-
2021-01-27 14:30:11
-
-

OpenLineage has joined this channel by invitation from Subsurface.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul De Schacht - (pauldeschacht@gmail.com) -
-
2021-01-27 14:30:11
-
-

@Paul De Schacht has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 14:30:46
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 15:27:46
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- -
- - -
- -
- -
-
2021-01-27 18:21:11
-
-

has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
slackbot - -
-
2021-02-15 07:21:26
-
-

Subsurface has removed your organization from this channel. You’ll continue to have access to this archived copy.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
slackbot - -
-
2023-08-03 00:00:00
-
-

Some older messages are unavailable. Due to the retention policies of an organization in this channel, all their messages and files from before this date have been deleted.

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/nyc-meetup/index.html b/html_output/channel/nyc-meetup/index.html deleted file mode 100644 index 5ac62bd..0000000 --- a/html_output/channel/nyc-meetup/index.html +++ /dev/null @@ -1,477 +0,0 @@ - - - - - - Slack Export - #nyc-meetup - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-04-04 14:10:04
-
-

@Michael Robinson has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-04-04 14:10:23
-
-

@Harel Shein has joined the channel

- - - -
- 🙌 Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-04-04 14:10:24
-
-

@Ross Turk has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-04-04 14:11:38
-
-

@Benji Lampel has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Peter Hicks - (peter@datakin.com) -
-
2023-04-19 15:20:00
-
-

@Peter Hicks has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Viraj Parekh - (vmpvmp94@gmail.com) -
-
2023-04-19 15:20:45
-
-

@Viraj Parekh has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-04 10:37:42
-
-

Please join the meetup group: https://www.meetup.com/data-lineage-meetup/

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yuanli Wang - (yuanliw@bu.edu) -
-
2023-06-01 18:34:18
-
-

@Yuanli Wang has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-12 14:24:40
-
-

Hi folks, we’re hosting another NYC meetup on 6/22 at Collibra (thanks, @Sheeri Cabral (Collibra)!). Please RSVP by 6/20. Hope to see you there.

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nam Nguyen - (nam@astrafy.io) -
-
2023-07-14 05:37:44
-
-

@Nam Nguyen has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/open-lineage-plus-bacalhau/index.html b/html_output/channel/open-lineage-plus-bacalhau/index.html deleted file mode 100644 index 693df2e..0000000 --- a/html_output/channel/open-lineage-plus-bacalhau/index.html +++ /dev/null @@ -1,1077 +0,0 @@ - - - - - - Slack Export - #open-lineage-plus-bacalhau - - - - - -
- - - -
- - - -
-
- - - - -
- -
David Aronchick - (aronchick@gmail.com) -
-
2022-12-07 04:14:09
-
-

@David Aronchick has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-12-07 04:14:20
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Enrico Rotundo - (enrico.rotundo@gmail.com) -
-
2022-12-07 04:14:20
-
-

@Enrico Rotundo has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Aronchick - (aronchick@gmail.com) -
-
2022-12-07 04:14:50
-
-

Hi all! I’ve created this channel to chat about bringing Open Lineage to Bacalhau (https://www.bacalhau.org/)

-
-
bacalhau.org
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Aronchick - (aronchick@gmail.com) -
-
2022-12-07 04:15:23
-
-

the idea would be that at every step of a DAG execution, we automatically read the inputs (if any) and create a metadata file with OpenLineage information in it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Aronchick - (aronchick@gmail.com) -
-
2022-12-07 04:18:30
-
-

Enrico has done some early thinking on this - https://www.notion.so/pl-strflt/Initial-design-doc-Oct-22-d2b032bd16e340d3ada39171c9ad524d

-
-
PL-STRFLT on Notion
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Aronchick - (aronchick@gmail.com) -
-
2022-12-07 04:19:06
-
-

and in parallel created an Airflow operator -> https://github.com/enricorotundo/bacalhau-airflow-provider

-
- - - - - - - -
-
Website
- <https://github.com/filecoin-project/bacalhau> -
- -
-
Stars
- 1 -
- - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Aronchick - (aronchick@gmail.com) -
-
2022-12-07 04:19:12
-
-

lemme know how i can help!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
David Aronchick - (aronchick@gmail.com) -
-
2022-12-07 04:19:30
-
-

cc @Enrico Rotundo @Julien Le Dem

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Enrico Rotundo - (enrico.rotundo@gmail.com) -
-
2022-12-07 07:45:04
-
-

hey @Julien Le Dem glad to meet you! The TL;DR is we’re going to use Airflow to orchestrate Bacalhau pipelines, and would love to add OpenLineage to pipelines too. I see Marquez already integrates with Airflow so that may be a good fit!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Philippe - (philippe@polyphene.io) -
-
2022-12-07 08:40:28
-
-

@Philippe has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-12-07 22:03:29
-
-

Hello @Enrico Rotundo nice to meet you as well. FYI we have the monthly meeting on zoom tomorrow if you guys want to join.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-12-07 23:35:21
-
-

https://openlineage.slack.com/archives/C01CK9T7HKR/p1670432586277209

-
- - -
- - - } - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Enrico Rotundo - (enrico.rotundo@gmail.com) -
-
2022-12-08 05:22:55
-
-

thanks for sharing that! I’ll join you!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Enrico Rotundo - (enrico.rotundo@gmail.com) -
-
2023-02-08 04:54:05
-
-

Hi @Julien Le Dem As I’m starting to build an Airflow operator for Bacalhau (which will include support for OpenLineage 🙂), I was wondering if you could share your knowledge about building operators.
Why did you place the current operator in the OpenLineage repo rather than raising a PR to the “official” community-built Airflow repo? ~Is there any specific reason (community guidelines, technical, etc.)?~

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Enrico Rotundo - (enrico.rotundo@gmail.com) -
-
2023-02-08 04:59:43
-
-

*Thread Reply:* Oh now that I’m reading your new operator draft AIP it all makes sense

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Enrico Rotundo - (enrico.rotundo@gmail.com) -
-
2023-02-08 05:21:02
-
-

*Thread Reply:* ok so at this point the question is… for newborn operators, do you suggest to start with their own package or try to merge into airflow.providers directly?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-08 20:42:34
-
-

*Thread Reply:* I think you can create your own Provider package with your operator. This is more a question for the airflow mailing list.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-08 20:43:25
-
-

*Thread Reply:* I would recommend this to add lineage support for your operator:
https://openlineage.io/docs/integrations/airflow/operator/
https://openlineage.io/docs/integrations/airflow/extractors/default-extractors
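To make the default-extractor pattern from those docs concrete, here is a minimal sketch of a Bacalhau operator exposing lineage directly. The operator name, constructor parameters, and dataset namespaces are hypothetical, and the exact import paths for OperatorLineage and Dataset vary between openlineage-airflow versions, so treat this as a shape rather than a drop-in implementation:
```python
from airflow.models.baseoperator import BaseOperator
from openlineage.airflow.extractors.base import OperatorLineage  # path is version-dependent
from openlineage.client.run import Dataset


class BacalhauSubmitJobOperator(BaseOperator):  # hypothetical operator name
    def __init__(self, input_cid: str, output_cid: str, **kwargs):
        super().__init__(**kwargs)
        self.input_cid = input_cid
        self.output_cid = output_cid

    def execute(self, context):
        ...  # submit the Bacalhau job and wait for completion

    def get_openlineage_facets_on_complete(self, task_instance) -> OperatorLineage:
        # With this method present, the integration's default extractor can read
        # lineage straight off the operator - no extractor registration needed.
        return OperatorLineage(
            inputs=[Dataset(namespace="bacalhau", name=self.input_cid)],
            outputs=[Dataset(namespace="bacalhau", name=self.output_cid)],
        )
```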

-
-
openlineage.io
- - - - - - - - - - - - - - - -
-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-08 20:43:41
-
-

*Thread Reply:* And nice to heare from you @Enrico Rotundo!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-08 20:44:11
-
-

*Thread Reply:* The next monthly meeting is tomorrow if you want to join

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-08 20:45:05
-
-

*Thread Reply:* I added you to the invite just in case

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-02-08 20:46:17
-
-

*Thread Reply:* To answer your original question, we started outside the Airflow community; now that the project is more established it is easier to get this approved. Hoping to get this to a vote soon

- - - -
- 🙌 Enrico Rotundo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Mike Dillion - (mike.dillion@gmail.com) -
-
2023-02-11 18:51:40
-
-

@Mike Dillion has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
jrich - (jasonrich85@icloud.com) -
-
2023-03-10 14:52:21
-
-

@jrich has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nam Nguyen - (nam@astrafy.io) -
-
2023-07-14 05:37:45
-
-

@Nam Nguyen has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Silvia Pina - (silviampina@gmail.com) -
-
2023-07-15 12:10:17
-
-

@Silvia Pina has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
YYYY XXXX - (mail4registering@gmail.com) -
-
2023-07-17 21:21:36
-
-

@YYYY XXXX has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
patrice quillevere - (patrice.quillevere.csgroup.eu@gmail.com) -
-
2023-07-18 11:11:13
-
-

@patrice quillevere has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/providence-meetup/index.html b/html_output/channel/providence-meetup/index.html deleted file mode 100644 index ba8dae9..0000000 --- a/html_output/channel/providence-meetup/index.html +++ /dev/null @@ -1,577 +0,0 @@ - - - - - - Slack Export - #providence-meetup - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-24 14:37:52
-
-

@Michael Robinson has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-02-24 14:39:45
-
-

@Harel Shein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-02-24 14:39:45
-
-

@Benji Lampel has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Viraj Parekh - (vmpvmp94@gmail.com) -
-
2023-02-24 14:39:46
-
-

@Viraj Parekh has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-02-24 14:39:46
-
-

@Sheeri Cabral (Collibra) has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-02-24 14:39:46
-
-

@Ross Turk has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Eric Veleker - (eric@atlan.com) -
-
2023-02-24 14:39:46
-
-

@Eric Veleker has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-24 14:42:28
-
-

Hi everyone! I created this for coordinating travel, etc. Please add anyone I forgot. Thanks and looking forward to meeting up IRL.

- - - -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-02-24 14:57:53
-
-

Thanks, Michael! Looking forward to it!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-02-24 14:58:29
-
-

New Yorkers, assuming y’all don’t own a car, I’m happy to rent one and drive us up there.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-02-24 16:22:03
-
-

*Thread Reply:* @Viraj Parekh / @Benji Lampel anyone else?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-01 16:57:30
-
-

Some hotel ideas if you’re traveling and thinking of spending the night:
• The Graduate (formerly the Biltmore — kinda fancy but not too pricey)
• The Dean
• Aloft (a more expensive option but right next to the meetup location)
• Hampton Inn

- - - -
- 🙌 Ross Turk, Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
karen - (karenjn@outlook.fr) -
-
2023-03-06 11:19:07
-
-

@karen has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-09 10:17:29
-
-

Here’s the event space info for your convenience:
CIC Providence & District Hall
225 Dyer St.
Providence, RI 02903

Hope Island Room, 3rd Floor
Tell the concierge you’re looking for the Data Lineage Meetup

- - - -
- ✅ Sheeri Cabral (Collibra), Ross Turk -
- -
-
-
-
- - - - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-04-04 16:40:01
-
-

archived the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/sf-meetup/index.html b/html_output/channel/sf-meetup/index.html deleted file mode 100644 index 73f22ed..0000000 --- a/html_output/channel/sf-meetup/index.html +++ /dev/null @@ -1,657 +0,0 @@ - - - - - - Slack Export - #sf-meetup - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-05-04 10:35:50
-
-

@Michael Robinson has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-05-04 10:36:16
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Peter Hicks - (peter@datakin.com) -
-
2023-05-04 10:36:16
-
-

@Peter Hicks has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-05-04 10:36:16
-
-

@Willy Lulciuc has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Atif Tahir - (atif.tahir13@gmail.com) -
-
2023-05-23 14:38:04
-
-

@Atif Tahir has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bernat Gabor - (gaborjbernat@gmail.com) -
-
2023-05-26 12:42:29
-
-

@Bernat Gabor has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yuanli Wang - (yuanliw@bu.edu) -
-
2023-06-01 18:33:53
-
-

@Yuanli Wang has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-06-21 17:40:49
-
-

Our first San Francisco meetup is next Tuesday at Astronomer’s HQ in the financial district https://www.meetup.com/meetup-group-bnfqymxe/events/293448130/

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jeff L - (jeffreyl@alumni.cmu.edu) -
-
2023-06-27 21:02:24
-
-

@Jeff L has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nam Nguyen - (nam@astrafy.io) -
-
2023-07-14 05:37:49
-
-

@Nam Nguyen has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-02 12:59:23
-
-

Our second SF meetup will be happening on August 30th at 5:30 PM PT, at Astronomer. Please sign up to let us know you’re coming!

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-30 15:11:18
-
-

Adding the venue info in case it’s more convenient than the meetup page:

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-30 15:12:55
-
-

Time: 5:30-8:30 PM
Address: 8 California St., San Francisco, CA, seventh floor
Getting in: someone from Astronomer will be in the lobby to direct you

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kevin Languasco - (kevin@haystack.tv) -
-
2023-08-31 18:29:01
-
-

@Kevin Languasco has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-08-31 23:18:10
-
-

Some pictures from last night

- -
- - - - - - - - - -
-
- - - - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Aaruna Godthi - (aaruna6@gmail.com) -
-
2023-09-23 16:47:37
-
-

@Aaruna Godthi has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/spec-compliance/index.html b/html_output/channel/spec-compliance/index.html deleted file mode 100644 index 213f96a..0000000 --- a/html_output/channel/spec-compliance/index.html +++ /dev/null @@ -1,1096 +0,0 @@ - - - - - - Slack Export - #spec-compliance - - - - - -
- - - -
- - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-12 08:55:50
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-12 09:00:07
-
-

This channel is to support discussion following up on @Sheeri Cabral (Collibra)’s presentation at the monthly meeting. Notes and recording here: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting#MonthlyTSCmeeting-December8,2022(10amPT)
Sheeri will start a google doc to document our conclusions and proposal to evolve the spec.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-01-12 09:00:32
-
-

@Sheeri Cabral (Collibra) has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mandy Chessell - (mandy.e.chessell@gmail.com) -
-
2023-01-12 09:00:33
-
-

@Mandy Chessell has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ernie Ostic - (ernie.ostic@getmanta.com) -
-
2023-01-12 09:00:33
-
-

@Ernie Ostic has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-01-12 09:13:48
-
-

@Maciej Obuchowski has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-01-12 11:40:03
-
-

@Michael Robinson has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-01-12 12:47:20
-
-

Document is here - https://docs.google.com/document/d/1ysZR13QwDvAiY_rQJedLHpnNn3deQeow7BmCNckd2uM/edit

- -

At this point I only have the notes + recording link 😄

- - - -
- :gratitude_thank_you: Michael Collado, Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-01-12 12:47:39
-
-

(right now everyone can edit it; if that becomes a problem, we can restrict it).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-01-12 13:40:17
-
-

@Harel Shein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Laurent Paris - (laurent@datakin.com) -
-
2023-01-12 14:00:16
-
-

@Laurent Paris has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-01-12 14:03:24
-
-

@Michael Collado has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Henneberger - (me@danielhenneberger.com) -
-
2023-01-12 14:05:11
-
-

@Daniel Henneberger has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-01-12 14:05:59
-
-

I will synthesize what I think are the main points, but please change what I got wrong 😄

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-01-13 18:37:43
-
-

*Thread Reply:* thanks for kick-starting / driving this @Sheeri Cabral (Collibra) (long overdue)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-01-12 14:06:10
-
-

@Willy Lulciuc has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kengo Seki - (sekikn@gmail.com) -
-
2023-01-14 19:29:10
-
-

@Kengo Seki has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-01-15 10:49:44
-
-

@Ross Turk has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Orbit - -
-
2023-01-24 14:34:11
-
-

@Orbit has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-02-09 14:01:39
-
-

What are some custom facets people have created, or want to see?

- -

I’ll start! A custom facet of “rows affected” for DML (e.g. rows inserted, rows deleted, rows updated).

- -

(and a grand vision would be to set data quality thresholds against this - an application could warn if a run deviates more than 10% from the mean/median rows affected of previous runs of the job)
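For illustration, a facet along those lines could be attached to an output dataset as plain JSON, sketched here as Python dicts. The facet name and row-count fields are hypothetical; _producer and _schemaURL are the two fields the spec requires on every facet:
```python
# Hypothetical "rows affected" custom facet for a DML run.
rows_affected_facet = {
    "_producer": "https://example.com/my-dml-agent",
    "_schemaURL": "https://example.com/schemas/RowsAffectedDatasetFacet.json",
    "rowsInserted": 1042,
    "rowsUpdated": 17,
    "rowsDeleted": 3,
}

output_dataset = {
    "namespace": "postgres://prod-db",  # example dataset coordinates
    "name": "public.orders",
    "outputFacets": {"rowsAffected": rows_affected_facet},
}
```
A monitoring application could then compare the per-run totals against the mean/median of previous runs to raise the 10% deviation warning described above.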

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-02-09 14:02:44
-
-

*Thread Reply:* Note that it’s possible this has already been implemented - that’s fine! We’re just gathering ideas, nothing is being set in stone here.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-02-09 14:03:04
-
-

*Thread Reply:* you mean output statistics?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-02-09 14:04:07
-
-

*Thread Reply:* Yes! it’s already part of output statistics. I was just giving one example of the kind of ideas we are looking for people to throw out there. Whether or not the facet exists already. 😄 it’s brainstorming, no wrong answers. (in this case the answer is “congrats! it’s already there!“)

- - - -
- 👏 Michael Collado -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-02-09 14:14:42
-
-

*Thread Reply:* I’ve always been interested in finding a way for devs to publish business metrics from their apps - i.e., metrics specific to the job or dataset like rowsFiltered or averageValues or something. I think enabling comparison of these metrics with lineage would provide tremendous value to data pipeline engineers

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-02-09 14:16:53
-
-

*Thread Reply:* oh yeah, even rowsExamined - e.g. rowsExamined - rowsFiltered = rowsAffected.
Some databases give those metrics from queries. And this is why they’re optional - but if you make a new company that has software to analyze pipeline metrics, that might be required for your software, even though it’s optional in the OpenLineage spec.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-02-10 10:07:13
-
-

*Thread Reply:* I’m thinking about some custom facets that would allow interoperability between different frameworks (possibly vendors). For example, I can throw in a link to the Airflow UI for a specific task that we captured metadata for, and another tool (say, a data catalog) can also use my custom facet to help guide users directly to where the process is running. There’s a commercial aspect to this as well, but I think the community aspect is interesting.
I don’t think it should be required to comply with the spec, but I’d like to be able to share this custom facet for others to see if it exists and decorate accordingly. Does that make sense?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mike Dillion - (mike.dillion@gmail.com) -
-
2023-02-11 18:51:45
-
-

@Mike Dillion has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mahesh Gawde - (mahesh.gawde@originml.ai) -
-
2023-02-28 05:53:27
-
-

@Mahesh Gawde has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Thijs Koot - (thijs.koot@gmail.com) -
-
2023-02-28 08:36:33
-
-

@Thijs Koot has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-09 14:12:58
-
-

@Anirudh Shrinivason has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
jrich - (jasonrich85@icloud.com) -
-
2023-03-10 14:52:40
-
-

@jrich has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-06-01 13:51:29
-
-

@Suraj Gupta has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yuanli Wang - (yuanliw@bu.edu) -
-
2023-07-12 17:18:25
-
-

@Yuanli Wang has joined the channel

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/channel/toronto-meetup/index.html b/html_output/channel/toronto-meetup/index.html deleted file mode 100644 index 6b43ed5..0000000 --- a/html_output/channel/toronto-meetup/index.html +++ /dev/null @@ -1,693 +0,0 @@ - - - - - - Slack Export - #toronto-meetup - - - - - -
- - - -
- - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-16 14:21:43
-
-

@Michael Robinson has joined the channel

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-08-16 14:22:08
-
-

@Harel Shein has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-08-16 14:22:08
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-08-16 14:22:08
-
-

@Maciej Obuchowski has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-16 14:22:08
-
-

@Paweł Leszczyński has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-08-16 14:22:08
-
-

@Jakub Dardziński has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-08-16 14:32:41
-
-

@Willy Lulciuc has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Peter Hicks - (peter.hicks@astronomer.io) -
-
2023-08-16 14:33:10
-
-

@Peter Hicks has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-08-25 13:30:07
-
-

Some belated updates on this in case you’re not aware:
• Date: 9/18
• Time: 5-8:00 PM ET
• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)
• Venue phone:
• Meetup for more info and to sign up: https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
- 🎉 Harel Shein, Maciej Obuchowski, Paweł Leszczyński, Willy Lulciuc -
- -
- 🙌 Harel Shein, Maciej Obuchowski, Paweł Leszczyński, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-08-25 13:33:42
-
-

really looking forward to meeting all of you in Toronto!!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-09-01 23:10:51
-
-

Most OpenLineage regular contributors will be there. It will be fun to be all in person. Everyone is encouraged to join

- - - -
- 🙌 Harel Shein, Maciej Obuchowski, Paweł Leszczyński, Michael Robinson, Willy Lulciuc, Peter Hicks -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-11 10:13:57
-
-

@channel
It’s hard to believe this is happening in just one week! Here’s the updated agenda:
  1. Intros
  2. Evolution of spec presentation/discussion (project background/history)
  3. State of the community
  4. Integrating OpenLineage with Metaphor (by special guests Ye & Ivan)
  5. Spark/Column lineage update
  6. Airflow Provider update
  7. Roadmap Discussion
  8. Action items review/next steps
Find the details and RSVP here: https://www.meetup.com/openlineage/events/295488014/
-
-
metaphor.io
- - - - - - - - - - - - - - - - - -
-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Greg Kim - (kgkwiz@gmail.com) -
-
2023-09-15 10:11:11
-
-

@Greg Kim has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-15 12:17:29
-
-

Looking forward to seeing you on Monday! Here’s the time/place info again for your convenience:
• Date: 9/18
• Time: 5-8:00 PM ET
• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)
• Venue phone:
• Meetup page with more info and signup: https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link
Please send a message if you find yourself stuck in the lobby, etc.

-
-
Meetup
- - - - - - - - - - - - - - - - - -
- - - -
- 🙌 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-18 16:20:33
-
-

Hi, if you’re wondering if you’re in the right place: look for Uncle Tetsu’s Cheesecake next door and for the address (600 Bay St) above the door. The building is an older one (unlike the meeting space itself, which is modern and well-appointed)

- - - -
-
-
-
- - - -
-
- - - - \ No newline at end of file diff --git a/html_output/index.html b/html_output/index.html deleted file mode 100644 index b6a82de..0000000 --- a/html_output/index.html +++ /dev/null @@ -1,151812 +0,0 @@ - - - - - - Slack Export - #general - - - - - -
- - - -
- - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-20 21:01:02
-
-

@Julien Le Dem has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars.th.lan@gmail.com) -
-
2020-10-21 08:23:39
-
-

@Mars Lan has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-21 11:39:13
-
-

@Wes McKinney has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-21 12:46:39
-
-

@Ryan Blue has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Drew Banin - (drew@fishtownanalytics.com) -
-
2020-10-21 12:53:42
-
-

@Drew Banin has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2020-10-21 13:29:49
-
-

@Willy Lulciuc has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Lewis Hemens - (lewis@dataform.co) -
-
2020-10-21 13:52:50
-
-

@Lewis Hemens has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-21 14:15:41
-
-

This is the official start of the OpenLineage initiative. Thank you all for joining. First item is to provide feedback on the doc: https://docs.google.com/document/d/1qL_mkd9lFfe_FMoLTyPIn80-fpvZUAdEIfrabn8bfLE/edit

- - - -
- 🎉 Willy Lulciuc, Abe Gong -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-10-21 23:22:03
-
-

@Abe Gong has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Shirshanka Das - (sdas@linkedin.com) -
-
2020-10-22 13:50:35
-
-

@Shirshanka Das has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
deleted_profile - (fengtao04@gmail.com) -
-
2020-10-23 15:03:44
-
-

@deleted_profile has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Chris White - (chris@prefect.io) -
-
2020-10-23 19:30:36
-
-

@Chris White has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-24 19:29:04
-
-

Thanks all for joining. In addition to the google doc, I have opened a pull request with an initial openapi spec: https://github.com/OpenLineage/OpenLineage/pull/1
The goal is to specify the initial model (just plain lineage) that will be extended with various facets.
It does not intend to restrict to HTTP. Those same PUT calls without output can be translated to any async protocol

-
-
GitHub
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-24 19:31:09
- -
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-25 12:13:26
-
-

Am I the only weirdo that would prefer a Google Group mailing list to Slack for communicating?

- - - -
- 👍 Ryan Blue -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-25 17:22:09
-
-

*Thread Reply:* slack is the new email?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-25 17:40:19
-
-

*Thread Reply:* :(

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-27 12:27:04
-
-

*Thread Reply:* I'd prefer a google group as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-27 12:27:25
-
-

*Thread Reply:* I think that is better for keeping people engaged, since it isn't just a ton of history to go through

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-10-27 12:27:38
-
-

*Thread Reply:* And I think it is also better for having thoughtful design discussions

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-29 15:40:14
-
-

*Thread Reply:* I’m happy to create a google group if that would help.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-29 15:45:23
-
-

*Thread Reply:* Here it is: https://groups.google.com/g/openlineage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-29 15:46:34
-
-

*Thread Reply:* Slack is more of a way to nudge discussions along, we can use github issues or the mailing list to discuss specific points

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-11-03 17:34:53
-
-

*Thread Reply:* @Ryan Blue and @Wes McKinney any recommendations on automating sending github issues update to that list?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ryan Blue - (rblue@netflix.com) -
-
2020-11-03 17:35:34
-
-

*Thread Reply:* I don't really know how to do that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ravi Suhag - (suhag.ravi@gmail.com) -
-
2021-04-02 07:18:25
-
-

*Thread Reply:* @Julien Le Dem How about using GitHub Discussions? They are specifically meant to solve this problem. The feature is still in beta, but it can be enabled from the repository settings. One positive side I see is that it will be really easy to follow through, and it gives one separate place to go and look for the discussions and ideas being discussed.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-04-02 19:51:55
-
-

*Thread Reply:* I just enabled it: https://github.com/OpenLineage/OpenLineage/discussions

- - - -
- 🙌 Ravi Suhag -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Wes McKinney - (wesmckinn@gmail.com) -
-
2020-10-25 12:14:06
-
-

Or GitHub Issues

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-10-25 17:21:44
-
-

*Thread Reply:* the plan is to use github issues for discussions on the spec. This is to supplement

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Laurent Paris - (laurent@datakin.com) -
-
2020-10-26 19:28:17
-
-

@Laurent Paris has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Josh Benamram - (josh@databand.ai) -
-
2020-10-27 21:17:30
-
-

@Josh Benamram has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Victor Shafran - (victor.shafran@databand.ai) -
-
2020-10-28 04:07:27
-
-

@Victor Shafran has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Victor Shafran - (victor.shafran@databand.ai) -
-
2020-10-28 04:09:00
-
-

👋 Hi everyone!

- - - -
- 👋 Willy Lulciuc, Abe Gong, Drew Banin, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Zhamak Dehghani - (zdehghan@thoughtworks.com) -
-
2020-10-29 17:59:31
-
-

@Zhamak Dehghani has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-11-02 18:30:51
-
-

I’ve opened a github issue to propose OpenAPI as the way to define the lineage metadata: https://github.com/OpenLineage/OpenLineage/issues/2
I have also started a thread on the OpenLineage group: https://groups.google.com/g/openlineage/c/2i7ogPl1IP4
Discussion should happen there: ^

-
-
GitHub
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Evgeny Shulman - (evgeny.shulman@databand.ai) -
-
2020-11-04 10:56:00
-
-

@Evgeny Shulman has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-11-05 20:51:22
-
-

FYI I have updated the PR with a simple generator: https://github.com/OpenLineage/OpenLineage/pull/1

-
- - -
- - - } - - julienledem - (https://github.com/julienledem) -
- - - - - - -
-
Comments
- 6 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Henneberger - (danny@datakin.com) -
-
2020-11-11 15:05:46
-
-

@Daniel Henneberger has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-08 17:27:57
-
-

Please send me your github ids if you wish to be added to the github repo

- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Fabrice Etanchaud - (fabrice.etanchaud@netc.fr) -
-
2020-12-10 02:10:35
-
-

@Fabrice Etanchaud has joined the channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-10 17:04:29
-
-

As mentioned on the mailing List, the initial spec is ready for a final review. Thanks for all who gave feedback so far.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-10 17:04:39
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/1

-
- - -
- - - } - - julienledem - (https://github.com/julienledem) -
- - - - - - -
-
Comments
- 6 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-10 17:04:51
-
-

The next step will be to define individual facets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-13 00:28:11
-
-

I have opened a PR to update the ReadMe: https://openlineage.slack.com/archives/C01EB6DCLHX/p1607835827000100

-
- - -
- - - } - - julienledem - (https://github.com/julienledem) -
- - -
Pull request opened by julienledem
- - - - - - - - - - - - - - -
- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2020-12-14 17:55:46
-
-

*Thread Reply:* Looks great!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maxime Beauchemin - (max@preset.io) -
-
2020-12-13 17:45:49
-
-

👋

- - - -
- 👋 Shirshanka Das, Julien Le Dem, Willy Lulciuc, Arthur Wiedmer, Mario Measic -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-14 20:19:57
-
-

I’m planning to merge https://github.com/OpenLineage/OpenLineage/pull/1 soon. That will be the base that we can iterate on and will enable starting the discussion on individual facets

-
- - -
- - - } - - julienledem - (https://github.com/julienledem) -
- - - - - - -
-
Comments
- 6 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-16 21:40:52
-
-

Thank you all for the feedback. I have made an update to the initial spec addressing the final comments

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-16 21:41:16
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/1

-
- - -
- - - } - - julienledem - (https://github.com/julienledem) -
- - - - - - -
-
Comments
- 7 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-19 11:21:27
-
-

The contributing guide is available here: https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md
Here is an example proposal for adding a new facet: https://github.com/OpenLineage/OpenLineage/issues/9

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Josh Benamram, Victor Shafran -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-19 18:27:36
-
-

Welcome to the newly joined members 🙂 👋

- - - -
- 👋 Chris Lambert, Ananth Packkildurai, Arthur Wiedmer, Abe Gong, ale, James Le, Ha Pham, David Krevitt, Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ash Berlin-Taylor - (ash@apache.org) -
-
2020-12-21 05:23:21
-
-

Hello! Airflow PMC member here. Super interested in this effort

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-21 12:15:42
-
-

*Thread Reply:* Welcome!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ash Berlin-Taylor - (ash@apache.org) -
-
2020-12-21 05:25:07
-
-

I'm joining this slack now, but I'm basically done for the year, so will investigate proposals etc next year

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Zachary Friedman - (zafriedman@gmail.com) -
-
2020-12-21 10:02:37
-
-

Hey all 👋 Super curious what people's thoughts are on the best way for data quality tools, e.g. Great Expectations, to integrate with OpenLineage. Probably a Dataset-level facet of some sort (from the 25 minutes of deep spec knowledge I have 😆), but curious if that's something being worked on? @Abe Gong

- - - -
- 👋 Abe Gong, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:30:51
-
-

*Thread Reply:* Yes, that’s about right.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:31:45
-
-

*Thread Reply:* There’s some subtlety here.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:32:02
-
-

*Thread Reply:* The initial OpenLineage spec is pretty explicit about linking metadata primarily to execution of specific tasks, which is appropriate for ValidationResults in Great Expectations

- - - -
- ✅ Zachary Friedman -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:32:57
-
-

*Thread Reply:* There isn’t as strong a concept of persistent data objects (e.g. a specific table, or batches of data from a specific table)

- - - -
- ✅ Zachary Friedman -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:33:20
-
-

*Thread Reply:* (In the GE ecosystem, we call these DataAssets and Batches)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:33:56
-
-

*Thread Reply:* This is also an important conceptual unit, since it’s the level of analysis where Expectations and data docs would typically attach.

- - - -
- ✅ Zachary Friedman -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abe Gong - (abe@superconductive.com) -
-
2020-12-21 10:34:47
-
-

*Thread Reply:* @James Campbell and I have had some productive conversations with @Julien Le Dem and others about this topic

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2020-12-21 12:20:53
-
-

*Thread Reply:* Yep! The next step will be to open a few github issues with proposals to add to or amend the spec. We would probably start with a Descriptive Dataset facet of a dataset profile (or dataset update profile). There are other aspects to clarify as well as @Abe Gong is explaining above.

- - - -
- ✅ James Campbell -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Zachary Friedman - (zafriedman@gmail.com) -
-
2020-12-21 10:08:24
-
-

Also interesting to see where this would hook into Dagster. Because one of the many great features of Dagster IMO is that it lets you do stuff like this (albeit without a formal spec). An OpenLineageMaterialization could be interesting

Julien Le Dem (julien@apache.org) - 2020-12-21 12:23:41
*Thread Reply:* Totally! We had a quick discussion with Dagster. Looking forward to proposals along those lines.

Harikiran Nayak (hari@streamsets.com) - 2020-12-21 14:35:11
Congrats @Julien Le Dem @Willy Lulciuc and team on launching OpenLineage!

🙌 Willy Lulciuc

Willy Lulciuc (willy@datakin.com) - 2020-12-21 14:48:11
*Thread Reply:* Thanks, @Harikiran Nayak! It’s amazing to see such interest in the community on defining a standard for lineage metadata collection.

Harikiran Nayak (hari@streamsets.com) - 2020-12-21 15:03:29
*Thread Reply:* Yep! It's a validation that the problem is real!

Kriti (kathuriakritihp@gmail.com) - 2020-12-22 02:05:45
Hey folks!
Worked on a variety of lineage problems across domains. Super excited about this initiative!

👋 Willy Lulciuc

Julien Le Dem (julien@apache.org) - 2020-12-22 13:23:43
*Thread Reply:* Welcome!

👋 Kriti

Julien Le Dem (julien@apache.org) - 2020-12-30 22:30:23
*Thread Reply:* What are your current use cases for lineage?

Julien Le Dem (julien@apache.org) - 2020-12-22 19:54:33
(for review) Proposal issue template: https://github.com/OpenLineage/OpenLineage/pull/11

Julien Le Dem (julien@apache.org) - 2020-12-22 19:55:16
for people interested, <#C01EB6DCLHX|github-notifications> has the github integration that will notify of new PRs …

Martin Charrel (martin.charrel@datadoghq.com) - 2020-12-29 09:39:46
👋 Hello! I'm currently working on lineage systems @ Datadog. Super excited to learn more about this effort

👋 Willy Lulciuc

Julien Le Dem (julien@apache.org) - 2020-12-30 22:28:54
*Thread Reply:* Welcome!

Julien Le Dem (julien@apache.org) - 2020-12-30 22:29:43
*Thread Reply:* Would you mind sharing your main use cases for collecting lineage?

Marko Jamedzija (marko@popcore.com) - 2021-01-03 05:54:34
Hi! I’m also working on a similar topic for some time. Really looking forward to having these ideas standardized 🙂

👋 Willy Lulciuc

Alexander Gilfillan (agilfillan@dealerinspire.com) - 2021-01-05 11:29:31
I would be interested to see how to extend this to dashboards/visualizations, if that still falls within the scope of this project.

Julien Le Dem (julien@apache.org) - 2021-01-05 12:55:01
*Thread Reply:* Definitely, each dashboard should become a node in the lineage graph. That way you can understand all the dependencies of a given dashboard. Some examples of interesting metadata around this: is the dashboard updated in a timely fashion (data freshness); is the data correct (data quality)? Observing changes upstream of the dashboard will provide insights into what's happening when freshness or quality suffer

Alexander Gilfillan (agilfillan@dealerinspire.com) - 2021-01-05 13:20:41
*Thread Reply:* 100%. On a granular scale, the difference between a visualization and dashboard can be interesting. One visualization can be connected to multiple dashboards. But of course this depends on the BI tool, Redash would be an example in this case.

Julien Le Dem (julien@apache.org) - 2021-01-05 15:15:23
*Thread Reply:* We would need to decide how to model those things. Possibly as a Job type for dashboard and visualization.

Alexander Gilfillan (agilfillan@dealerinspire.com) - 2021-01-06 18:20:06
*Thread Reply:* It could be. It's interesting: in Redash, for example, you create custom queries that run at certain intervals to produce the data you need to visualize - pretty much equivalent to a job. But you then build certain visualizations off of that "job". Then you build dashboards off of visualizations. So you could model it as a job, or it could make sense for it to be modeled more like a dataset.

That's the hard part of this. How do you model a visualization/dashboard across all the possible ways they can be created, since it differs depending on how the tool you use abstracts away creating a visualization.

Jason Reid (reid.david.jason@gmail.com) - 2021-01-05 17:06:02
👋 Hi everyone!

🙌 Willy Lulciuc, Arthur Wiedmer
👋 Abe Gong

Jason Reid (reid.david.jason@gmail.com) - 2021-01-05 17:10:22
*Thread Reply:* Part of my role at Netflix is to oversee our data lineage story so very interested in this effort and hope to be able to participate in its success

Julien Le Dem (julien@apache.org) - 2021-01-05 18:12:48
*Thread Reply:* Hi Jason and welcome

Julien Le Dem (julien@apache.org) - 2021-01-05 18:15:12
A reference implementation of the OpenLineage initial spec is in progress in Marquez: https://github.com/MarquezProject/marquez/pull/880

Julien Le Dem (julien@apache.org) - 2021-01-07 12:46:19
*Thread Reply:* The OpenLineage reference implementation in Marquez will be presented this morning Thursday (01/07) at 10AM PST, at the Marquez Community meeting.

When: Thursday, January 7th at 10AM PST
Where: https://us02web.zoom.us/j/89344845719?pwd=Y09RZkxMZHc2U3pOTGZ6SnVMUUVoQT09

Julien Le Dem (julien@apache.org) - 2021-01-07 12:46:36
[attachment]

Julien Le Dem (julien@apache.org) - 2021-01-07 12:46:44
*Thread Reply:* that’s in 15 min

Julien Le Dem (julien@apache.org) - 2021-01-12 17:10:23
*Thread Reply:* And it’s merged!

Julien Le Dem (julien@apache.org) - 2021-01-12 17:10:53
*Thread Reply:* Marquez now has a reference implementation of the initial OpenLineage spec

Jon Loyens (jon@data.world) - 2021-01-06 17:43:02
👋 Hi everyone! I'm one of the co-founders at data.world and looking forward to hanging out here

👋 Julien Le Dem, Willy Lulciuc

Elena Goydina (egoydina@provectus.com) - 2021-01-11 11:39:20
👋 Hi everyone! I was looking for the roadmap and don't see any. Does it exist?

Julien Le Dem (julien@apache.org) - 2021-01-13 19:06:34
*Thread Reply:* There’s no explicit roadmap so far. With the initial spec defined and the reference implementation implemented, next steps are to define more facets (for example, data shape, dataset size, etc), provide clients to facilitate integrations (java, python, …), and implement more integrations (Spark in the works). Members of the community are welcome to drive their own initiatives around the core spec. One of the design goals of facets is to enable numerous and independent parallel efforts

Julien Le Dem (julien@apache.org) - 2021-01-13 19:06:48
*Thread Reply:* Is there something in particular you are interested in?

Julien Le Dem (julien@apache.org) - 2021-01-13 19:09:42
I have opened a proposal to move the spec to JSONSchema; this will make it more focused and decouple it from HTTP: https://github.com/OpenLineage/OpenLineage/issues/15

👍 Willy Lulciuc

Julien Le Dem (julien@apache.org) - 2021-01-19 12:26:39
Here is a PR with the corresponding change: https://github.com/OpenLineage/OpenLineage/pull/17

Xinbin Huang (bin.huangxb@gmail.com) - 2021-02-01 17:07:50
Really excited to see this project! I am curious what's the current state and the roadmap of it?

Julien Le Dem (julien@apache.org) - 2021-02-01 17:55:59
*Thread Reply:* You can find the initial spec here: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md
The process to contribute to the model is described here: https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md
In particular, now we’d want to contribute more facets and integrations.
Marquez has a reference implementation: https://github.com/MarquezProject/marquez/pull/880
On the roadmap:
• define more facets: data profile, etc
• more integrations
• java/python client
You can see current discussions here: https://github.com/OpenLineage/OpenLineage/issues

✅ Xinbin Huang

Julien Le Dem (julien@apache.org) - 2021-02-01 17:56:43
For people curious about following github activity you can subscribe to: <#C01EB6DCLHX|github-notifications>

Julien Le Dem (julien@apache.org) - 2021-02-01 17:57:05
*Thread Reply:* It is not on general, as it can be a bit noisy

Zachary Friedman (zafriedman@gmail.com) - 2021-02-09 13:50:17
Random-ish question: why is producer and schemaURL nested under nominalTime facet in the spec for postRunStateUpdate? It seems like the producer of its metadata isn’t related to the time of the lineage event?

Julien Le Dem (julien@apache.org) - 2021-02-09 20:02:48
*Thread Reply:* Hi @Zachary Friedman! I replied below. https://openlineage.slack.com/archives/C01CK9T7HKR/p1612918909009900

Julien Le Dem (julien@apache.org) - 2021-02-09 20:01:49
producer and schemaURL are defined in the BaseFacet type and therefore all facets (including nominalTime) have them.
• The _producer is an identifier for the code that produced the metadata. The idea is that different facets in the same event can be produced by different libraries. For example, in a Spark integration, Iceberg could emit its own facet in addition to other facets. The producer identifies what produced what.
• The _schemaURL is the identifier of the version of the schema for a given facet. Similarly an event could contain a mixture of core facets from the spec as well as custom facets. This makes explicit what the definition for this facet is.
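
(To make that concrete, a minimal sketch of how these fields appear on a facet in an event - the values and URLs here are illustrative, not normative:)
```
{
  "run": {
    "runId": "d46e465b-d358-4d32-83d4-df660ff614dd",
    "facets": {
      "nominalTime": {
        "_producer": "https://github.com/MyCompany/my-scheduler-integration",
        "_schemaURL": "https://example.com/spec/facets/NominalTimeRunFacet.json",
        "nominalStartTime": "2021-02-09T08:00:00Z"
      }
    }
  }
}
```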

👍 Zachary Friedman

Julien Le Dem (julien@apache.org) - 2021-02-09 21:27:05
As discussed previously, I have separated a JSON Schema spec for the OpenLineage events from the OpenAPI spec defining an HTTP endpoint: https://github.com/OpenLineage/OpenLineage/pull/17

Julien Le Dem (julien@apache.org) - 2021-02-09 21:27:26
*Thread Reply:* Feel free to comment, this is ready to merge

Willy Lulciuc (willy@datakin.com) - 2021-02-11 20:12:18
*Thread Reply:* Thanks, Julien. The new spec format looks great 👍

Julien Le Dem (julien@apache.org) - 2021-02-09 21:34:31
And the corresponding code generator to start the java (and other languages) client: https://github.com/OpenLineage/OpenLineage/pull/18

👍 Willy Lulciuc

Julien Le Dem (julien@apache.org) - 2021-02-11 22:25:24
those are merged, we now have a jsonschema, an openapi spec that extends it and a generated java model

🎉 Willy Lulciuc
🙌 Willy Lulciuc

Julien Le Dem (julien@apache.org) - 2021-02-17 19:39:55
Following up on a previous discussion:
This proposal and the accompanying PR add the notion of InputFacets and OutputFacets: https://github.com/OpenLineage/OpenLineage/issues/20
In summary, we are collecting metadata about jobs and datasets.
At the Job level, when it’s fairly static metadata (not changing every run, like the current code version of the job) it goes in a JobFacet. When it is dynamic and changes every run (like the schedule time of the run), it goes in a RunFacet.
This proposal is adding the same notion at the Dataset level: when it is static and doesn’t change every run (like the dataset schema) it goes in a DatasetFacet. When it is dynamic and changes every run (like the input time interval of the dataset being read, or the statistics of the dataset being written) it goes in an InputFacet or an OutputFacet.
This enables Job and Dataset versioning logic, to keep track of what changes in the definition of something vs runtime changes
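
(A rough, abridged sketch of where these would sit on an output dataset under the proposal - the statistics facet shown is illustrative, and per the spec every facet also carries _producer and _schemaURL:)
```
{
  "outputs": [{
    "namespace": "my-warehouse",
    "name": "public.daily_totals",
    "facets": {
      "schema": { "fields": [{"name": "total", "type": "DECIMAL"}] }
    },
    "outputFacets": {
      "outputStatistics": { "rowCount": 1500, "size": 2048000 }
    }
  }]
}
```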

👍 Kevin Mellott, Petr Šimeček

Julien Le Dem (julien@apache.org) - 2021-02-19 14:27:23
*Thread Reply:* @Kevin Mellott and @Petr Šimeček Thanks for the confirmation on this slack message. To make your comment visible to the wider community, please chime in on the github issue as well: https://github.com/OpenLineage/OpenLineage/issues/20
Thank you.

Julien Le Dem (julien@apache.org) - 2021-02-19 14:27:46
*Thread Reply:* The PR is out for this: https://github.com/OpenLineage/OpenLineage/pull/23

Weixi Li (ashlee.happy@gmail.com) - 2021-02-19 04:14:59
Hi, I am really interested in this project and Marquez. I am a bit unclear about the differences and relationship between those two projects. My understanding is that OpenLineage provides an API specification for tools running jobs (e.g. Spark, Airflow) to send out an event to update the run state of the job; then, for example, Marquez can be the destination for those events and show the data lineage from those run state updates. When you say there is a reference implementation of the OpenLineage spec in Marquez, do you mean there is a /lineage endpoint implemented in the Marquez API (https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/api/OpenLineageResource.java)? Then my question is: what is the next step after Marquez has this API? How does Marquez use that endpoint to integrate with Airflow, for example? I did not find the usage of that endpoint in the Marquez project. The library marquez-airflow, which integrates Airflow with Marquez, seems to only use the other Marquez APIs to build the data lineage. Or did I misunderstand something? Thank you very much!

Weixi Li (ashlee.happy@gmail.com) - 2021-02-19 05:03:21
*Thread Reply:* Okay, I found that the spark integration in Marquez calls the /lineage endpoint. But I am still curious about the future plans to integrate with other tools, like Airflow?

Julien Le Dem (julien@apache.org) - 2021-02-19 12:41:23
*Thread Reply:* Just restating some of my answers from the Marquez slack for the benefit of folks here.

• OpenLineage defines the schema to collect metadata
• Marquez has a /lineage endpoint implementing the OpenLineage spec to receive this metadata, implemented by the OpenLineageResource you pointed out
• In the future other projects will also have OpenLineage endpoints to receive this metadata
• The Marquez Spark integration produces OpenLineage events: https://github.com/MarquezProject/marquez/tree/main/integrations/spark
• The Marquez Airflow integration still uses the original Marquez API but will be migrated to OpenLineage.
• All new integrations will use OpenLineage metadata

Weixi Li (ashlee.happy@gmail.com) - 2021-02-22 03:55:18
*Thread Reply:* thank you! very clear answer🙂

Ernie Ostic (ernie.ostic@getmanta.com) - 2021-03-02 13:49:04
Hi Everyone. Just got started with the Marquez REST API and a little bit into the Open Lineage aspects. Very easy to use. Great work on the curl examples for getting started. I'm working with Postman and am happy to share a collection I have once I finish testing. A question about tags --- are there plans for a "post new tag" call in the API? ...or maybe I missed it. Thx. --ernie

Julien Le Dem (julien@apache.org) - 2021-03-02 17:51:29
*Thread Reply:* I forgot to reply in thread 🙂 https://openlineage.slack.com/archives/C01CK9T7HKR/p1614725462008300

Julien Le Dem (julien@apache.org) - 2021-03-02 17:51:02
OpenLineage doesn’t have a Tag facet yet (but tags are defined in the Marquez api). Feel free to open a proposal on the github repo. https://github.com/OpenLineage/OpenLineage/issues/new/choose

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-03-16 11:21:37
Hey everyone. What's the story for stream processing (like Flink jobs) for OpenLineage?
It does not fit cleanly with the runEvent model, which says:
> It is required to issue 1 START event and 1 of [ COMPLETE, ABORT, FAIL ] event per run.
as unbounded stream jobs usually do not complete.

I'd imagine a few "workarounds" that work for some cases - for example, imagine a job calculating hourly aggregations of transactions and dumping them into parquet files for further analysis. The job could issue an OTHER event type adding an additional output dataset every hour. Another option would be to create a new "run" every hour, just indicating the added data.

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-16 15:07:04
*Thread Reply:* Ha, I signed up just to ask this precise question!

😀 Maciej Obuchowski

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-16 15:07:44
*Thread Reply:* I’m still looking into the spec myself. Are we required to have 1 or more runs per Job? Or can a Job exist without a run event?

Ravi Suhag (suhag.ravi@gmail.com) - 2021-04-02 07:24:39
*Thread Reply:* A run event can be emitted when the job starts, and it can stay in the RUNNING state unless something happens to the job. Additionally, you could send events periodically with state RUNNING to inform the system that the job is healthy.

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-16 15:09:31
Similar to @Maciej Obuchowski's question about Flink / streaming jobs - what about streaming sources (e.g. a Kafka topic)? It does fit into the dataset model, more or less. But has anyone used this yet for a set of streaming sources? Particularly with schema changes over time?

Julien Le Dem (julien@apache.org) - 2021-03-16 18:30:46
Hi @Maciej Obuchowski and @Adam Bellemare, streaming jobs are meant to be covered by the spec but I agree there are a few details to iron out.

Julien Le Dem (julien@apache.org) - 2021-03-16 18:31:55
In particular, streaming jobs still have runs. Even if they run continuously, they do not run forever: you want to track that a job was started at a point in time with a given version of the code, then stopped and started again after being upgraded, for example.

👍 Maciej Obuchowski

Julien Le Dem (julien@apache.org) - 2021-03-16 18:32:23
I agree with @Maciej Obuchowski that we would also send OTHER events to keep track of progress.

Julien Le Dem (julien@apache.org) - 2021-03-16 18:32:46
For example one could track checkpointing this way.

Julien Le Dem (julien@apache.org) - 2021-03-16 18:35:35
For a Kafka topic you could have streaming-dataset-specific facets or even Kafka-specific facets (e.g. the list of offsets we stopped reading at, schema id, etc.)
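
(Purely as a hypothetical sketch - no such facet existed in the spec at this point, and the facet and field names below are invented for illustration:)
```
{
  "inputs": [{
    "namespace": "kafka://broker.example.com:9092",
    "name": "transactions",
    "inputFacets": {
      "kafkaOffsets": {
        "_producer": "https://example.com/my-streaming-integration",
        "_schemaURL": "https://example.com/schemas/KafkaOffsetsFacet.json",
        "endOffsets": {"0": 1023455, "1": 1022811},
        "schemaId": 42
      }
    }
  }]
}
```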

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-03-17 10:05:53
*Thread Reply:* That's a good idea.

Now I'm wondering - let's say we want to track at which offset a checkpoint ended processing. That would mean we want to expose checkpoint id, time, and offset.
I suppose we don't want to overwrite the previous checkpoint info, so we want to have some collection of data in this facet.

Something like appendable facets would be nice: just add new checkpoint info to the collection, instead of having to push all the checkpoint infos every time we just want to add a new data point.

Julien Le Dem (julien@apache.org) - 2021-03-16 18:45:23
Let me know if you have more thoughts

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-17 09:18:49
*Thread Reply:* Thanks Julien! I will try to wrap my head around some use-cases and see how it maps to the current spec. From there, I can see if I can figure out any proposals

Julien Le Dem (julien@apache.org) - 2021-03-17 13:43:29
*Thread Reply:* You can use the proposal issue template to propose a new facet for example: https://github.com/OpenLineage/OpenLineage/issues/new/choose

Carlos Zubieta (carlos.zubieta@wizeline.com) - 2021-03-16 18:49:00
Hi everyone, I just heard about OpenLineage and would like to learn more about it. The talks in the repo explain nicely the purpose and general ideas, but I have a couple of questions. Are there any working implementations to produce/consume the spec? Also, are there any discussions/guides on standard information, naming conventions, etc. in the facets?

Julien Le Dem (julien@apache.org) - 2021-03-16 20:05:06
Hi @Carlos Zubieta here are some pointers ^

Julien Le Dem (julien@apache.org) - 2021-03-16 20:06:51
Marquez has a reference implementation of an OpenLineage endpoint. The Spark integration emits OpenLineage events.

Carlos Zubieta (carlos.zubieta@wizeline.com) - 2021-03-16 20:56:37
Thank you @Julien Le Dem!!! Will take a close look

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-17 15:41:50
Q related to People/Teams/Stakeholders/Owners with regards to Jobs and Datasets (didn't find anything in search):
Let's say I have a dataset, and there are a number of other downstream jobs that ingest from it. In the case that the dataset is mutated in some way (or deleted, archived, etc.), how would I go about notifying the stakeholders of that set about the changes?

Just to be clear, I'm not concerned about the mechanics of doing this, just that there is someone that needs to be notified, who has self-registered on this set.
Similarly, I want to manage the datasets I am concerned about, where I can grab a list of all the datasets I tagged myself on.

This seems to suggest that we could do with additional entities outside of Dataset, Run, Job. However, at the same time, I can see how this can lead to an explosion of other entities. Any thoughts on this particular domain? I think I could achieve something similar with aspects, but this would require that I update the aspect on each entity if I want to wholesale update the user contact, say their email address.

Has anyone else run into something like this? Have you any advice? Or is this something that may be upcoming in the spec?

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-17 16:42:24
*Thread Reply:* One thing we were considering is just adding these in as Facets (Tags as per Marquez), and then plugging into some external people-managing system. However, I think the question can be generalized to "should there be some sort of generic entity that can enable relationships between itself and Datasets, Jobs, and Runs as part of an integration element?"

Julien Le Dem (julien@apache.org) - 2021-03-18 16:03:55
*Thread Reply:* That’s a great topic of discussion. I would definitely use the OpenLineage facets to capture what you describe as aspects above. The current Marquez model has a simple notion of ownership at the namespace level, but this needs to be extended to enable the use cases you are describing (owning a dataset or a job). Right now the owner is just a generic identifier as a string (a user id or a group id for example). Once things are tagged (in some way), you can use the lineage API to find all the downstream or upstream jobs and datasets. In OpenLineage I would start by being able to capture the owner identifier in a facet, with contact info optional if it’s available at runtime. It will have the advantage of keeping track of how that changed over time. This definitely deserves its own discussion.

Julien Le Dem (julien@apache.org) - 2021-03-18 17:52:13
*Thread Reply:* And also to make sure I understand your use case, you want to be able to notify the consumers of a dataset that it is being discontinued/replaced/… ? What else are you thinking about?

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-22 09:15:19
*Thread Reply:* Let me pull in my colleagues

Adam Bellemare (adam.bellemare@shopify.com) - 2021-03-22 09:15:24
*Thread Reply:* Standby

Olessia D'Souza (olessia.dsouza@shopify.com) - 2021-03-22 10:59:57
*Thread Reply:* 👋 Hi Julien. I’m Olessia, I’m working on the metadata collection implementation with Adam. Some thoughts on this:

Olessia D'Souza (olessia.dsouza@shopify.com) - 2021-03-22 11:00:45
*Thread Reply:* To start off, we’re thinking that there often isn’t a single owner, but rather a set of Stakeholders that evolve over time. So we’d like to be able to attach multiple entries, possibly of different types, to a Dataset. We’re also thinking that a dataset should have at least one owner. So a few things I’d like to confirm/discuss options:

  • If I were to stay true to the spec as it’s defined atm I wouldn’t be able to add a required facet. True/false?
  • According to the readme, “...emitting a new facet with the same name for the same entity replaces the previous facet instance for that entity entirely”. If we were to store multiple stakeholders, we’d have a field “stakeholders” and its value would be a list? This would make queries involving stakeholders not very straightforward. If the facet is overwritten every time, how do I a) add individuals to the list b) track changes to the list over time. Let me know what I’m missing, because based on what you said above tracking facet changes over time is possible.
  • Run events are issued by a scheduler. Why should it be in the domain of the scheduler to know the entire list of Stakeholders?
  • I noticed that Marquez has separate endpoints to capture information about Datasets, and some additional information beyond what’s described in the spec is required. In this context, we could add a required Stakeholder facets on a Dataset, and potentially even additional end points to add and remove Stakeholders. Is that a valid way to go about this, in your opinion?

Curious to hear your thoughts on all of this!

Julien Le Dem (julien@apache.org) - 2021-03-24 17:06:50
*Thread Reply:* > To start off, we’re thinking that there often isn’t a single owner, but rather a set of Stakeholders that evolve over time. So we’d like to be able to attach multiple entries, possibly of different types, to a Dataset. We’re also thinking that a dataset should have at least one owner. So a few things I’d like to confirm/discuss options:
>
> If I were to stay true to the spec as it’s defined atm I wouldn’t be able to add a required facet. True/false?
Correct. The spec defines what facets look like (and how you can make your own custom facets) but it does not make statements about whether facets are required. However, you can have your own validation and make certain things required if you wish on the client side.

> According to the readme, “...emitting a new facet with the same name for the same entity replaces the previous facet instance for that entity entirely”. If we were to store multiple stakeholders, we’d have a field “stakeholders” and its value would be a list?
Yes, I would indeed consider such a facet on the dataset with the stakeholders.

> This would make queries involving stakeholders not very straightforward. If the facet is overwritten every time, how do I
> a) add individuals to the list
You would provide the new list of stakeholders. OpenLineage standardizes lineage collection and defines a format for expressing metadata. Marquez will keep track of how metadata has evolved over time.

> b) track changes to the list over time. Let me know what I’m missing, because based on what you said above tracking facet changes over time is possible.
Each event is an observation at a point in time. In a sense they are each immutable. There’s a “current” version but also all the previous ones stored in Marquez.
Marquez stores each version of a dataset it received through OpenLineage and exposes an API to see how that evolved over time.

> - Run events are issued by a scheduler. Why should it be in the domain of the scheduler to know the entire list of Stakeholders?
The scheduler emits the information that it knows about. For example: “I started this job and it’s reading from this dataset and is writing to this other dataset.”
It may or may not be in the domain of the scheduler to know the list of stakeholders. If not, then you could emit different types of events to add a stakeholder facet to a dataset. We may want to refine the spec for that. Actually I would be curious to hear what you think should be the source of truth for stakeholders. It is not the intent to force everything coming from the scheduler.

  • example 1: stakeholders are people on call for the job, they are defined as part of the job and that also enables alerting
  • example 2: stakeholders are consumers of the jobs: they may be defined somewhere else

> - I noticed that Marquez has separate endpoints to capture information about Datasets, and some additional information beyond what’s described in the spec is required. In this context, we could add a required Stakeholder facet on a Dataset, and potentially even additional endpoints to add and remove Stakeholders. Is that a valid way to go about this, in your opinion?

Julien Le Dem (julien@apache.org) - 2021-03-24 17:06:50
*Thread Reply:* Marquez existed before OpenLineage. In particular the /run endpoint to create and update runs will be deprecated as the OpenLineage /lineage endpoint replaces it. At the moment we are mapping OpenLineage metadata to Marquez. Soon Marquez will have all the facets exposed in the Marquez API. (See: https://github.com/MarquezProject/marquez/pull/894/files)
We could make Marquez configurable or pluggable for validation purposes. There is already a notion of LineageListener for example.
Although Marquez collects the metadata, I feel like this validation would be better upstream or with some other mechanism. The question is: when do you create a dataset vs when do you become a stakeholder? What are the various stakeholders and what is the responsibility of the minimum one stakeholder? I would probably make it required that the stakeholder is defined in order to deploy the job. This would apply to the output dataset and would be collected in Marquez.

In general, you are very welcome to make suggestions on additional endpoints for Marquez, and I’m happy to discuss this further as those ideas are progressing.

> Curious to hear your thoughts on all of this!
Thanks for taking the time!

Julien Le Dem (julien@apache.org) - 2021-05-24 16:27:03
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1621887895004200

Julien Le Dem (julien@apache.org) - 2021-03-24 18:58:00
Thanks for the Python client submission @Maciej Obuchowski
https://github.com/OpenLineage/OpenLineage/pull/34

🙌 Willy Lulciuc

Julien Le Dem (julien@apache.org) - 2021-03-24 18:59:50
I also have added a spec to define a standard naming policy. Please review: https://github.com/OpenLineage/OpenLineage/pull/31/files

Julien Le Dem (julien@apache.org) - 2021-03-31 23:45:35
We now have a python client! Thanks @Maciej Obuchowski
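
(For anyone curious what using it looks like, here is a minimal sketch of emitting a run event with the Python client, assuming the OpenLineageClient/RunEvent classes as published in openlineage-python and a local Marquez as the backend:)
```
from uuid import uuid4
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import RunEvent, RunState, Run, Job

# Point the client at any OpenLineage-compatible backend, e.g. a local Marquez.
client = OpenLineageClient(url="http://localhost:5000")

# A run is identified by a UUID; the job by a namespace and a name.
run = Run(runId=str(uuid4()))
job = Job(namespace="my-namespace", name="my-job")

# Emit a START event; a COMPLETE (or FAIL/ABORT) event follows the same shape.
client.emit(RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=run,
    job=job,
    producer="https://example.com/my-integration",
))
```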

👍 Maciej Obuchowski, Kevin Mellott, Ravi Suhag, Ross Turk, Willy Lulciuc, Mirko Raca

Zachary Friedman (zafriedman@gmail.com) - 2021-04-02 19:37:36
Question, what do you folks see as the canonical mechanism for receiving OpenLineage events? Do you see an agent like statsd? Or do you see this as purely an API spec that services could implement? Do you see producers of lineage data writing code to send formatted OpenLineage payloads to arbitrary servers that implement receipt of these events? Curious what the long-term vision is here related to how an ecosystem of producers and consumers of payloads would interact?

Julien Le Dem (julien@apache.org) - 2021-04-02 19:54:52
*Thread Reply:* Marquez is the reference implementation for receiving events and tracking changes. But the definition of the API lets others receive them (and also enables using OpenLineage events to sync between systems)

Julien Le Dem (julien@apache.org) - 2021-04-02 19:55:32
*Thread Reply:* In particular, Egeria is involved in enabling receiving and emitting openlineage

Zachary Friedman (zafriedman@gmail.com) - 2021-04-03 18:03:01
*Thread Reply:* Thanks @Julien Le Dem. So to get specific, if dbt were to emit OpenLineage events, how would this work? Would dbt Cloud hypothetically allow users to configure an endpoint to send OpenLineage events to, similar in UI implementation to configuring a Stripe webhook perhaps? And then whatever server the user would input here would point to somewhere that implements receipt of OpenLineage payloads? This is all a very hypothetical example, but trying to ground it in something I have a solid mental model for.

Michael Collado (collado.mike@gmail.com) - 2021-04-05 17:51:57
*Thread Reply:* hypothetically speaking, that all sounds right. so a user, who, e.g., has a dbt pipeline and an AWS glue pipeline could configure both of those projects to point to the same open lineage service and get their entire lineage graph even if the two pipelines aren't connected.

Willy Lulciuc (willy@datakin.com) - 2021-04-06 20:33:51
*Thread Reply:* Yeah, OpenLineage events need to be published to a backend (can be Kafka, can be a graphDB, etc). Your Stripe webhook analogy is aligned with how events can be received. For example, in Marquez, we expose a /lineage endpoint that consumes OpenLineage events. We then map an OpenLineage event to the Marquez model (sources, datasets, jobs, runs) that’s persisted in postgres.
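
(For example, a minimal START event like the one below, POSTed to Marquez's /api/v1/lineage endpoint, is enough for Marquez to create the corresponding job and run - values illustrative:)
```
{
  "eventType": "START",
  "eventTime": "2021-04-06T20:00:00Z",
  "run": {"runId": "d46e465b-d358-4d32-83d4-df660ff614dd"},
  "job": {"namespace": "my-namespace", "name": "my-job"},
  "inputs": [],
  "outputs": [],
  "producer": "https://example.com/my-integration"
}
```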

Zachary Friedman (zafriedman@gmail.com) - 2021-04-07 10:47:06
*Thread Reply:* Thanks both!

Julien Le Dem (julien@apache.org) - 2021-04-13 20:52:53
*Thread Reply:* sorry, I was away last week. Yes that sounds right.

Jakub Moravec (jkb.moravec@gmail.com) - 2021-04-07 09:41:09
Hi everyone, I just started discovering OpenLineage and Marquez, it looks great and the quick-start tutorial is very helpful! One question though: I pushed some metadata to Marquez using the Lineage POST endpoint, and when I try to confirm that everything was created using the Marquez REST API, everything is there ... but I don't see these new objects in the Marquez UI... what is the best way to investigate where the issue is?

Willy Lulciuc (willy@datakin.com) - 2021-04-14 13:12:31
*Thread Reply:* Welcome, @Jakub Moravec 👋 . Given that you're able to retrieve metadata using the marquezAPI, you should be able to also view dataset and job metadata in the UI. Mind using the search bar in the top right-hand corner in the UI to see if your metadata is searchable? The UI only renders jobs and datasets that are connected in the lineage graph. We're working towards a more general metadata exploration experience, but currently the lineage graph is the main experience.

Jakob Külzer (jakob.kulzer@shopify.com) - 2021-04-08 11:23:18
Hi friends, we're exploring OpenLineage and while building out integration for existing systems we realized there is no obvious way for an input to specify what "version" of that dataset is being consumed. For example, we have a job that rolls up a variable number of what OpenLineage calls dataset versions. By specifying only that dataset, we can't represent the specific instances of it that are actually rolled up. We think that would be a very important part of the lineage graph.

Are there any thoughts on how to address specific dataset versions? Is this where custom input facets would come to play?

Furthermore, based on the spec, it appears that events can provide dataset facets for both inputs and outputs and this seems to open the door to race conditions in which two runs concurrently create dataset versions of a dataset. Is this where the eventTime field is supposed to be used?

Julien Le Dem (julien@apache.org) - 2021-04-13 20:56:42
*Thread Reply:* Your intuition is right here. I think we should define an input facet that specifies which dataset version is being read. Similarly you would have an output facet that specifies what version is being produced. This would apply to storage layers like Deltalake and Iceberg as well.

Julien Le Dem (julien@apache.org) - 2021-04-13 20:57:58
*Thread Reply:* Regarding the race condition, input and output facets are attached to the run. The version of the dataset that was read is an attribute of a run and should not modify the dataset itself.

Julien Le Dem (julien@apache.org) - 2021-04-13 21:01:34
*Thread Reply:* See the Dataset description here: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#core-lineage-model

Stephen Pimentel (stephenpiment@gmail.com) - 2021-04-14 18:20:42
Hi everyone! I’m exploring what existing, open-source integrations are available, specifically for Spark, Airflow, and Trino (PrestoSQL). My team is looking both to use and contribute to these integrations. I’m aware of the integrations in the Marquez repo:
• Spark: https://github.com/MarquezProject/marquez/tree/main/integrations/spark
• Airflow: https://github.com/MarquezProject/marquez/tree/main/integrations/airflow
Are there other efforts I should be aware of, whether for these two or for Trino? Thanks for any information!

👋 Arthur Wiedmer, Maciej Obuchowski, Peter Hicks

Zachary Friedman (zafriedman@gmail.com) - 2021-04-19 16:17:06
*Thread Reply:* I think for Trino integration you'd be looking at writing a Trino extractor if I'm not mistaken, yes?

Zachary Friedman (zafriedman@gmail.com) - 2021-04-19 16:17:23
*Thread Reply:* But extractor would obviously be at the Marquez layer not OpenLineage

Zachary Friedman (zafriedman@gmail.com) - 2021-04-19 16:19:00
*Thread Reply:* And hopefully the metadata you'd be looking to extract from Trino wouldn't have any connector-specific syntax restrictions.

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com) - 2021-04-16 15:37:24
Hey all! Right now I am working on getting OpenLineage integrated with some microservices here at Northwestern Mutual and was looking for some advice. The current service I am trying to integrate it with moves files from one AWS S3 bucket to another, so I was hoping to track that movement with OpenLineage. However, by my understanding, the inputs that would be passed along in a runEvent are meant to be datasets that have a schema and other properties. But I wanted to have that input represent the file being moved. Is this a proper usage of Open Lineage? Or is this a use case that is still being developed? Any and all help is appreciated!

Julien Le Dem (julien@apache.org) - 2021-04-19 21:42:14
*Thread Reply:* This is a proper usage. The schema is optional if it’s not available.

Julien Le Dem (julien@apache.org) - 2021-04-19 21:43:27
*Thread Reply:* You would model it as a job reading from a folder (the input dataset) in the input bucket and writing to a folder (the output dataset) in the output bucket
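
(A sketch of what that event could look like, using the S3 naming convention from the naming spec under review - bucket as the dataset namespace, path as the dataset name; all values illustrative:)
```
{
  "eventType": "COMPLETE",
  "eventTime": "2021-04-19T21:43:00Z",
  "producer": "https://example.com/my-s3-mover",
  "run": {"runId": "3f1a6e2c-0f65-4fd8-9b4c-1c2d3e4f5a6b"},
  "job": {"namespace": "my-services", "name": "s3-file-mover"},
  "inputs": [{"namespace": "s3://source-bucket", "name": "/incoming/2021-04-19"}],
  "outputs": [{"namespace": "s3://destination-bucket", "name": "/archive/2021-04-19"}]
}
```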

Julien Le Dem (julien@apache.org) - 2021-04-19 21:43:58
*Thread Reply:* This is similar to how this is modeled in the spark integration (spark job reading and writing to s3 buckets)

Julien Le Dem (julien@apache.org) - 2021-04-19 21:47:06
*Thread Reply:* for reference: getting the urls for the inputs: https://github.com/MarquezProject/marquez/blob/c5e5d7b8345e347164aa5aa173e8cf35062[…]marquez/spark/agent/lifecycle/plan/HadoopFsRelationVisitor.java

Julien Le Dem (julien@apache.org) - 2021-04-19 21:47:54
[attachment]

Julien Le Dem (julien@apache.org) - 2021-04-19 21:48:48
*Thread Reply:* See the spec (comments welcome) for the naming of S3 datasets: https://github.com/OpenLineage/OpenLineage/pull/31/files#diff-e3a8184544e9bc70d8a12e76b58b109051c182a914f0b28529680e6ced0e2a1cR87

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com) - 2021-04-20 11:11:38
*Thread Reply:* Hey Julien, thank you so much for getting back to me. I'll take a look at the documentation/implementations you've sent me and will reach out if I have any more questions. Thanks again!

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com) - 2021-04-20 17:39:24
*Thread Reply:* @Julien Le Dem I left a quick comment on that spec PR you mentioned. Just wanted to let you know.

Julien Le Dem (julien@apache.org) - 2021-04-20 17:49:15
*Thread Reply:* thanks

Josh Quintus (josh.quintus@gmail.com) - 2021-04-28 09:41:45
Hello all. I was reading through the OpenLineage documentation on GitHub and noticed a very minor typo (an instance where and should have been an). I was just about to create a PR for it but wanted to check with someone to see if that would be something that the team is interested in.

Thanks for the tool, I'm looking forward to learning more about it.

👍 Maciej Obuchowski

Julien Le Dem (julien@apache.org) - 2021-04-28 20:56:53
*Thread Reply:* Thank you! Please do fix typos, I’ll approve your PR.

Josh Quintus (josh.quintus@gmail.com) - 2021-04-28 23:21:44
*Thread Reply:* No problem. Here's the PR. https://github.com/OpenLineage/OpenLineage/pull/47

Josh Quintus (josh.quintus@gmail.com) - 2021-04-28 23:22:41
*Thread Reply:* Once I fixed the ones I saw I figured "Why not just run it through a spell checker just in case... " and found a few additional ones.

Ross Turk (ross@datakin.com) - 2021-05-20 16:30:05
For your enjoyment, @Julien Le Dem was on the Data Engineering Podcast talking about OpenLineage!

https://www.dataengineeringpodcast.com/openlineage-data-lineage-specification-episode-187/

🙌 Willy Lulciuc, Maciej Obuchowski, Peter Hicks, Mario Measic
❤️ Willy Lulciuc, Maciej Obuchowski, Peter Hicks, Rogier Werschkull, A Pospiech, Kedar Rajwade, James Le

Ross Turk (ross@datakin.com) - 2021-05-20 16:30:09
share and enjoy 🙂

Julien Le Dem (julien@apache.org) - 2021-05-21 18:21:23
Also happened yesterday: OpenLineage being accepted by the LFAI&Data.

🎉 Abe Gong, Willy Lulciuc, Peter Hicks, Maciej Obuchowski, Daniel Henneberger, Harel Shein, Antonio Moctezuma, Josh Quintus, Mariusz Górski, James Le
👏 Matt Turck

Willy Lulciuc (willy@datakin.com) - 2021-05-21 19:20:55
*Thread Reply:* Huge milestone! 🙌💯🎊

Julien Le Dem (julien@apache.org) - 2021-05-24 16:24:55
I have created a channel to discuss <#C022MMLU31B|user-generated-metadata> since this came up in a few discussions.

🙌 Willy Lulciuc

Jonathon Mitchal (bigmit83@gmail.com) - 2021-05-31 01:28:35
hey guys, does anyone have any sample openlineage schemas for S3 please? potentially including facets for attributes in a parquet file? that would help heaps, thanks. I am trying to slowly bring in a common metadata interface and this will help shape some of the conversations 🙂 with a move to marquez/datahub et al over time

🙌 Willy Lulciuc

Willy Lulciuc (willy@datakin.com) - 2021-06-01 17:56:16
*Thread Reply:* We don’t have S3 (or distributed-filesystem-specific) facets at the moment, but such support would be a great addition! @Julien Le Dem would be best to answer if any work has been done in this area 🙂

Willy Lulciuc (willy@datakin.com) - 2021-06-01 17:57:19
*Thread Reply:* Also, happy to answer any Marquez specific questions, @Jonathon Mitchal when you’re thinking of making the move. Marquez supports OpenLineage out of the box 🙌

Julien Le Dem (julien@apache.org) - 2021-06-01 19:58:21
*Thread Reply:* @Jonathon Mitchal You can follow the naming strategy here for referring to a S3 dataset: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md#s3

Julien Le Dem (julien@apache.org) - 2021-06-01 19:59:30
*Thread Reply:* There is no facet yet for the attributes of a Parquet file. I can give you feedback if you want to start defining one. https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md#proposing-changes

Julien Le Dem (julien@apache.org) - 2021-06-01 20:00:50
*Thread Reply:* Adding Parquet metadata as a facet would make a lot of sense. It is mainly a matter of specifying what the json would look like
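
(Purely as a hypothetical starting point - facet and field names invented here, loosely mirroring parquet-format's FileMetaData:)
```
{
  "outputs": [{
    "namespace": "s3://my-bucket",
    "name": "/warehouse/events",
    "facets": {
      "parquetMetadata": {
        "_producer": "https://example.com/my-integration",
        "_schemaURL": "https://example.com/schemas/ParquetMetadataFacet.json",
        "numRows": 1048576,
        "numRowGroups": 8,
        "createdBy": "parquet-mr version 1.12.0",
        "compression": "SNAPPY"
      }
    }
  }]
}
```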

Julien Le Dem (julien@apache.org) - 2021-06-01 20:01:54
*Thread Reply:* for reference the parquet metadata is defined here: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift

Jonathon Mitchal (bigmit83@gmail.com) - 2021-06-01 23:20:50
*Thread Reply:* That's awesome, thanks for the guidance Willy and Julien ... will report back on how we get on

🙏 Willy Lulciuc

Pedram (pedram@hightouch.io) - 2021-06-01 17:52:08
hi all! just wanted to introduce myself, I'm the Head of Data at Hightouch.io, we build reverse etl pipelines from the warehouse into various destinations. I've been following OpenLineage for a while now and thought it would be nice to build and expose our runs via the standard and potentially save that back to the warehouse for analysis/alerting. Really interesting concept, looking forward to playing around with it

👋 Willy Lulciuc, Ross Turk

Julien Le Dem (julien@apache.org) - 2021-06-01 20:02:34
*Thread Reply:* Welcome! Let us know if you have any questions

Leo (leorobinovitch@gmail.com) - 2021-06-03 19:22:10
Hi all! I have a noob question. As I understand it, one of the main purposes of OpenLineage is to avoid runaway proliferation of bespoke connectors for each data lineage/cataloging/provenance tool to each data source/job scheduler/query engine etc. as illustrated in the problem diagram from the main repo below.

My understanding is that instead, things push to OpenLineage which provides pollable endpoints for metadata tools.

I’m looking at Amundsen, and it seems to have bespoke connectors, but these are pull-based - I don’t need to instrument my data resources to push to Amundsen, I just need to configure Amundsen to poll my data resources (e.g. the Postgres metadata extractor here).

Can OpenLineage do something similar where I can just point it at something to extract metadata from it, rather than instrumenting that thing to push metadata to OpenLineage? If not, I’m wondering why?

Is it the case that Open Lineage defines the general framework but doesn’t actually enforce push or pull-based implementations, it just so happens that the reference implementation (Marquez) uses push?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-06-04 04:45:15
*Thread Reply:* > Is it the case that Open Lineage defines the general framework but doesn’t actually enforce push or pull-based implementations, it just so happens that the reference implementation (Marquez) uses push?
Yes, at its core OpenLineage just enforces the format of the event. We also aim to provide clients - REST, later Kafka, etc. - and some reference implementations, which are now in the Marquez repo. https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/doc/Scope.png

There are several differences between push and poll models. The most important one is that with a push model, latency between your job and emitting OpenLineage events is very low. With some systems, with an internal, push-based model you have more runtime metadata available than when looking from the outside. Another one would be that a naive poll implementation would need to "rebuild the world" on each change. There are also disadvantages, such as that it's usually easier to write a plugin that extracts data from outside the system than to hook up to the internals.

Integration with Amundsen specifically is planned. Although, right now it seems to me that the way to do it is to bypass the databuilder framework and push directly to the underlying database, such as Neo4j, or make Marquez the backend for the Metadata Service: https://raw.githubusercontent.com/amundsen-io/amundsen/master/docs/img/Amundsen_Architecture.png

❤️ Julien Le Dem

Leo (leorobinovitch@gmail.com) - 2021-06-04 10:39:51
*Thread Reply:* This is really helpful, thank you @Maciej Obuchowski!

Leo (leorobinovitch@gmail.com) - 2021-06-04 10:40:59
*Thread Reply:* Similar to what you say about push vs pull, I found DataHub’s comment to be interesting yesterday:
> Push is better than pull: While pulling metadata directly from the source seems like the most straightforward way to gather metadata, developing and maintaining a centralized fleet of domain-specific crawlers quickly becomes a nightmare. It is more scalable to have individual metadata providers push the information to the central repository via APIs or messages. This push-based approach also ensures a more timely reflection of new and updated metadata.

Julien Le Dem (julien@apache.org) - 2021-06-04 21:59:59
*Thread Reply:* yes. You can also “pull-to-push” for things that don’t push.

Mariusz Górski (gorskimariusz13@gmail.com) - 2021-06-17 10:01:37
*Thread Reply:* @Maciej Obuchowski any particular reason for bypassing databuilder and going directly to neo4j? By design databuilder is supposed to be very abstract so any kind of backend can be used with Amundsen. Currently there are at least 4, and neo4j is just one of them.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-06-17 10:28:52
*Thread Reply:* Databuilder's pull model is very different than OpenLineage's push model, where the events are generated while the dataset itself is generated.

So, how would you see using it? Just to proxy the events to concrete search and metadata backend?

I'm definitely not an Amundsen expert, so feel free to correct me if I'm getting it wrong.

Julien Le Dem (julien@apache.org) - 2021-07-07 19:59:28
*Thread Reply:* @Mariusz Górski my slide that Maciej is referring to might be a bit misleading. The Amundsen integration does not exist yet. Please add your input in the ticket: https://github.com/OpenLineage/OpenLineage/issues/86

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mariusz Górski - (gorskimariusz13@gmail.com) -
-
2021-07-09 02:22:06
-
-

*Thread Reply:* thanks Julien! will take a look

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kedar Rajwade - (kedar@cloudzealous.com) -
-
2021-06-08 10:00:47
-
-

@here Hello, my name is Kedar Rajwade. I happened to come across the OpenLineage project and it looks quite interesting. Is there some kind of getting started guide that I can follow? Also, are there any weekly/bi-weekly calls that I can attend to learn about the current/future plans?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-08 14:16:42
-
-

*Thread Reply:* Welcome! You can look here: https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-08 14:17:19
-
-

*Thread Reply:* We’re starting a monthly call, I will publish more details here

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-08 14:17:48
-
-

*Thread Reply:* Do you have a specific use case in mind?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kedar Rajwade - (kedar@cloudzealous.com) -
-
2021-06-08 21:32:02
-
-

*Thread Reply:* Nothing specific yet

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-09 00:49:09
-
-

The first instance of the OpenLineage Monthly meeting is tomorrow June 9 at 9am PT: https://calendar.google.com/event?action=TEMPLATE&tmeid=MDRubzk0cXAwZzA4bXRmY24yZjBkdTZzbDNfMjAyMTA2MDlUMTYwMDAwWiBqdWxpZW5AZGF0YWtpbi5jb20&tmsrc=julien%40datakin.com&scp=ALL

-
-
accounts.google.com
- - - - - - - - - - - - - - - -
- - - -
- 🎉 Willy Lulciuc, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Victor Shafran - (victor.shafran@databand.ai) -
-
2021-06-09 08:33:45
-
-

*Thread Reply:* Hey @Julien Le Dem, I can’t add a link to my calendar… Can you send an invite?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Leo - (leorobinovitch@gmail.com) -
-
2021-06-09 11:00:05
-
-

*Thread Reply:* Same!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-09 11:01:45
-
-

*Thread Reply:* Will do. Also if you send your email in dm you can get added to the invite

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-09 11:59:22
- -
-
-
- - - - - -
-
- - - - -
- -
Kedar Rajwade - (kedar@cloudzealous.com) -
-
2021-06-09 12:00:30
-
-

*Thread Reply:* @Julien Le Dem Can't access the calendar.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kedar Rajwade - (kedar@cloudzealous.com) -
-
2021-06-09 12:00:43
-
-

*Thread Reply:* Can you please share the meeting details

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-09 12:01:12
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-09 12:01:24
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-06-09 12:01:55
-
-

*Thread Reply:* The calendar invite says 9am PDT, not 10am. Which is right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kedar Rajwade - (kedar@cloudzealous.com) -
-
2021-06-09 12:01:58
-
-

*Thread Reply:* Thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-09 13:25:13
-
-

*Thread Reply:* it is 9am, thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-09 18:37:02
-
-

*Thread Reply:* I have posted the notes on the wiki (includes link to recording) https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+meeting+archive

- - - -
- 🙌 Willy Lulciuc, Victor Shafran -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Pedram - (pedram@hightouch.io) -
-
2021-06-10 13:53:18
-
-

Hi! Are there some 'close-to-real' sample events available to build off and compare to? I'd like to make sure what I'm outputting makes sense but it's hard when only comparing to very synthetic data.

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-10 13:55:51
-
-

*Thread Reply:* We’ve recently worked on a getting started guide for OpenLineage that we’d like to publish on the OpenLineage website. That should help make usage a bit more clear. @Ross Turk / @Julien Le Dem might know when that will become available. Otherwise, happy to answer any immediate questions you might have about posting/collecting OpenLineage events

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Pedram - (pedram@hightouch.io) -
-
2021-06-10 13:58:58
-
-

*Thread Reply:* Here's a sample of what I'm producing, would appreciate any feedback if it's on the right track. One of our challenges is that 'dataset' is a little loosely defined for us as outputs since we take data from a warehouse/database and output to things like Salesforce, Airtable, Hubspot and even Slack.

- -

{
  eventType: 'START',
  eventTime: '2021-06-09T08:45:00.395+00:00',
  run: { runId: '2821819' },
  job: {
    namespace: 'hightouch://my-workspace',
    name: 'hightouch://my-workspace/sync/123'
  },
  inputs: [
    {
      namespace: 'snowflake://abc1234',
      name: 'snowflake://abc1234/my_source_table'
    }
  ],
  outputs: [
    {
      namespace: 'salesforce://mysf_instance.salesforce.com',
      name: 'accounts'
    }
  ],
  producer: 'hightouch-event-producer-v.0.0.1'
}
{
  eventType: 'COMPLETE',
  eventTime: '2021-06-09T08:45:30.519+00:00',
  run: { runId: '2821819' },
  job: {
    namespace: 'hightouch://my-workspace',
    name: 'hightouch://my-workspace/sync/123'
  },
  inputs: [
    {
      namespace: 'snowflake://abc1234',
      name: 'snowflake://abc1234/my_source_table'
    }
  ],
  outputs: [
    {
      namespace: 'salesforce://mysf_instance.salesforce.com',
      name: 'accounts'
    }
  ],
  producer: 'hightouch-event-producer-v.0.0.1'
}

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Pedram - (pedram@hightouch.io) -
-
2021-06-10 14:02:59
-
-

*Thread Reply:* One other question I have is really around how customers might take the metadata we emit at Hightouch and integrate that with OpenLineage metadata emitted from other tools like dbt, Airflow, and other integrations to create a true lineage of their data.

- -

For example, if the data goes from S3 -> Snowflake via Airflow and then from Snowflake -> Salesforce via Hightouch, this would mean both Airflow and Hightouch would need to define the Snowflake dataset in exactly the same way to get the benefits of lineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-17 19:13:14
-
-

*Thread Reply:* Hey, @Dejan Peretin! Sorry for the late reply here! Your OL events look solid and I only have a few suggestions:

- -
  1. I would use a valid UUID for the run ID, as the spec will standardize on that type, see https://github.com/OpenLineage/OpenLineage/pull/65
  2. You don’t need to provide the input datasets again on the COMPLETE event, as the input datasets have already been associated with the run ID
  3. For the producer, I’d recommend using a link to the producer source code version, to tie the producer version to the OL event that was emitted (see the sketch below)
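Putting those suggestions together, a revised pair of events might look like this sketch (illustrative only - the UUID and the producer URL are made up):

```
# START carries the inputs/outputs and a valid UUID runId; COMPLETE reuses
# the runId and doesn't need to repeat the inputs.
start_event = {
    "eventType": "START",
    "eventTime": "2021-06-09T08:45:00.395+00:00",
    "run": {"runId": "d46e465b-d358-4d32-83d4-df660ff614dd"},
    "job": {
        "namespace": "hightouch://my-workspace",
        "name": "hightouch://my-workspace/sync/123",
    },
    "inputs": [{"namespace": "snowflake://abc1234",
                "name": "snowflake://abc1234/my_source_table"}],
    "outputs": [{"namespace": "salesforce://mysf_instance.salesforce.com",
                 "name": "accounts"}],
    # Link to the producer source code at a specific version.
    "producer": "https://example.com/hightouch-event-producer/tree/v0.0.1",
}

complete_event = {
    **start_event,
    "eventType": "COMPLETE",
    "eventTime": "2021-06-09T08:45:30.519+00:00",
    "inputs": [],  # already associated with the run on START
}
```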
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-17 19:13:59
-
-

*Thread Reply:* You can now reference our OL getting started guide for a close-to-real example 🙂 , see http://openlineage.io/getting-started

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-17 19:18:19
-
-

*Thread Reply:* > … this would mean both Airflow/Hightouch would need to define the Snowflake dataset in exactly the same way to get the benefits of lineage? -Yes, the dataset and the namespace that it was registered under would have to be the same to properly build the lineage graph. We’re working on defining unique dataset names and have made some good progress in this area. I’d suggest reviewing the OL naming conventions if you haven’t already: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
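To make "exactly the same way" concrete, here is a small sketch: the Airflow-side output and the Hightouch-side input must carry identical namespace/name pairs for the graph to connect (dataset values are illustrative):

```
# The same Snowflake table as seen by two different producers. The lineage
# graph only joins across the two runs if these pairs match exactly.
snowflake_dataset = {
    "namespace": "snowflake://abc1234",
    "name": "snowflake://abc1234/my_source_table",
}

airflow_event_fragment = {"outputs": [snowflake_dataset]}    # S3 -> Snowflake
hightouch_event_fragment = {"inputs": [snowflake_dataset]}   # Snowflake -> Salesforce
```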

- - - -
- 🙌 Pedram -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Pedram - (pedram@hightouch.io) -
-
2021-06-19 01:09:27
-
-

*Thread Reply:* Thanks! I'm really excited to see what the future holds, I think there are so many great possibilities here. Will be keeping a watchful eye. 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-22 15:14:39
-
-

*Thread Reply:* 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Antonio Moctezuma - (antoniomoctezuma@northwesternmutual.com) -
-
2021-06-11 09:53:39
-
-

Hey everyone! I've been running into a minor OpenLineage issue and I was curious if anyone had any advice. According to the OpenLineage spec, it's suggested that a dataset coming from S3 have a namespace of the form s3://<bucket>. We have implemented our code to do so, and RunEvents are published without issue, but when trying to retrieve the information for this RunEvent (like the job) I am unable to retrieve it by namespace from either /api/v1/namespaces/s3%3A%2F%2F<bucket name> (encoding, since : and / are special characters in URLs) or the beta endpoint /api/v1-beta/lineage?nodeId=<dataset>:<namespace>:<name>, and instead get a 400 error with an "Ambiguous Segment in URI" message.

- -

Any and all advice would be super helpful! Thank you so much!
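For reference, a sketch of the encoded lookup being attempted (the bucket name is illustrative; this is the request shape that currently triggers the 400):

```
from urllib.parse import quote

import requests

namespace = "s3://my-bucket"
encoded = quote(namespace, safe="")  # 's3%3A%2F%2Fmy-bucket'

# Returns 400 "Ambiguous Segment in URI" instead of the namespace metadata.
resp = requests.get(f"http://localhost:5000/api/v1/namespaces/{encoded}")
print(resp.status_code, resp.text)
```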

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-06-11 10:16:41
-
-

*Thread Reply:* Sounds like the problem is with Marquez - might be worth opening an issue here: https://github.com/MarquezProject/marquez/issues

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Antonio Moctezuma - (antoniomoctezuma@northwesternmutual.com) -
-
2021-06-11 10:25:58
-
-

*Thread Reply:* Thank you! Will do.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-11 15:31:41
-
-

*Thread Reply:* Thanks for reporting Antonio

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-16 19:01:52
-
-

I have opened a proposal for versioning and publishing the spec: https://github.com/OpenLineage/OpenLineage/issues/63

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-06-18 15:00:20
-
-

We have a nice OpenLineage website now. https://openlineage.io/ -Thank you to contributors: @Ross Turk @Willy Lulciuc @Michael Collado!

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- ❤️ Ross Turk, Kevin Mellott, Leo, Peter Hicks, Willy Lulciuc, Edgar Ramírez Mondragón, Maciej Obuchowski, Supratim Mukherjee -
- -
- 👍 Kedar Rajwade, Mukund -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Leo - (leorobinovitch@gmail.com) -
-
2021-06-18 15:09:18
-
-

*Thread Reply:* Very nice!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:08:43
-
-

Hi everyone! I'm trying to run a Spark job with OpenLineage and Marquez... but I'm getting some errors.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:09:28
-
-

*Thread Reply:* Here is the error...

- -

21/06/20 11:02:56 WARN ArgumentParser: missing jobs in [, api, v1, namespaces, spark_integration] at 5
21/06/20 11:02:56 WARN ArgumentParser: missing runs in [, api, v1, namespaces, spark_integration] at 7
21/06/20 11:03:01 ERROR AsyncEventQueue: Listener SparkListener threw an exception
java.lang.NullPointerException
	at marquez.spark.agent.SparkListener.onJobEnd(SparkListener.java:165)
	at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:39)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:10:41
-
-

*Thread Reply:* Here is my code ...

- -

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder \
    .master('local[1]') \
    .config('spark.jars.packages', 'io.github.marquezproject:marquez_spark:0.15.2') \
    .config('spark.extraListeners', 'marquez.spark.agent.SparkListener') \
    .config('openlineage.url', 'http://localhost:5000/api/v1/namespaces/spark_integration/') \
    .config('openlineage.namespace', 'spark_integration') \
    .getOrCreate()

# Suppress _SUCCESS files
spark.sparkContext._jsc.hadoopConfiguration().set('mapreduce.fileoutputcommitter.marksuccessfuljobs', 'false')
spark.sparkContext._jsc.hadoopConfiguration().set('parquet.summary.metadata.level', 'NONE')

df_source_trip = spark.read \
    .option('inferSchema', True) \
    .option('header', True) \
    .option('delimiter', '|') \
    .csv('/Users/bcanal/Workspace/poc-marquez/poc_spark/resources/data/source/trip.csv') \
    .createOrReplaceTempView('source_trip')

df_drivers = spark.table('source_trip') \
    .select('driver') \
    .distinct() \
    .withColumn('driver_name', lit('Bruno')) \
    .withColumnRenamed('driver', 'driver_id') \
    .createOrReplaceTempView('source_driver')

df = spark.sql(
    """
    SELECT d.*, t.*
    FROM source_trip t, source_driver d
    WHERE t.driver = d.driver_id
    """
)

df.coalesce(1) \
    .drop('driver_id') \
    .write.mode('overwrite') \
    .option('path', '/Users/bcanal/Workspace/poc-marquez/poc_spark/resources/data/target') \
    .saveAsTable('trip')
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:12:27
-
-

*Thread Reply:* After this execution, I can see just the source from the first dataframe, df_source_trip...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:13:04
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:13:45
-
-

*Thread Reply:* I was expecting to see all source dataframes, target dataframes and the job

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:14:35
-
-

*Thread Reply:* I'm running Spark locally on my laptop, and I followed the Marquez getting started guide to bring it up

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Bruno Canal - (bcanal@gmail.com) -
-
2021-06-20 10:14:44
-
-

*Thread Reply:* Can anyone help me?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-06-22 14:42:03
-
-

*Thread Reply:* I think there's a race condition that causes the context to be missing when the job finishes too quickly. If I just add
spark.sparkContext.setLogLevel('info')
to the setup code, everything works reliably. It also works if you remove the master('local[1]') - at least when running in a notebook

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 13:48:34
-
-

@here Hi everyone,

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 13:49:10
-
-

i need to implement export functionality for my data lineage project.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 13:50:26
-
-

as part of this I need to convert the information fetched from the graph db (neo4j) to CSV format and send it in the response.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 13:51:21
-
-

can someone please direct me to the CSV format of open lineage data

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-22 15:26:55
-
-

*Thread Reply:* Hey, @anup agrawal. This is a great question! The OpenLineage spec is defined using the Json Schema format, and it’s mainly for the transport layer of OL events. In terms of how OL events are eventually stored, that’s determined by the backend consumer of the events. For example, Marquez stores the raw event in a lineage_events table, but that’s mainly for convenience and replayability of events. As for importing / exporting OL events from storage, as long as you can translate the CSV to an OL event, then HTTP backends like Marquez that support OL can consume them

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-22 15:27:29
-
-

*Thread Reply:* > as part of this i need to convert the information fetched from graph db (neo4j) to CSV format and send in response. -Depending on the exported CSV, I would translate the CSV to an OL event, see https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json
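A hedged sketch of that translation, assuming a simple one-run-per-row CSV layout (the column names are made up for illustration):

```
import csv

# Hypothetical columns: run_id, job_namespace, job_name, event_type,
# event_time, in_namespace, in_name, out_namespace, out_name
def csv_to_events(path):
    events = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            events.append({
                "eventType": row["event_type"],
                "eventTime": row["event_time"],
                "run": {"runId": row["run_id"]},
                "job": {"namespace": row["job_namespace"],
                        "name": row["job_name"]},
                "inputs": [{"namespace": row["in_namespace"],
                            "name": row["in_name"]}],
                "outputs": [{"namespace": row["out_namespace"],
                             "name": row["out_name"]}],
                "producer": "https://example.com/csv-importer",
            })
    return events
```
Each resulting dict is one OL event; they could then be POSTed to an HTTP backend like Marquez.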

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-22 15:29:58
-
-

*Thread Reply:* When you say “send in response”, who would be the consumer of the lineage metadata exported for the graph db?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 23:33:05
-
-

*Thread Reply:* So far, what I understood about my requirement is: 1. my service will receive OL events

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 23:33:24
-
-

*Thread Reply:* 2. store it in graph db (neo4j)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 23:38:28
-
-

*Thread Reply:* 3. this lineage information will be displayed on ui, based on the request.

- -
  4. Now my part in that is to implement an export functionality, so that someone can download it from the UI. In the UI there will be an option to download the report.
  5. So I need to fetch data from storage and convert it into CSV format, and send it to the UI.
  6. They can download the report from the UI.
- -

So my question here is that I have never seen what that CSV report looks like - how do I achieve that?
When I asked my team what the CSV should look like, they directed me to your website.

- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-01 19:18:35
-
-

*Thread Reply:* I see. @Julien Le Dem might have some thoughts on how an OL event would be represented in different formats like CSV (but, of course, there’s also avro, parquet, etc). The Json Schema is the recommended format for importing / exporting lineage metadata. And, for a file, each line would be an OL event. But, given that CSV is a requirement, I’m not sure how that would be structured. Or at least, it’s something we haven’t previously discussed

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
anup agrawal - (anup.agrawal500@gmail.com) -
-
2021-06-22 13:51:51
-
-

i am very new to this .. sorry for any silly questions

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-06-22 20:29:22
-
-

*Thread Reply:* There are no silly questions! 😉

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdulmalik AN - (lord.of.d1@gmail.com) -
-
2021-06-29 11:46:33
-
-

Hello, I have read every topic and listened to 4 talks and the podcast episode about OpenLineage and Marquez. Given my basic understanding of the data engineering field, I have a couple of questions:
1- What are events and facets, and what is their purpose?
2- Can I implement the OpenLineage API in any software, or does the software need to be integrated with the OpenLineage API?
3- Can I say that OpenLineage is about observability and Marquez is about collecting and storing the metadata?
Thank you all for being cooperative.

- - - -
- 👍 Stephen Pimentel, Kedar Rajwade -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-01 19:07:27
-
-

*Thread Reply:* Welcome, @Abdulmalik AN 👋 Hopefully the talks / podcasts have been informative! And, sure, happy to clarify a few things:

- -

> What are events and facets and what are their purpose?
An OpenLineage event is used to capture the lineage metadata at a point in time for a given run in execution. That is, the run's state transition, the inputs and outputs consumed/produced, and the job associated with the run are all part of the event. The metadata defined in the event can then be consumed by an HTTP backend (as well as other transport layers). Marquez is an HTTP backend implementation that consumes OL events via a REST API call. The OL core model only defines the metadata that should be captured in the context of a run, while the processing of the event is up to the backend implementation consuming the event (think consumer / producer model here). For Marquez, the end-to-end lineage metadata is stored for pipelines (composed of multiple jobs) with built-in metadata versioning support. Now, for the second part of your question: the OL core model is highly extensible via facets. A facet is user-defined metadata and enables entity enrichment. I’d recommend checking out the getting started guide for OL 🙂

- -

> Can I implement the OpenLineage API to any software? or does the software needs to be integrated with the OpenLineage API? -Do you mean HTTP vs other protocols? Currently, OL defines an API spec for HTTP backends, that Marquez has adopted to ingest OL events. But there are also plans to support Kafka and many others.

- -

> Can I say that OpenLineage is about observability and Marquez is about collecting and storing the metadata?
Yep! OL defines the metadata to collect for running jobs / pipelines that can later be used for root cause analysis / troubleshooting failing jobs, while Marquez is a metadata service that implements the OL standard to both consume and store lineage metadata, while also exposing a REST API to query dataset, job and run metadata.
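As a concrete illustration of the facet-based enrichment mentioned above, here is a sketch of a run event carrying a user-defined facet (the facet name and its team/oncall fields are made up; `_producer` and `_schemaURL` are the standard base-facet fields):

```
event = {
    "eventType": "COMPLETE",
    "eventTime": "2021-07-01T12:00:00+00:00",
    "run": {
        "runId": "0b1f9b4e-2f3a-4c6d-9a8e-1d2c3b4a5f60",
        "facets": {
            # A custom facet enriching the run with team ownership info.
            "myTeamOwnership": {
                "_producer": "https://example.com/my-producer",
                "_schemaURL": "https://example.com/schemas/MyTeamOwnershipFacet.json",
                "team": "data-platform",
                "oncall": "alice@example.com",
            }
        },
    },
    "job": {"namespace": "my-namespace", "name": "my-job"},
    "inputs": [],
    "outputs": [],
    "producer": "https://example.com/my-producer",
}
```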

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- 👍 Kedar Rajwade -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Nic Colley - (nic.colley@alation.com) -
-
2021-06-30 17:46:52
-
-

Hi OpenLineage team! Has anyone got this working on databricks yet? I’ve been working on this for a few days and can’t get it to register lineage. I’ve attached my notebook in this thread.

- -

silly question - does the jar file need to be on the cluster?
Which versions of Spark does OpenLineage support?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nic Colley - (nic.colley@alation.com) -
-
2021-06-30 18:16:58
-
-

*Thread Reply:* I based my code on this previous post https://openlineage.slack.com/archives/C01CK9T7HKR/p1624198123045800

-
- - -
- - - } - - Bruno Canal - (https://openlineage.slack.com/team/U025LV2BJUB) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nic Colley - (nic.colley@alation.com) -
-
2021-06-30 18:36:59
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-07-01 13:45:42
-
-

*Thread Reply:* In your first cell, you have
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
spark.sparkContext.setLogLevel('info')
unfortunately, the reference to sparkContext in the third line forces the initialization of the SparkContext, so that in the next cell your new configuration is ignored. In pyspark, you must initialize your SparkSession before any references to the SparkContext. It works if you remove the setLogLevel call from the first cell and make your 2nd cell
spark = SparkSession.builder \
    .config('spark.jars.packages', 'io.github.marquezproject:marquez_spark:0.15.2') \
    .config('spark.extraListeners', 'marquez.spark.agent.SparkListener') \
    .config('openlineage.url', 'https://domain.com') \
    .config('openlineage.namespace', 'my-namespace') \
    .getOrCreate()
spark.sparkContext.setLogLevel('info')

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Samia Rahman - (srahman@thoughtworks.com) -
-
2021-06-30 19:26:42
-
-

How would one capture lineage for job that's processing streaming data? Is that in scope for OpenLineage?

- - - -
- ➕ Josh Quintus, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-01 16:32:18
-
-

*Thread Reply:* It’s absolutely in scope! We’ve primarily focused on the batch use case (ETL jobs, etc), but the OpenLineage standard supports both batch and streaming jobs. You can check out our roadmap here, where you’ll find Flink and Beam on our list of future integrations.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-01 16:32:57
-
-

*Thread Reply:* Is there a streaming framework you’d like to see added to our roadmap?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
mohamed chorfa - (chorfa672@gmail.com) -
-
2021-06-30 20:33:25
-
-

👋 Hello everyone!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-01 16:24:16
-
-

*Thread Reply:* Welcome, @mohamed chorfa 👋 . Let’s us know if you have any questions!

- - - -
- 👍 mohamed chorfa -
- -
-
-
-
- - - - - -
-
- - - - -
- -
mohamed chorfa - (chorfa672@gmail.com) -
-
2021-07-03 19:37:58
-
-

*Thread Reply:* Really looking forward to following the evolution of the specification, from raw data to the ML model

- - - -
- ❤️ Julien Le Dem, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-02 16:53:01
-
-

Hello OpenLineage community,
We have been working on fleshing out the OpenLineage roadmap.
See the currently prioritized efforts on GitHub: https://github.com/OpenLineage/OpenLineage/projects
Please add your feedback to the roadmap by either commenting on the GitHub issues or opening new issues.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-02 17:04:13
-
-

In particular, I have opened an issue to finalize our mission statement: https://github.com/OpenLineage/OpenLineage/issues/84

-
- - - - - - - - - - - - - - - - -
- - - -
- ❤️ Ross Turk, Maciej Obuchowski, Peter Hicks -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-07 19:53:17
-
-

*Thread Reply:* Based on community feedback, -The new proposed mission statement: “to enable the industry at-large to collect real-time lineage metadata consistently across complex ecosystems, creating a deeper understanding of how data is produced and used”

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-07 20:23:24
-
-

I have updated the proposal for the spec versioning: https://github.com/OpenLineage/OpenLineage/issues/63

-
- - - - - - - -
-
Assignees
- julienledem -
- -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jorik - (jorik.blaas-sigmond@nn.nl) -
-
2021-07-08 07:06:53
-
-

Hi all. I'm trying to get my bearings on openlineage. Love the concept. In our data transformation pipelines, output datasets are explicitly versioned (we have an incrementing snapshot id). Our storage layer (deltalake) also allows us to ingest 'older' versions of the same dataset, etc. If I understand it correctly, I would have to add some inputFacets and outputFacets to the run to store the actual version being referenced. Is that something that is currently available, or on the roadmap, or is it something I could extend myself?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-08 18:57:44
-
-

*Thread Reply:* It is on the roadmap and there’s a ticket open but nobody is working on it at the moment. You are very welcome to contribute a spec and implementation

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-08 18:59:00
-
-

*Thread Reply:* Please comment here and feel free to make a proposal: https://github.com/OpenLineage/OpenLineage/issues/35

-
- - - - - - - -
-
Labels
- proposal -
- -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jorik - (jorik.blaas-sigmond@nn.nl) -
-
2021-07-08 07:07:29
-
-

TL;DR: our database supports time-travel, and runs can be set up to use a specific point-in-time of an input. How do we make sure to keep that information within OpenLineage?
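Purely as a sketch of what such a facet could look like (nothing here is in the spec yet - the facet name and fields are hypothetical, and issue #35 is where the real shape will be decided):

```
# Hypothetical facet on an input dataset recording the Delta Lake snapshot
# that a given run actually read.
input_dataset = {
    "namespace": "deltalake://prod",
    "name": "sales.orders",
    "facets": {
        "datasetVersion": {
            "_producer": "https://example.com/my-producer",
            "_schemaURL": "https://example.com/schemas/DatasetVersionFacet.json",
            "version": "412",  # the incrementing snapshot id read by this run
        }
    },
}
```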

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mariusz Górski - (gorskimariusz13@gmail.com) -
-
2021-07-09 02:23:29
-
-

Hi, on the subject of spark integrations - I know that there is spark-marquez, but I was curious whether you also considered https://github.com/AbsaOSS/spline-spark-agent ? It seems like this and spark-marquez are doing a similar thing, and maybe it would make sense to add openlineage support to the spline spark agent?

-
- - - - - - - -
-
Website
- <https://absaoss.github.io/spline/> -
- -
-
Stars
- 36 -
- - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mariusz Górski - (gorskimariusz13@gmail.com) -
-
2021-07-09 02:23:42
-
-

*Thread Reply:* cc @Julien Le Dem @Maciej Obuchowski

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-07-09 04:28:38
-
-

*Thread Reply:* @Michael Collado

- - - -
- 👀 Michael Collado -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-12 21:17:12
-
-

The OpenLineage Technical Steering Committee meetings are monthly on the second Wednesday, 9:00am to 10:00am US Pacific, and the link to join the meeting is https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
The next meeting is this Wednesday.
All are welcome.
• Agenda:
  ◦ Finalize the OpenLineage Mission Statement
  ◦ Review OpenLineage 0.1 scope
  ◦ Roadmap
  ◦ Open discussion
  ◦ Slides: https://docs.google.com/presentation/d/1fD_TBUykuAbOqm51Idn7GeGqDnuhSd7f/edit#slide=id.ge4b57c6942_0_46
Notes are posted here: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- - - -
- 🙌 Willy Lulciuc, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-12 21:18:04
-
-

*Thread Reply:* Feel free to share your email with me if you want to be added to the gcal invite

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-14 12:03:31
-
-

*Thread Reply:* It is starting now

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jiří Sedláček - (yirie.sedlahczech@gmail.com) -
-
2021-07-13 08:22:40
-
-

Hello, is it possible to track lineage at the column level? For example, for SQL like this:
CREATE TABLE T2 AS SELECT c1,c2 FROM T1;
I would like to record this lineage:
T1.C1 -- job1 --> T2.C1
T1.C2 -- job1 --> T2.C2
Would that be possible to record in the OL format?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jiří Sedláček - (yirie.sedlahczech@gmail.com) -
-
2021-07-13 08:29:52
-
-

(the important thing for me is to be able to tell that T1.C1 has no effect on T2.C2)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-14 17:00:12
-
-

I have updated the notes and added the link to the recording of the meeting this morning: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-14 17:04:18
-
-

*Thread Reply:* In particular, please review the versioning proposal: https://github.com/OpenLineage/OpenLineage/issues/63

-
- - - - - - - -
-
Assignees
- julienledem -
- -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-14 17:04:33
-
-

*Thread Reply:* and the mission statement: https://github.com/OpenLineage/OpenLineage/issues/84

-
- - - - - - - -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-14 17:05:02
-
-

*Thread Reply:* for this one, please give explicit approval in the ticket

- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-14 21:10:42
-
-

*Thread Reply:* @Zhamak Dehghani @Daniel Henneberger @Drew Banin @James Campbell @Ryan Blue @Maciej Obuchowski @Willy Lulciuc ^

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-27 18:58:35
-
-

*Thread Reply:* Per the votes in the github ticket, I have finalized the charter here: https://docs.google.com/document/d/11xo2cPtuYHmqRLnR-vt9ln4GToe0y60H/edit

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jarek Potiuk - (jarek@potiuk.com) -
-
2021-07-16 01:25:56
-
-

Hi Everyone. I am a PMC member and committer of Apache Airflow. I watched the talk at the summit https://airflowsummit.org/sessions/2021/data-lineage-with-apache-airflow-using-openlineage/ and thought I might help (after the Summit is over 🙂) with making OpenLineage/Marquez more seamlessly integrated into Airflow

-
-
airflowsummit.org
- - - - - - - - - - - - - - - - - -
- - - -
- ❤️ Abe Gong, WingCode, Maciej Obuchowski, Ross Turk, Julien Le Dem, Michael Collado, Samia Rahman, mohamed chorfa -
- -
- 🙌 Maciej Obuchowski -
- -
- 👍 Jorik -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Samia Rahman - (srahman@thoughtworks.com) -
-
2021-07-20 16:38:38
-
-

*Thread Reply:* The demo in this does not really use the openlineage spec does it?

- -

Did I miss something - the API that was shown for lineage was that of Marquez; how does Marquez use the OpenLineage spec?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Samia Rahman - (srahman@thoughtworks.com) -
-
2021-07-20 18:09:01
-
-

*Thread Reply:* I have a question about the SQLJobFacet in the job schema - isn't it better to call it the TransformationJobFacet or the ProcessJobFacet, so that any logic in the appropriate language can be described? Am I misinterpreting that the intention of SQLJobFacet is to capture the logic that runs for a job?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-26 19:06:43
-
-

*Thread Reply:* > The demo in this does not really use the openlineage spec does it?
@Samia Rahman In our Airflow talk, the demo used the marquez-airflow lib that sends OpenLineage events to Marquez. You can check out how Airflow works with OpenLineage + Marquez here: https://openlineage.io/integration/apache-airflow/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-26 19:07:51
-
-

*Thread Reply:* > Did I miss something - the API that was shown for lineage was that of Marquez, how does Marquez use the open lineage spec?
Yes, Marquez ingests OpenLineage events that conform to the spec via its REST API. Hope this helps!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kenton (swiple.io) - (kknoxparton@gmail.com) -
-
2021-07-21 07:52:32
-
-

Hi all, does OpenLineage intend on creating lineage off of query logs?

- -

From what I have read, there are a number of supported integrations but none that cater to regular SQL based ETL. Is this on the OpenLineage roadmap?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-26 18:54:46
-
-

*Thread Reply:* I would say this is more of an ingestion pattern than something the OpenLineage spec would support directly. Though I completely agree, query logs are a great source of lineage metadata with minimal effort. On our roadmap, we have Kafka as a supported backend, which would enable streaming lineage metadata from query logs into a topic. That said, Confluent has some great blog posts on Change Data Capture:
• https://www.confluent.io/blog/no-more-silos-how-to-integrate-your-databases-with-apache-kafka-and-cdc/
• https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/

-
-
Confluent
- - - - - - - - - - - - - - - - - -
-
-
Confluent
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-26 18:57:59
-
-

*Thread Reply:* Q: @Kenton (swiple.io) Are you planning on using Kafka connect? If so, I see 2 reasonable options:

- -
  1. Stream query logs to a topic using the JDBC source connector, then have a consumer read the query logs off the topic, parse them, and stream the result of the query parsing to another topic as an OpenLineage event (see the sketch below)
  2. Add direct support for OpenLineage to the JDBC connector, or to any other application you plan to use to read the query logs.
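A hedged sketch of the consumer side of option 1, assuming the kafka-python client is available; `parse_query_log` and the topic name are placeholders:

```
import requests
from kafka import KafkaConsumer  # kafka-python

def parse_query_log(raw_log):
    # Placeholder: parse SQL text into an OpenLineage event dict,
    # or return None if the statement carries no lineage.
    raise NotImplementedError

consumer = KafkaConsumer("query-logs", bootstrap_servers="localhost:9092")

for message in consumer:
    event = parse_query_log(message.value.decode("utf-8"))
    if event is not None:
        # Emit to an HTTP backend; could equally produce to another topic.
        requests.post("http://localhost:5000/api/v1/lineage", json=event)
```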
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-26 19:01:31
-
-

*Thread Reply:* Either way, I think this is a great question and a common ingestion pattern we should document or have best practices for. Also, more details on how you plan to ingest the query logs would help drive the discussion.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Kenton (swiple.io) - (kknoxparton@gmail.com) -
-
2021-08-05 12:01:55
-
-

*Thread Reply:* Using something like sqlflow could be a good starting point? Demo https://sqlflow.gudusoft.com/?utm_source=gspsite&utm_medium=blog&utm_campaign=support_article#/

-
-
sqlflow.gudusoft.com
- - - - - - - - - - - - - - - -
-
- - - - - - - -
-
Stars
- 79 -
- -
-
Language
- Python -
- - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-09-21 20:22:26
-
-

*Thread Reply:* @Kenton (swiple.io) I hadn’t heard of sqlflow, but it does look promising. It’s not on our current roadmap, but I think there is a need for support for parsing query logs as OpenLineage events. Do you mind opening an issue and outlining your thoughts? It’d be great to start the discussion, if you’d like to drive this feature and help prioritize it 💯

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Samia Rahman - (srahman@thoughtworks.com) -
-
2021-07-21 08:49:23
-
-

The openlineage implementations for the airflow and spark integrations currently live in the Marquez repo. My understanding from the OpenLineage scope is that the integration implementations are in the scope of OpenLineage - are the spark integrations going to be moved to OpenLineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2021-07-21 11:35:12
-
-

@Samia Rahman Yes, that is the plan. For details you can see https://github.com/OpenLineage/OpenLineage/issues/73

-
- - - - - - - - - - - - - - - - -
- - - -
- 🙌 Samia Rahman, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Samia Rahman - (srahman@thoughtworks.com) -
-
2021-07-21 18:13:11
-
-

I have a question about the SQLJobFacet in the job schema - isn't it better to call it the TransformationJobFacet or the ProcessJobFacet, so that any logic in the appropriate language can be described, whether it's Scala or Python code that runs in the job and processes streaming or batch data? Am I misinterpreting that the intention of SQLJobFacet is to capture the logic that runs for a job?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-21 18:22:01
-
-

*Thread Reply:* Hey, @Samia Rahman 👋. Yeah, great question! The SQLJobFacet is used only for SQL-based jobs. That is, it’s not intended to capture the code being executed, but rather just the SQL if it’s present. The SQL facet can be used later for display purposes. For example, in Marquez, we use the SQLJobFacet to display the SQL executed by a given job to the user via the UI.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-07-21 18:23:03
-
-

*Thread Reply:* To capture the logic of the job (meaning, the code being executed), the OpenLineage spec defines the SourceCodeLocationJobFacet that builds the link to source in version control

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-07-22 17:56:41
-
-

The process started a few months back when the LF AI & Data voted to accept OpenLineage as part of the foundation. It is now official, OpenLineage joined the LFAI & data Foundation. - https://lfaidata.foundation/blog/2021/07/22/openlineage-joins-lf-ai-data-as-new-sandbox-project/

-
-
LF AI
- - - - - - -
-
Written by
- Jacqueline Z Cardoso -
- -
-
Est. reading time
- 3 minutes -
- - - - - - - - - - - - -
- - - -
- 🙌 Ross Turk, Luke Smith, Maciej Obuchowski, Gyan Kapur, Dr Daniel Smith, Jarek Potiuk, Peter Hicks, Kedar Rajwade, Abe Gong, Damian Warszawski, Willy Lulciuc -
- -
- ❤️ Ross Turk, Jarek Potiuk, Peter Hicks, Abe Gong, Willy Lulciuc -
- -
- 🎉 Laurent Paris, Rifa Achrinza, Minkyu Park, Peter Hicks, mohamed chorfa, Jarek Potiuk, Abe Gong, Damian Warszawski, Willy Lulciuc, James Le -
- -
- 👏 Matt Turck -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Namron - (ian.norman@avanade.com) -
-
2021-07-29 11:20:17
-
-

Hi, I am trying to create lineage between two datasets. Following the spec, I can see the syntax for declaring the input and output datasets, and for creating the associated Job (which I take to be the process in the middle joining the two datasets together). What I can't see is where in the specification to relate the job to the inputs and outputs. Do you have an example of this?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-07-30 17:24:44
-
-

*Thread Reply:* The run event is always tied to exactly one job. It's up to the backend to store the relationship between the job and its inputs/outputs. E.g., in marquez, this is where we associate the input datasets with the job- https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/db/OpenLineageDao.java#L132-L143

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-03 15:06:58
-
-

the OutputStatistics facet PR is updated based on your comments @Michael Collado https://github.com/OpenLineage/OpenLineage/pull/114

-
- - - - - - - -
-
Comments
- 1 -
- - - - - - - - - - -
- - - -
- 🙌 Michael Collado -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-08-03 15:11:56
-
-

*Thread Reply:* /|~~~ - ///| - /////| - ///////| - /////////| - \==========|===/ -~~~~~~~~~~~~~~~~~~~~~

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-03 19:59:03
-
-

*Thread Reply:*

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-03 19:59:38
-
-

I have updated the DataQuality metrics proposal and the corresponding PR: https://github.com/OpenLineage/OpenLineage/issues/101 -https://github.com/OpenLineage/OpenLineage/pull/115

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
- 🙌 Willy Lulciuc, Bruno González -
- -
- 💯 Willy Lulciuc, Dominique Tipton -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Oleksandr Dvornik - (oleksandr.dvornik@getindata.com) -
-
2021-08-04 10:42:48
-
-

Guys, I've merged circleCI publish snapshot PR

- -

Snapshots can be found below:
https://datakin.jfrog.io/artifactory/maven-public-libs-snapshot-local/io/openlineage/openlineage-java/0.0.1-SNAPSHOT/
openlineage-java-0.0.1-20210804.142910-6.jar
https://datakin.jfrog.io/artifactory/maven-public-libs-snapshot-local/io/openlineage/openlineage-spark/0.1.0-SNAPSHOT/
openlineage-spark-0.1.0-20210804.143452-5.jar

- -

Build on main passed (edited)

- -
- - - - - - - -
- - -
- 🎉 Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-04 23:08:08
-
-

I added a mechanism to enforce spec versioning per: https://github.com/OpenLineage/OpenLineage/issues/63 -https://github.com/OpenLineage/OpenLineage/pull/140

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ben Teeuwen-Schuiringa - (ben.teeuwen@booking.com) -
-
2021-08-05 10:02:49
-
-

Hi all, at Booking.com we’re using Spline to extract granular lineage information from spark jobs, to be able to trace lineage at the column level along with the operations in between. We wrote a custom python parser to create a graph-like structure that is sent into arangodb. But tbh, the process is far from stable and is not able to quickly answer questions like ‘which root input columns are used to construct column x’.

- -

My impression of openlineage thus far is that it’s focusing on less granular, table-level input-output information. Is anyone here trying to accomplish something similar at the column level?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-08-05 12:56:48
-
-

*Thread Reply:* Also interested in use case / implementation differences between Spline and OL. Watching this thread.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-05 14:46:44
-
-

*Thread Reply:* It would be great to have the option to produce the Spline lineage info as OpenLineage. To capture the column-level lineage, you would want to add a ColumnLineage facet to the output dataset facets, which is something that is still needed in the spec. Here is a proposal, please chime in: https://github.com/OpenLineage/OpenLineage/issues/148
Is this something you would be interested in doing?
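To give a feel for it, here is a hypothetical shape for such a facet, using the T1/T2 example from earlier in the channel (illustrative only - issue #148 is where the real schema is being discussed):

```
# For each output column, list the input columns it was derived from.
output_dataset = {
    "namespace": "snowflake://abc1234",
    "name": "T2",
    "facets": {
        "columnLineage": {
            "_producer": "https://example.com/my-producer",
            "_schemaURL": "https://example.com/schemas/ColumnLineageFacet.json",
            "fields": {
                "C1": {"inputFields": [{"namespace": "snowflake://abc1234",
                                        "name": "T1", "field": "C1"}]},
                "C2": {"inputFields": [{"namespace": "snowflake://abc1234",
                                        "name": "T1", "field": "C2"}]},
            },
        }
    },
}
```
A representation like this also answers the earlier "no effect" question: T2.C2 lists only T1.C2, so T1.C1 demonstrably has no effect on it.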

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-09 19:49:51
-
-

*Thread Reply:* regarding the difference of implementation, the OpenLineage spark integration focuses on extracting metadata and exposing it as a standard representation. (The OpenLineage LineageEvents described in the JSON-Schema spec). The goal is really to have a common language to express lineage and related metadata across everything. We’d be happy if Spline can produce or consume OpenLineage as well and be part of that ecosystem.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ben Teeuwen-Schuiringa - (ben.teeuwen@booking.com) -
-
2021-08-18 08:09:38
-
-

*Thread Reply:* Does anyone know if the Spline developers are in this slack group?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ben Teeuwen-Schuiringa - (ben.teeuwen@booking.com) -
-
2022-08-03 03:07:56
-
-

*Thread Reply:* @Luke Smith how have things progressed on your side the past year?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-09 19:39:28
-
-

I have opened an issue to track the facet versioning discussion: -https://github.com/OpenLineage/OpenLineage/issues/153

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-09 20:16:18
-
-

I have updated the agenda to the OpenLineage monthly TSC meeting: -https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting -(meeting information bellow for reference, you can also DM me your email to get added to a google calendar invite)

- -

The OpenLineage Technical Steering Committee meetings are Monthly on the Second Wednesday 9:00am to 10:00am US Pacific and the link to join the meeting is https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09 -All are welcome.

- -

Aug 11th 2021
• Agenda:
  ◦ Coming in OpenLineage 0.1
    ▪︎ OpenLineage spec versioning
    ▪︎ Clients
  ◦ Marquez integrations imported in OpenLineage
    ▪︎ Apache Airflow:
      • BigQuery
      • Postgres
      • Snowflake
      • Redshift
      • Great Expectations
    ▪︎ Apache Spark
    ▪︎ dbt
  ◦ OpenLineage 0.2 scope discussion
    ▪︎ Facet versioning mechanism
    ▪︎ OpenLineage Proxy Backend ()
    ▪︎ Kafka client
  ◦ Roadmap
  ◦ Open discussion
• Slides: https://docs.google.com/presentation/d/1Lxp2NB9xk8sTXOnT0_gTXicKX5FsktWa/edit#slide=id.ge80fbcb367_0_14

- - - -
- 🙌 Willy Lulciuc, Maciej Obuchowski, Dr Daniel Smith -
- -
- 💯 Willy Lulciuc, Dr Daniel Smith -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-11 10:05:27
-
-

*Thread Reply:* Just a reminder that this is in 2 hours

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-11 18:50:32
-
-

*Thread Reply:* I have added the notes to the meeting page: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-11 18:51:19
-
-

*Thread Reply:* The recording of the meeting is linked there: -https://us02web.zoom.us/rec/share/2k4O-Rjmmd5TYXzT-pEQsbYXt6o4V6SnS6Vi7a27BPve9aoMmjm-bP8UzBBzsFzg.uY1je-PyT4qTgYLZ?startTime=1628697944000 -• Passcode: =RBUj01C

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Avancini - (dpavancini@gmail.com) -
-
2021-08-11 13:30:52
-
-

Hi guys, great discussion today. Something we are particularly interested in is the integration with Airflow 2. I've been searching the Marquez and OpenLineage repos and I couldn't find a clear answer on the status of that. I did some work locally to update the marquez-airflow package, but I would like to know if someone else is working on this - maybe we could help too.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2021-08-11 13:36:43
-
-

*Thread Reply:* @Daniel Avancini I'm working on it. Some changes in airflow made the current approach unfeasible, so a slight change in the way we capture events is needed. You can take a look at the progress here: https://github.com/OpenLineage/OpenLineage/tree/airflow/2

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Avancini - (dpavancini@gmail.com) -
-
2021-08-11 13:48:36
-
-

*Thread Reply:* Thank you Maciej. I'll take a look

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-11 20:37:09
-
-

I have migrated the Marquez issues related to OpenLineage integrations to the OpenLineage repo

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-08-13 19:02:54
-
-

And OpenLineage 0.1.0 is out ! https://github.com/OpenLineage/OpenLineage/releases/tag/0.1.0

- - - -
- 🙌 Peter Hicks, Maciej Obuchowski, Willy Lulciuc, Oleksandr Dvornik, Luke Smith, Daniel Avancini, Matt Gee -
- -
- ❤️ Willy Lulciuc, Matt Gee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Oleksandr Dvornik - (oleksandr.dvornik@getindata.com) -
-
2021-08-16 11:42:24
-
-

PR ready for review

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-08-20 13:54:08
-
-

Anyone have experience parsing spark's logical plan to generate column-level lineage and DAGs with more human readable operations? I assume I could recreate a graph like the one below using the spark.logicalPlan facet. The analysts writing the SQL / spark queries aren't familiar with ShuffledRowRDD , MapPartitionsRDD, etc... It'd be better if I could convert this plan into spark SQL (or capture spark SQL as a facet at runtime).

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-08-26 16:46:53
-
-

*Thread Reply:* The logicalPlan facet currently returns the Logical Plan, not the physical plan. This means you end up with expressions like Aggregate and Join rather than WholeStageCodegen and Exchange. I don't know if it's possible to reverse engineer the SQL- it's worth looking into the API and trying to find a way to generate that

- - - -
-
-
-
- - - - - - - -
-
- - - - -
- -
Erick Navarro - (Erick.Navarro@gt.ey.com) -
-
2021-08-31 14:26:35
-
-

👋 Hi everyone!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erick Navarro - (Erick.Navarro@gt.ey.com) -
-
2021-08-31 14:27:00
-
-

Nice to e-meet you 🙂
I want to use the OpenLineage integration for Spark in my Azure Databricks clusters, but I am having problems with the configuration of the listener in the cluster. I was wondering if you could help me - if you know of any tutorial for the integration of Spark with Azure Databricks, or a more specific guide for this scenario, I would really appreciate it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erick Navarro - (Erick.Navarro@gt.ey.com) -
-
2021-08-31 14:27:33
-
-

I added this configuration to my cluster :

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Erick Navarro - (Erick.Navarro@gt.ey.com) -
-
2021-08-31 14:28:37
-
-

I receive this error message:

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-08-31 14:30:00
-
-

*Thread Reply:* Hey, @Erick Navarro 👋 . Are you using the openlineage-spark lib? (Note, the marquez-spark lib has been deprecated)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Luke Smith - (luke.smith@kinandcarta.com) -
-
2021-08-31 14:43:20
-
-

*Thread Reply:* My team had this issue as well. Our read of the error is that Databricks attempts to register the listener before installing packages defined with either spark.jars or spark.jars.packages. Since the listener lib is not yet installed, the listener cannot be found. To solve the issue, we:

- -
  1. copy the OL JAR to a staging directory on DBFS (we use /dbfs/databricks/init/lineage)
  2. using an init script, copy the JAR from the staging directory to the default JAR location for the Databricks driver - /mnt/driver-daemon/jars
  3. within the same init script, write the spark config parameters to a .conf file in /databricks/driver/conf (we use open-lineage.conf). The .conf file will be read by the driver on initialization. It should follow this format ($lineage_host_url should point to your API):
[driver] {
"spark.jars" = "/mnt/driver-daemon/jars/openlineage-spark-0.1-SNAPSHOT.jar"
"spark.extraListeners" = "com.databricks.backend.daemon.driver.DBCEventLoggingListener,openlineage.spark.agent.OpenLineageSparkListener"
"spark.openlineage.url" = "$lineage_host_url"
}
Your cluster must be configured to call the init script (enabling lineage for the entire cluster). OL is not friendly to notebook-level init as far as we can tell.
- -

@Willy Lulciuc -- I have some utils and init script templates that simplify this process. May be worth adding them to the OL repo along with a readme.

- - - -
- 🙏 Erick Navarro -
- -
- ❤️ Erick Navarro -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2021-08-31 14:51:46
-
-

*Thread Reply:* Absolutely, thanks for elaborating on your spark + OL deployment process and I think that’d be great to document. @Michael Collado what are your thoughts?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-08-31 14:57:02
-
-

*Thread Reply:* I haven't tried with Databricks specifically, but there should be no issue registering the OL listener in the Spark config as long as it's done before the Spark session is created- e.g., this example from the README works fine in a vanilla Jupyter notebook- https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#openlineagesparklistener-as-a-plain-spark-listener

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2021-08-31 15:11:37
-
-

*Thread Reply:* Looks like Databricks' notebooks come with a Spark instance pre-configured- configuring lineage within the SparkSession configuration doesn't seem possible- https://docs.databricks.com/notebooks/notebooks-manage.html#attach-a-notebook-to-a-cluster 😞

Michael Collado (collado.mike@gmail.com)
2021-08-31 15:11:53

*Thread Reply:*

Luke Smith (luke.smith@kinandcarta.com)
2021-08-31 15:59:38

*Thread Reply:* Right, Databricks provides preconfigured spark context / session objects. With Spline, you can set some cluster-level config (e.g. spark.spline.lineageDispatcher.http.producer.url) and install the library on the cluster, but then enable tracking at a notebook level with:

```
%scala
import za.co.absa.spline.harvester.SparkLineageInitializer._
sparkSession.enableLineageTracking()
```

In OL, it would be nice to install and configure OL at a cluster level, but to enable it at a notebook level. This way, users could control whether all notebooks run on a cluster emit lineage or just those with lineage explicitly enabled.

Michael Collado (collado.mike@gmail.com)
2021-08-31 16:01:00

*Thread Reply:* Seems, at the very least, we need to provide a way to specify the job name at the notebook level

👍 Luke Smith

Luke Smith (luke.smith@kinandcarta.com)
2021-08-31 16:03:50

*Thread Reply:* Agreed. I'd like a default that uses the notebook name that can also be overridden in the notebook.

Michael Collado (collado.mike@gmail.com)
2021-08-31 16:10:42

*Thread Reply:* if you have some insight into the available options, it would be great if you can open an issue on the OL project. I'll have to carve out some time to play with a databricks cluster and learn what options we have

👍 Luke Smith

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 18:26:11

*Thread Reply:* Thank you @Luke Smith, the method you recommend works for me. The cluster is running and apparently it fetches the configuration - this is my first progress in over a week of testing OpenLineage in Azure Databricks. Thank you!

Now I have this:

Luke Smith (luke.smith@kinandcarta.com)
2021-08-31 18:52:15

*Thread Reply:* Is this error thrown during init or job execution?

Michael Collado (collado.mike@gmail.com)
2021-08-31 18:55:30

*Thread Reply:* this is likely a race condition- I've seen it happen for jobs that start and complete very quickly- things like defining temp views or similar

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 19:59:15

*Thread Reply:* During the execution of the job, @Luke Smith. Thank you @Michael Collado, that was exactly the scenario - the job that I executed was empty. Now the cluster is running OK and I don't have errors. I have run some jobs successfully, but I don't see any information in my Datakin explorer.

Willy Lulciuc (willy@datakin.com)
2021-08-31 20:00:46

*Thread Reply:* Awesome! Great to hear you’re up and running. For datakin specific questions, mind if we move the discussion to the datakin user slack channel?

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-08-31 20:01:17

*Thread Reply:* Yes Willy, thank you!

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:06:00

*Thread Reply:* Hi @Luke Smith, thank you for your help. Are you familiar with this error in Azure Databricks when you use OL?

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:07:07

*Thread Reply:*

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:17:17

*Thread Reply:* I found the solution here: https://docs.microsoft.com/en-us/answers/questions/170730/handshake-fails-trying-to-connect-from-azure-datab.html

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 10:17:28

*Thread Reply:* It works now! 😄

👍 Luke Smith, Maciej Obuchowski, Minkyu Park, Willy Lulciuc

Willy Lulciuc (willy@datakin.com)
2021-09-02 16:33:01

*Thread Reply:* @Erick Navarro This might be helpful to add to our openlineage spark docs for others trying out openlineage-spark with Databricks. Let me know if that’s something you’d like to contribute 🙂

Erick Navarro (Erick.Navarro@gt.ey.com)
2021-09-02 19:59:10

*Thread Reply:* Yes of course @Willy Lulciuc, I will prepare a small tutorial for my colleagues and I will share it with you 🙂

Willy Lulciuc (willy@datakin.com)
2021-09-02 20:44:36

*Thread Reply:* Awesome. Thanks!

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-02 03:47:35

Hello everyone! I am currently evaluating OpenLineage and am finding it very interesting as Prefect is in the list of integrations. However, I am not seeing any documentation or code for this. How far are you from supporting Prefect?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-02 04:57:55

*Thread Reply:* Hey! If you mean this picture, it illustrates the concept of how OpenLineage works, not the current state of the integrations. We don't have Prefect support yet; however, it's on our roadmap.

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-02 05:22:15

*Thread Reply:* great, thanks 🙂

Julien Le Dem (julien@apache.org)
2021-09-02 11:49:48

*Thread Reply:* @Thomas Fredriksen Feel free to chime in on the GitHub issue Maciej linked if you want.

Luke Smith (luke.smith@kinandcarta.com)
2021-09-02 13:13:05

What's the timeline to support spark 3.0 within OL? One breaking change we've found is within DatasetSourceVisitor.java -- the DataSourceV2 is deprecated in spark 3.0. There may be other issues we haven't found yet. Is there a good feel for the scope of work required to make OL spark 3.0 compatible?

Julien Le Dem (julien@apache.org)
2021-09-02 14:28:11

*Thread Reply:* It is being worked on right now. @Oleksandr Dvornik is adding an integration test in the build so that we run test for both spark 2.4 and spark 3. Please open an issue with the stack trace if you can. From our perspective, it should be mostly compatible with a few exceptions like this one that we’d want to add test cases for.

Julien Le Dem (julien@apache.org)
2021-09-02 14:36:19

*Thread Reply:* The goal is to be able to make a release in the next few weeks. The integration is being used with Spark 3 already.

🙌 Luke Smith

Luke Smith (luke.smith@kinandcarta.com)
2021-09-02 15:50:14

*Thread Reply:* Great, I'll take some time to open an issue for this particular issue and a few others.

Michael Collado (collado.mike@gmail.com)
2021-09-02 17:33:08

*Thread Reply:* are you actually using the DatasetSource interface in any capacity? Or are you just scanning the source code to find incompatibilities?

Luke Smith (luke.smith@kinandcarta.com)
2021-09-03 12:36:20

*Thread Reply:* Turns out this has more to do with a how Databricks handles the delta format. It's related to https://github.com/AbsaOSS/spline-spark-agent/issues/96.

Luke Smith (luke.smith@kinandcarta.com)
2021-09-03 13:42:43

*Thread Reply:* I haven't been chasing this issue down on my team -- turns out some things were lost in communication. There are really two problems here:

  1. When attempting to do delta I/O with Spark 3 on Databricks, e.g. `insert into . . . values . . .`, we get an error related to DataSourceV2: `java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.source()Lorg/apache/spark/sql/sources/v2/DataSourceV2;`
  2. Using Spline, which is Spark 3 compatible, we have issues with the way Databricks handles delta table IO. This is related: https://github.com/AbsaOSS/spline-spark-agent/issues/96

So there are two stacked issues related to spark 3 on Databricks with delta IO, not just one. Hope this clears things up.

Michael Collado (collado.mike@gmail.com)
2021-09-03 13:44:54

*Thread Reply:* So, the first issue is OpenLineage related directly, and the second issue applies to both OpenLineage and Spline?

Luke Smith (luke.smith@kinandcarta.com)
2021-09-03 13:45:49

*Thread Reply:* Yes, that's my read of what I'm getting from others on the team.

Michael Collado (collado.mike@gmail.com)
2021-09-03 13:46:56

*Thread Reply:* For the first issue- can you give some details about the target of the INSERT INTO... ? Is it a data source defined in Databricks? a Hive table? a view on GCS?

Michael Collado (collado.mike@gmail.com)
2021-09-03 13:47:40

*Thread Reply:* oh, it's a Delta table?

Luke Smith (luke.smith@kinandcarta.com)
2021-09-03 14:48:15

*Thread Reply:* Yes, it's created via

```
CREATE TABLE . . . using DELTA location "/dbfs/mnt/ . . . "
```

Julien Le Dem (julien@apache.org)
2021-09-02 14:28:53

I have opened a PR to fix some outdated language in the spec: https://github.com/OpenLineage/OpenLineage/pull/241 Thank you @Mandy Chessell for the feedback

Julien Le Dem (julien@apache.org)
2021-09-02 14:37:27

The next OpenLineage monthly meeting is next week. Please chime in this thread if you’d like something added to the agenda

🙌 Willy Lulciuc

marko (marko.kristian.helin@gmail.com)
2021-09-04 12:53:54

*Thread Reply:* Apache Beam integration? I have a very crude integration at the moment. Maybe it’s better to integrate on the orchestration level (airflow, luigi). Thoughts?

Julien Le Dem (julien@apache.org)
2021-09-05 13:06:19

*Thread Reply:* I think it makes a lot of sense to have a Beam level integration similar to the spark one. Feel free to post a draft PR if you want to share.

Julien Le Dem (julien@apache.org)
2021-09-07 21:04:09

*Thread Reply:* I have added Beam as a topic for the roadmap discussion slide: https://docs.google.com/presentation/d/1fI0u8aE0iX9vG4GGrnQYAEcsJM9z7Rlv/edit#slide=id.ge7d4b64ef4_0_0

Julien Le Dem (julien@apache.org)
2021-09-07 21:03:08

I have prepared slides for the OpenLineage meeting tomorrow morning: https://docs.google.com/presentation/d/1fI0u8aE0iX9vG4GGrnQYAEcsJM9z7Rlv/edit#slide=id.ge7d4b64ef4_0_0

Julien Le Dem (julien@apache.org)
2021-09-07 21:03:32

*Thread Reply:* There will be a quick demo of the dbt integration (thanks @Willy Lulciuc!)

🙌 Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-09-07 21:05:13

*Thread Reply:* Information to join and archive of previous meetings: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

Julien Le Dem (julien@apache.org)
2021-09-08 14:49:52

*Thread Reply:* The recording and notes are now available: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

Venkatesh Tadinada (venkat@mlacademy.io)
2021-09-08 21:58:09

*Thread Reply:* Good meeting today. @Julien Le Dem. Thanks

Shreyas Kaushik (shreyask@gmail.com)
2021-09-08 04:03:29

Hello, I was looking to get some lineage out for BQ in my Airflow DAGs and saw that the BQ extractor here - https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/bigquery_extractor.py#L47 - is using an operator that has been deprecated by Airflow - https://github.com/apache/airflow/blob/main/airflow/contrib/operators/bigquery_operator.py#L44 - while most of my DAGs are using the BigQueryExecuteQueryOperator mentioned there. I presume lineage extraction wouldn't work with this, and some work is needed to support both operators with the same (or different) extractor. Is that correct, or am I missing something?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-08 04:27:04

*Thread Reply:* We're working on updating our integration to Airflow 2. Some changes in Airflow made the current approach unfeasible, so a slight change in the way we capture events is needed. You can take a look at the progress here: https://github.com/OpenLineage/OpenLineage/tree/airflow/2

Shreyas Kaushik (shreyask@gmail.com)
2021-09-08 04:27:38

*Thread Reply:* Thanks @Maciej Obuchowski. When is this expected to land in a release?

Daniel Zagales (dzagales@gmail.com)
2021-11-11 06:35:24

*Thread Reply:* hi @Maciej Obuchowski I wanted to follow up on this to understand when the more recent BQ Operators will be supported, specifically BigQueryInsertJobOperator

Julien Le Dem (julien@apache.org)
2021-09-11 22:30:31

The PR to separate facets in their own file (and allowing versioning them independently) is now available: https://github.com/OpenLineage/OpenLineage/pull/118

Jose Badeau (jose.badeau@gmail.com)
2021-09-13 03:46:20

Hi, new to the channel but I think OL is a great initiative. Currently we are focused on beam/spark/delta but are moving to beam/flink/iceberg and I’m happy to help where I can.

Willy Lulciuc (willy@datakin.com)
2021-09-13 15:40:01

*Thread Reply:* Welcome, @Jose Badeau 👋. That’s exciting to hear, as we have Beam, Flink and Iceberg on our roadmap! You're welcome to join the discussion :)

Julien Le Dem (julien@apache.org)
2021-09-13 20:56:11

Per the discussion last week, Ryan updated the metadata that would be available in Iceberg: https://github.com/OpenLineage/OpenLineage/issues/167#issuecomment-917237320

Julien Le Dem (julien@apache.org)
2021-09-13 21:00:54

I have also created tickets for follow up discussions: (#269 and #270): https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 04:50:22

Hello. I find OpenLineage an interesting tool; however, can someone help me with the integration?

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 04:52:50

I am trying to capture lineage from Spark 3.1.1, but when executing I constantly get:

```
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2.writer()Lorg/apache/spark/sql/sources/v2/writer/DataSourceWriter;
  at openlineage.spark.agent.lifecycle.plan.DatasetSourceVisitor.findDatasetSource(DatasetSourceVisitor.java:57)
```

as if I were using OpenLineage on the wrong Spark version (2.4). I have also tried the Spark jar from branch feature/itspark3. Is there any branch or release that works or can be tried with Spark 3+?

Oleksandr Dvornik (oleksandr.dvornik@getindata.com)
2021-09-14 05:03:45

*Thread Reply:* Hello Tomas. We are currently working on support for Spark v3. Can you please raise an issue with the stack trace? That would help us track and solve it. We are currently adding integration tests. The next step would be to fix the changes in method signatures for v3 (which is what you're hitting).

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 05:12:45

*Thread Reply:* Hi @Oleksandr Dvornik i raised https://github.com/OpenLineage/OpenLineage/issues/272

👍 Oleksandr Dvornik

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 08:47:39

I also tried to downgrade Spark to 2.4.0 and retry with 0.2.2, but I faced an issue there as well. So my preferred way would be to push for Spark 3.1.1, but it depends a bit on when you plan to release a version supporting it. As a backup plan I would try Spark 2.4.0, but this is blocking me too: https://github.com/OpenLineage/OpenLineage/issues/274

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-14 08:55:44

*Thread Reply:* I think this might be actually spark issue: https://stackoverflow.com/questions/53787624/spark-throwing-arrayindexoutofboundsexception-when-parallelizing-list/53787847

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-14 08:56:10

*Thread Reply:* Can you try a newer version in the 2.4.* line, like 2.4.7?

👀 Tomas Satka

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-14 08:57:30

*Thread Reply:* This might also be a Spark 2.4 with Scala 2.12 issue - I'd recommend the 2.11 versions.

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 09:04:26

*Thread Reply:* @Maciej Obuchowski with 2.4.7 I get the following exception:

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 09:04:27

*Thread Reply:*

```
21/09/14 15:03:25 WARN RddExecutionContext: Unable to access job conf from RDD
java.lang.NoSuchFieldException: config$1
  at java.base/java.lang.Class.getDeclaredField(Class.java:2411)
```

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 09:04:48

*Thread Reply:* i can also try to switch to 2.11 scala

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 09:05:37

*Thread Reply:* or do you have some recommended setup that works for sure?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-14 09:09:58

*Thread Reply:* One more check - you're using Java 8 with this, right?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-14 09:10:17

*Thread Reply:* This is what works for me:

```
-> % cat tools/spark-2.4/RELEASE
Spark 2.4.8 (git revision 4be4064) built for Hadoop 2.7.3
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Pflume -Psparkr -Pkafka-0-8 -Phadoop-2.7 -Phive -Phive-thriftserver -DzincPort=3036
```

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-14 09:11:23

*Thread Reply:* spark-shell:

```
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
```

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 09:12:05

*Thread Reply:* awesome let me try 🙂

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 09:26:00

*Thread Reply:* data has been sent to Marquez. coolio. However I noticed a NullPointerException being thrown:

```
21/09/14 15:23:53 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
  at io.openlineage.spark.agent.OpenLineageSparkListener.onJobEnd(OpenLineageSparkListener.java:164)
  at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:39)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
  at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
  at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
  at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
  at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
  at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
```

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 10:59:45

*Thread Reply:* closed related issue #274

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 11:02:42

does openlineage capture streaming in spark? as this example is not showing me anything unless i replace readStream() with batch read() and writeStream() with write()

```java
SparkSession.Builder builder = SparkSession.builder();
SparkSession session = builder
        .appName("quantweave")
        .master("local[*]")
        .config("spark.jars.packages", "io.openlineage:openlineage_spark:0.2.2")
        .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
        .config("spark.openlineage.url", "http://localhost:5000/api/v1/namespaces/spark_integration/")
        .getOrCreate();

Dataset<Row> df = session
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "topic1")
        .option("startingOffsets", "earliest")
        .load();

Dataset<Row> dff = df
        .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").as("data");

dff
        .writeStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("topic", "topic2")
        .option("checkpointLocation", "/tmp/checkpoint")
        .start();
```
Julien Le Dem (julien@apache.org)
2021-09-14 13:38:09

*Thread Reply:* Not at the moment, but it is in scope. You are welcome to open an issue with your example to track this or even propose an implementation if you have the time.

Oleksandr Dvornik (oleksandr.dvornik@getindata.com)
2021-09-14 15:12:01

*Thread Reply:* @Tomas Satka it would be great if you could add a containerized integration test for Kafka with your test case. You can take this as an example here

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 18:02:05

*Thread Reply:* Hi @Oleksandr Dvornik, I wrote a test for a simple read/write from a Kafka topic using the Kafka testcontainer. However, I discovered a bug. When writing to the Kafka topic I get: `java.lang.IllegalArgumentException: One of the following options must be specified for Kafka source: subscribe, subscribepattern, assign. See the docs for more details.`

• How would you like me to add the test? Fork OpenLineage and create a PR?

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 18:02:50

*Thread Reply:* • Shall I raise a bug for writing to Kafka, which should require only "topic" instead of "subscribe"?

Tomas Satka (satka.tomas@gmail.com)
2021-09-14 18:03:42

*Thread Reply:* • Since I don't know the expected payload for the OpenLineage mock server, can somebody help me create it?

Oleksandr Dvornik (oleksandr.dvornik@getindata.com)
2021-09-14 19:06:41

*Thread Reply:* Hi @Tomas Satka, yes, you should create a fork and raise a PR from that. For more details, please take a look at. Not sure about Kafka, because we don't have that integration yet. About the expected payload: as a first step, I would suggest leaving that test without an assertion for now. The second step would be investigation (what we can get from that plan node). The third step: implementation and asserting a payload. Basically we parse the Spark optimized plan, and get as much information as we can for the specific implementation. You can take a look at the recent PR for HIVE. We visit the root node and the leaves to get output datasets and input datasets accordingly (a toy sketch of that visiting idea follows below).
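A toy Python illustration of that root/leaves visiting strategy (the real implementation is Java in openlineage-spark; the node names here are invented):

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def leaves(node):
    # depth-first walk yielding leaf nodes only
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)

# Toy "optimized plan": the root writes a dataset, the leaves read them.
plan = Node("InsertIntoTable[target]", [Node("Scan[source_a]"), Node("Scan[source_b]")])
outputs = [plan.name]                           # visit root -> output datasets
inputs = [leaf.name for leaf in leaves(plan)]   # visit leaves -> input datasets
```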

Tomas Satka (satka.tomas@gmail.com)
2021-09-15 04:37:59

*Thread Reply:* Hi @Oleksandr Dvornik PR for step one : https://github.com/OpenLineage/OpenLineage/pull/279

👍 Oleksandr Dvornik
🙌 Julien Le Dem

Luke Smith (luke.smith@kinandcarta.com)
2021-09-14 15:52:41

There may not be an answer to these questions yet, but I'm curious about the plan for Tableau lineage.

• How will this integration be packaged and attached to Tableau instances?
  ◦ via Extensions API, REST API?
• What is the architecture?
https://github.com/OpenLineage/OpenLineage/issues/78

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-15 01:58:37

Hi everyone - following up on my previous post on Prefect. The technical integration does not seem very difficult, but I am wondering how to structure the lineage logic. Is it the case that each Prefect task should be mapped to a lineage job? If so, how do we connect the jobs together? Does there have to be a dataset between each job? I am using OpenLineage with Marquez, by the way.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 09:19:23

*Thread Reply:* Hey Thomas!

Following what we do with Airflow, yes, I think that each task should be mapped to a job.

You don't need datasets between each task. They're necessary only where you consume and produce datasets - and it does not matter where in your job graph you've produced them.

To map tasks together in Airflow, we use ParentRunFacet, and the same approach could be used here. In Prefect, I think using flow_run_id would work (a sketch follows below).
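A hedged sketch of that mapping with the openlineage-python client (API names as in a reasonably recent client; the Prefect identifiers and URLs are placeholders, not a real integration):

```python
import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.facet import ParentRunFacet
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")

# One OpenLineage job per Prefect task; the flow run acts as the parent run.
parent = ParentRunFacet.create(
    runId="<flow_run_id>",   # e.g. taken from prefect.context
    namespace="my-prefect",
    name="my_flow",
)
client.emit(
    RunEvent(
        eventType=RunState.START,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid.uuid4()), facets={"parent": parent}),
        job=Job(namespace="my-prefect", name="my_flow.my_task"),
        producer="https://github.com/OpenLineage/OpenLineage",
    )
)
```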

👍 Julien Le Dem

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-15 09:26:21

*Thread Reply:* this is very helpful, thank you

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-15 09:26:43

*Thread Reply:* what would be the namespace used in the Job definition of each task?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 09:31:34

*Thread Reply:* In contrast to dataset namespaces - which we try to standardize - job namespaces should be provided by the user, or the operator of a particular scheduler.

For example, it would be good if it helped you identify the Prefect instance where the job was run.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 09:32:23

*Thread Reply:* If you use the openlineage-python client, you can provide the namespace either in the client constructor, or via the OPENLINEAGE_NAMESPACE env variable.
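For example, the environment-variable route (the value itself is just an illustration):

```python
import os

# Integrations fall back to this when no job namespace is passed explicitly.
os.environ.setdefault("OPENLINEAGE_NAMESPACE", "prefect-prod-eu1")
```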

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-15 09:32:55

*Thread Reply:* awesome, thank you 🙂

Brad (bradley.mcelroy@live.com)
2021-09-15 17:03:07

*Thread Reply:* Hey @Thomas Fredriksen - just chiming in, I’m also keen for a prefect integration. Let me know if I can help out at all

Julien Le Dem (julien@apache.org)
2021-09-15 17:27:20

*Thread Reply:* Please chime in on https://github.com/OpenLineage/OpenLineage/issues/81

Brad (bradley.mcelroy@live.com)
2021-09-15 18:29:20

*Thread Reply:* Done!

❤️ Julien Le Dem

Brad (bradley.mcelroy@live.com)
2021-09-16 00:06:41

*Thread Reply:* For now I'm prototyping in a separate repo https://github.com/limx0/caching_flow_runner/tree/open_lineage

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-17 01:55:08

*Thread Reply:* I really like your PR, @Brad. I think that using FlowRunner and TaskRunner may be a more "proper" way of doing this, as opposed to adding a state-handler to each task the way I do it.

How are you dealing with Prefect-library tasks such as the included BigQuery tasks and such? Is it necessary to create a DatasetTask for them to show up in the lineage graph?

Brad (bradley.mcelroy@live.com)
2021-09-17 02:04:19

*Thread Reply:* Hey @Thomas Fredriksen! At the moment I'm not dealing with any task-specific things. The plan (in my head, and after speaking with another prefect user @davzucky) would be that we add a LineageTask subclass where you could define custom facets on a per task basis

Brad (bradley.mcelroy@live.com)
2021-09-17 02:05:21

*Thread Reply:* or some sort of other hook where basically you would define some lineage attribute or put something in the prefect.context that the TaskRunner would find and attach

Brad (bradley.mcelroy@live.com)
2021-09-17 02:06:23

*Thread Reply:* Sorry I misread your question - any tasks should be automatically tracked (I believe but have not tested yet!)

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-17 02:16:02

*Thread Reply:* @Brad Could you elaborate a bit on your ideas around adding custom context attributes?

Brad (bradley.mcelroy@live.com)
2021-09-17 02:21:57

*Thread Reply:* yeah so basically we just need some hooks that you can easily access from the task decorator or somewhere else that we can pass through to the open lineage adapter to do things like custom facets

Brad (bradley.mcelroy@live.com)
2021-09-17 02:24:31

*Thread Reply:* like for your bigquery example - you might want to record some facets like in https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/bigquery.py and we need a way to do that with the Prefect bigquery task
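A hedged sketch of the kind of custom facet being proposed here (openlineage-python facets are attrs classes; the facet name and fields below are invented for illustration, not part of any integration):

```python
import attr
from openlineage.client.facet import BaseFacet

@attr.s
class PrefectTaskFacet(BaseFacet):
    # Invented fields - whatever per-task metadata is worth recording.
    task_name: str = attr.ib()
    retries: int = attr.ib(default=0)

# Attached to a run under a custom key:
run_facets = {"prefect_task": PrefectTaskFacet(task_name="load_customers")}
```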

Brad (bradley.mcelroy@live.com)
2021-09-17 02:28:28

*Thread Reply:* @davzucky

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-17 02:29:12

*Thread Reply:* I see. Is this supported by the airflow-integration?

Brad (bradley.mcelroy@live.com)
2021-09-17 02:29:32

*Thread Reply:* I think so, yes

Brad (bradley.mcelroy@live.com)
2021-09-17 02:30:51

Brad (bradley.mcelroy@live.com)
2021-09-17 02:31:54
*Thread Reply:* (I don't actually use airflow or bigquery - but for my own use case I can see wanting to do things like this)

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-17 03:18:27

*Thread Reply:* Interesting, I like how dynamic this is

Chris Baynes (chris@contiamo.com)
2021-09-15 09:09:21

Hi all, I have a clarification question about dataset namespaces. What's the difference between a dataset namespace (in the input/output) and a dataSource name (in the dataSource facet)? The dbt integration appears to set those to the same value (e.g. snowflake://myprofile), however it seems that Marquez assumes the dataset namespace to be a more generic concept (similar to a nice user-provided name like the job namespace).

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 09:29:25

*Thread Reply:* Hey. Generally, the dataSource name should be the namespace of a particular dataset.

In some cases, like Postgres, the dataSource facet is used to provide additional connection info, like the particular host and port that we're connected to.

In the case of Snowflake - or BigQuery, or S3, or other systems where we have only a "global" instance - the dataSource facet does not carry any other additional information.
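In practice the convention plays out like this, assuming the openlineage-python Dataset class (the hosts and account names below are made up):

```python
from openlineage.client.run import Dataset

# dataSource name == dataset namespace, per the naming conventions
pg_events = Dataset(namespace="postgres://db.foo.com:5432", name="metrics.public.events")
sf_orders = Dataset(namespace="snowflake://my-account", name="analytics.public.orders")
```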

Chris Baynes (chris@contiamo.com)
2021-09-15 10:11:19

*Thread Reply:* Thanks. So then perhaps marquez could differentiate a bit more between job & dataset namespaces. Right now it doesn't quite feel right to have a single global list of namespaces for jobs & datasets, especially as they also have a separate concept of sources (which are not in a namespace).

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 10:18:59

*Thread Reply:* @Willy Lulciuc what do you think?

Chris Baynes (chris@contiamo.com)
2021-09-15 10:41:20

*Thread Reply:* As an example, in Marquez I have this list of namespaces (from some sample data): dbt-sales, default, snowflake://my-account1, snowflake://my-account2. I think the new Marquez UI with the nice namespace dropdown and job/dataset search is awesome, and I'd expect to be able to filter by job namespace everywhere. But how about being able to filter datasets by source (which would be populated by the OL dataset namespace), and not persisting dataset namespaces in the global namespace table?

Julien Le Dem (julien@apache.org)
2021-09-15 18:38:03

The dbt integration (https://github.com/OpenLineage/OpenLineage/tree/main/integration/dbt) is pretty awesome, but there are still a few improvements we could make. Here are a few thoughts:
• In dbt-ol, if the configuration is wrong or missing we will fail silently. This one seems like a good first thing to fix, by logging the error to stdout.
• We need to wait until the end to know if it worked at all. It would be nice if we checked the config at the beginning and displayed an error right away. Possibly by adding a parent job/run with a start event at the beginning and an end event at the end when all is done.
• While we are sending events at the end, the console will hang until it's done, and it's not clear that progress is made. We could have a simple progress bar by printing a dot for every event sent (ex: sending 10 OpenLineage events: .........) - see the sketch below.
• We could also write at the beginning that the OL events will be sent at the end, so that the user knows what to expect.
What do you think? (@Maciej Obuchowski in particular, but anyone using dbt in general)
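A sketch of the progress-dot idea from the list above (client and events stand in for whatever dbt-ol has built up at that point; this is not the actual dbt-ol code):

```python
def emit_all(client, events):
    # one dot per event, so the user sees progress instead of a silent hang
    print(f"sending {len(events)} OpenLineage events: ", end="", flush=True)
    for event in events:
        client.emit(event)
        print(".", end="", flush=True)
    print()
```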

👀 Maciej Obuchowski

Julien Le Dem (julien@apache.org)
2021-09-15 18:43:18

*Thread Reply:* Last point is that we should persist the configuration and not just have it in environment variables. What is the best way to do this in dbt?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 18:49:21

*Thread Reply:* We could have something similar to https://docs.getdbt.com/dbt-cli/configure-your-profile - or even put our config in there

❤️ Julien Le Dem

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 18:51:42

*Thread Reply:* I think we should assume that variables/config should be set and valid - and fail the run if they aren't. After all, if someone wouldn't need lineage events, they wouldn't use our wrapper.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 18:56:36

*Thread Reply:* The 3rd point would be easy to address if we could send events async/in parallel. But there could be dataset version dependencies, and we don't want to get into the needless complexity of recognizing that, building a DAG, etc.

We could batch events if the network roundtrips are responsible for the majority of the slowdown. However, we can't assume any particular environment.

Maybe just notifying about the progress is the best thing we can do right now.

👀 Mario Measic

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-15 18:58:22

*Thread Reply:* About the second point, I want to add recognizing whether we already have a parent run - for example, if running via Airflow. If not, creating a run for this purpose is a good idea.

Julien Le Dem (julien@apache.org)
2021-09-15 21:31:35

*Thread Reply:* @Maciej Obuchowski can you open github issues to propose those changes?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 09:11:31

*Thread Reply:* Done

Ross Turk (ross@datakin.com)
2021-09-16 12:05:10

*Thread Reply:* FWIW, I have been putting my config in ~/.openlineage/config so it can be mapped into a container

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 17:56:23

*Thread Reply:* Makes sense, also, all clients could use that config

Mario Measic (mario.measic.gavran@gmail.com)
2021-10-18 04:47:08

*Thread Reply:* if dbt could actually stream the events, that would be great.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-10-18 09:59:12

*Thread Reply:* Unfortunately, this seems very unlikely for now, due to the fact that we rely on metadata files that dbt only produces after end of execution.
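For context, a rough sketch of the kind of post-run parsing this implies (the paths are dbt's standard target/ artifacts; field access is simplified and not the integration's actual code):

```python
import json

# dbt writes these files only after the run finishes - hence no streaming.
with open("target/run_results.json") as f:
    run_results = json.load(f)
with open("target/manifest.json") as f:
    manifest = json.load(f)

for result in run_results["results"]:
    node = manifest["nodes"][result["unique_id"]]
    print(node["name"], result["status"])
```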

Julien Le Dem (julien@apache.org)
2021-09-15 22:52:09

The split of facets in their own schemas is ready to be merged: https://github.com/OpenLineage/OpenLineage/pull/118

Brad (bradley.mcelroy@live.com)
2021-09-16 00:12:02

Hey @Julien Le Dem I'm going to start a thread here for any issues I run into trying to build a prefect integration

Brad (bradley.mcelroy@live.com)
2021-09-16 00:16:44

*Thread Reply:* This might be useful to others https://github.com/OpenLineage/OpenLineage/pull/284

Brad (bradley.mcelroy@live.com)
2021-09-16 00:18:44

*Thread Reply:* So I'm trying to push a simple event to Marquez, but getting the following response: '{"code":400,"message":"Unable to process JSON"}'. The JSON I'm pushing:

```
{
  "eventTime": "2021-09-16T04:00:28.343702",
  "eventType": "START",
  "inputs": {},
  "job": {
    "facets": {},
    "name": "prefect.core.parameter.p",
    "namespace": "default"
  },
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.0.0/integration/prefect",
  "run": {
    "facets": {},
    "runId": "3bce33cb-9495-4c58-b326-6aac71634ace"
  }
}
```

Does anything look obviously wrong here?

marko (marko.kristian.helin@gmail.com)
2021-09-16 02:41:11

*Thread Reply:* What I did previously when debugging something like this was to remove half of the payload until I found the culprit. Binary search essentially. I was running Marquez locally, so probably could’ve enabled better logging as well. Aren’t inputs and facets arrays?

👍 Maciej Obuchowski

Brad (bradley.mcelroy@live.com)
2021-09-16 03:14:54

*Thread Reply:* Thanks for the response @marko - this is a greatly reduced payload already (but I'll keep going). Yep they are supposed to be arrays (I've since fixed that)

Brad (bradley.mcelroy@live.com)
2021-09-16 03:46:01

*Thread Reply:* okay it was my timestamp 🥲
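For anyone hitting the same 400: the difference is a timezone-aware timestamp (assuming, as this thread suggests, that the server rejects naive ones):

```python
from datetime import datetime, timezone

naive = datetime.now().isoformat()              # '2021-09-16T04:00:28.343702' - rejected
aware = datetime.now(timezone.utc).isoformat()  # ends in '+00:00' - accepted
```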

Brad (bradley.mcelroy@live.com)
2021-09-16 19:07:16

*Thread Reply:* Okay - I've got a simply working example now https://github.com/limx0/caching_flow_runner/blob/open_lineage/caching_flow_runner/task_runner.py

Brad (bradley.mcelroy@live.com)
2021-09-16 19:07:37

*Thread Reply:* I might move this into a proper PR @Julien Le Dem

Brad (bradley.mcelroy@live.com)
2021-09-16 19:08:12

*Thread Reply:* Successfully got a basic prefect flow working

Brad (bradley.mcelroy@live.com)
2021-09-16 02:11:53

A question about DatasetType - is there a representation for a file-like type? For files stored in S3/FTP/NFS etc (assuming a fully resolvable url)

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 09:53:24

*Thread Reply:* I think there was some talk somewhere to actually drop the DatasetType concept; can't find where though.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 10:04:09

*Thread Reply:* I've taken a look at your repo. Looks great so far!

One thing I've noticed: I don't think you need to use any stuff from Marquez to emit events. Its lineage ingestion API is deprecated - you can just use the openlineage-python client. If there's something you think is missing from it, feel free to write that here or open an issue (a minimal sketch follows below).
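A minimal sketch of emitting via openlineage-python instead of the deprecated Marquez ingestion API (URL and job names are placeholders):

```python
import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
client.emit(
    RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid.uuid4())),
        job=Job(namespace="default", name="caching_flow_runner.example_task"),
        producer="https://github.com/OpenLineage/OpenLineage",
    )
)
```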

Brad (bradley.mcelroy@live.com)
2021-09-16 17:12:31

*Thread Reply:* And would that be replaced by just some Input/Output notion @Maciej Obuchowski?

Brad (bradley.mcelroy@live.com)
2021-09-16 17:13:26

*Thread Reply:* Oh yeah I got a little confused by the single lineage endpoint - but I’ve realised how it all works now. I’m still using the marquez backend to view things but I’ll use the openlineage-client to talk to it

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 17:34:46

*Thread Reply:* Yes 🙌

Tomas Satka (satka.tomas@gmail.com)
2021-09-16 06:04:30

When trying to fix failing checks, I see integration-test-integration-airflow fail (the variable names below, with underscores restored, are the standard GCP ones):

```
#!/bin/bash -eo pipefail
if [[ $GCLOUD_SERVICE_KEY,$GOOGLE_PROJECT_ID == "" ]]; then
  echo "No required environment variables to check; moving on"
else
  IFS="," read -ra PARAMS <<< "GCLOUD_SERVICE_KEY,GOOGLE_PROJECT_ID"

  for i in "${PARAMS[@]}"; do
    if [[ -z "${!i}" ]]; then
      echo "ERROR: Missing environment variable {i}" >&2

      if [[ -n "" ]]; then
        echo "" >&2
      fi

      exit 1
    else
      echo "Yes, ${i} is defined!"
    fi
  done
fi

ERROR: Missing environment variable {i}

Exited with code exit status 1
CircleCI received exit code 1
```

However I haven't touched airflow at all.. can somebody help please?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 06:59:34

*Thread Reply:* Hey, Airflow integration tests do not pass env variables to PRs from forks due to security reasons - everyone could create malicious PR and dump secrets

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 07:00:29

*Thread Reply:* So, they will fail and there's nothing to do from your side 🙂

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 07:00:55

*Thread Reply:* We probably should split those into ones that don't touch external systems, and run those for all PRs

Tomas Satka (satka.tomas@gmail.com)
2021-09-16 07:08:03

*Thread Reply:* ah okie. good to know. And in build-integration-spark: Could not resolve all artifacts. Is that also a known issue? Or something from my side that I could fix?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 07:11:12

*Thread Reply:* Looks like gradle server problem?

```
> Could not get resource 'https://plugins.gradle.org/m2/com/diffplug/spotless/spotless-lib/2.13.2/spotless-lib-2.13.2.module'.
   > Could not GET 'https://plugins.gradle.org/m2/com/diffplug/spotless/spotless-lib/2.13.2/spotless-lib-2.13.2.module'. Received status code 500 from server: Internal Server Error
```

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 07:34:44

*Thread Reply:* After retry, there's a spotless error:

```
+········.orElse(Collections.emptyList()).stream()
```

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 07:35:15

*Thread Reply:* I think this is due to mismatch between behavior of spotless in Java 8 and Java 11+ - which you probably used 🙂

Tomas Satka (satka.tomas@gmail.com)
2021-09-16 07:40:01

*Thread Reply:* ah.. I used Java 11. So shall I rerun something with a Java 8 SDK setup?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-16 07:44:31

*Thread Reply:* For spotless, you can just fix this one line 🙂 Though I don't guarantee that tests that run later will pass, so you might need Java 8 for later testing.

Tomas Satka (satka.tomas@gmail.com)
2021-09-16 08:04:36

*Thread Reply:* yup looks better now

Tomas Satka (satka.tomas@gmail.com)
2021-09-16 08:04:41

*Thread Reply:* thanks

Tomas Satka (satka.tomas@gmail.com)
2021-09-16 14:27:02

*Thread Reply:* will somebody please review my PR? I already had to adjust it due to updates on the same test class 🙂

Brad (bradley.mcelroy@live.com)
2021-09-16 20:36:28

Hey team - I've opened https://github.com/OpenLineage/OpenLineage/pull/293 for a very WIP prefect integration

🙌 Maciej Obuchowski

Brad (bradley.mcelroy@live.com)
2021-09-16 20:37:27

*Thread Reply:* @Thomas Fredriksen would love any feedback

Thomas Fredriksen (thomafred90@gmail.com)
2021-09-17 04:21:13

*Thread Reply:* nicely done! As we discussed in another thread - the way you have implemented lineage using FlowRunner and TaskRunner is likely the best way to do this. Let me know if you need any help, I would love to see this PR get merged!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-17 07:28:33

*Thread Reply:* Hey @Brad, it looks great!

I've seen you're using task_qualified_name to name datasets, and I don't think that's the right way. I'd take a look at the naming conventions here: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Getting that right is key to making sure that lineage is properly tracked between systems - for example, if you use Prefect to schedule dbt runs or pyspark jobs, the unified naming makes sure that all those integrations properly refer to the same dataset.

Brad (bradley.mcelroy@live.com)
2021-09-17 08:12:50

*Thread Reply:* Hey @Maciej Obuchowski thanks for the feedback. Yep the naming was a bit of a placeholder. Open to any recommendations.. I think things like dbt or pyspark are straight forward (we could add special handling for tasks like that) but what about regular transformation type tasks that run in a scheduler? Do you have any naming preference? Say I just had some pandas transform task in prefect for example

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-17 08:28:04

*Thread Reply:* First of all, not all tasks are producing and consuming datasets. For example, I wouldn't expect any of the GitHub tasks to have any datasets.

Second, in Airflow we have the concept of an Extractor, where you can write specialized code to expose datasets. For example, for BigQuery we extract datasets from the query plan. Now, I'm not sure if this concept would translate well to Prefect - but if yes, then we have some helpers inside the openlineage common library that could be reused. This approach also allows us to emit additional facets, some of which are really useful - like query statistics for BigQuery, and data quality tests for dbt.

Third, if we're talking about generalized tasks like FunctionTask or ShellTask, then I think the right way is to expose functionality for users to expose lineage themselves. I'm not sure how exactly that would look in Prefect.

Brad (bradley.mcelroy@live.com)
2021-09-19 23:03:14

*Thread Reply:* You've raised some good points @Maciej Obuchowski - I might have been thinking about this integration in slightly the wrong way. I think based on your comments I'll refactor some of the code to hook into the Results object in prefect (the Result object is the way in which data is serialized and persisted).

> Now, I'm not sure if this concept would translate well to Prefect - but if yes, then we have some helpers inside openlineage common library that could be reused
This definitely applies to prefect - similar tasks exist in prefect, and we should definitely leverage the common library in this case.

> Third, if we're talking about generalized tasks like FunctionTask or ShellTask, then I think the right way is to expose functionality to user to expose lineage themselves. I'm not sure how exactly that would look in Prefect.
Yeah I agree with this. I'd like to make it as easy as possible to opt in, but I think you're right that there needs to be some hooks for user-defined lineage. I'll think about this a little more.

> First of all, not all tasks are producing and consuming datasets. For example, I wouldn't expect any of the Github tasks to have any datasets.
My initial thoughts here were that it would still be good to have lineage, as these tasks do have side effects, and downstream consumers of the lineage data might want to know about them. However I don't have a good feeling yet for how best to do this, so I'm going to park those thoughts for now.

🙌 Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-09-20 06:30:51

*Thread Reply:* > Yeah I agree with this. I'd like to make it as easy as possible to opt-in, but I think you're right that there needs to be some hooks for user defined lineage. I'll think about this a little more.
The first version of an integration doesn't have to be perfect. In particular, not handling this use case would be okay, since it does not lock us into some particular way of doing it later.

> My initial thoughts here were that it would still be good to have lineage as these tasks do have side effects, and downstream consumers of the lineage data might want to know about these tasks. However I don't have a good feeling yet how best to do this, so I'm going to park those thoughts for now.
I'd think of two options first, before modeling it as a dataset: won't the existence of an event be enough? After all, we'll still have it despite it not having any input and output datasets. If not, then wouldn't a custom run or job facet be a better fit?

Brad (bradley.mcelroy@live.com)
2021-09-23 17:27:49

*Thread Reply:* > Won’t existence of a event be enough? After all, we’ll still have it despite it not having any input and output datasets. -Duh, yep you’re right @Maciej Obuchowski, I’m over thinking this. I’m going to clean this up based on your comments

Thomas Fredriksen (thomafred90@gmail.com)
2021-10-06 03:39:28

*Thread Reply:* Hi @Brad. How will this integration work for Prefect flows running in Prefect Cloud or on Prefect Server?

Brad (bradley.mcelroy@live.com)
2021-10-06 03:40:44

*Thread Reply:* Hi @Thomas Fredriksen - it'll relate to the agent actually - you'll need to pass the flow runner class to the agent when running

Thomas Fredriksen (thomafred90@gmail.com)
2021-10-06 03:48:14

*Thread Reply:* nice!

Brad (bradley.mcelroy@live.com)
2021-10-06 03:48:54

*Thread Reply:* Unfortunately I've been a little busy the past week, and I will be for the rest of this week

Brad (bradley.mcelroy@live.com)
2021-10-06 03:49:09

*Thread Reply:* but I do plan to pick this up next week

Brad (bradley.mcelroy@live.com)
2021-10-06 03:49:23

*Thread Reply:* (the additional changes I mention above)

Thomas Fredriksen (thomafred90@gmail.com)
2021-10-06 03:50:08

*Thread Reply:* looking forward to it 🙂 let me know if you need any help!

Brad (bradley.mcelroy@live.com)
2021-10-06 03:50:34

*Thread Reply:* yeah when I get this next lot of stuff in - I'd love for people to test it out

🙌 Thomas Fredriksen, Maciej Obuchowski

Adam Pocock (adam.pocock@oracle.com)
2021-09-20 17:38:51

Is there a preferred academic citation for OpenLineage? I’m writing a paper on the provenance system in our machine learning library, and I’d like to cite OpenLineage as an example of future work on data lineage to integrate with.

🙌 Willy Lulciuc

Julien Le Dem (julien@apache.org)
2021-09-20 19:18:53

*Thread Reply:* I think you can reffer to https://openlineage.io/

Julien Le Dem - (julien@apache.org)
2021-09-20 19:31:30

We’re starting to see the beginning of larger contributions (Spark streaming, prefect, …) and I think we need to define a way to accept those contributions incrementally.
If we take the example of Streaming (Spark streaming, Flink or Beam) support (but really this applies in general, sorry to pick on you Tomas, this is great!):
The first Spark streaming PR ( https://github.com/OpenLineage/OpenLineage/pull/279 ) lays the ground work for testing spark streaming but there’s more work to have a full feature.
I’m in favor of merging Spark streaming support into main once it’s working end to end (possibly with partial input/output coverage).
So I see 2 options:

  1. Start a branch for spark streaming support. Have PRs like this one go into it until it’s completed (smaller reviews). Then merge the whole thing as a PR in main when it’s finished.
  2. Keep working on that PR until it’s fully implemented, but it will get big, and make reviews difficult.
I have seen the model 1) work well. It’s easier to do multiple smaller reviews for larger projects.
👍 Ross Turk, Maciej Obuchowski, Faouzi

Yannick Endrion - (yannick.endrion@gmail.com)
2021-09-24 05:10:04

Thank you @Ross Turk for this really useful article: https://openlineage.io/blog/dbt-with-marquez/?s=03
Is anyone aware of additional environments being supported by the dbt<->OpenLineage<->Marquez integration? I think only Snowflake and BigQuery are supported now.
I am really interested in SQLServer or even Dremio (which could be great because it is capable of reading from multiple DBs).

Thank you

🎉 Minkyu Park, Ross Turk

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-09-24 05:15:31

*Thread Reply:* It should be really easy to add additional databases. Basically, we'd need to know how to get the namespace for that database: https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/dbt.py#L467

The first step would be to add SQLServer or Dremio to the dataset naming schema here https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md
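(For anyone picking this up, the code change is roughly of this shape - a sketch only; the mssql scheme and the profile fields used here are assumptions pending the Naming.md entry:)

def extract_namespace(self, profile: dict) -> str:
    """Build the dataset namespace from the dbt profile, following spec/Naming.md."""
    if profile['type'] == 'snowflake':
        return f"snowflake://{profile['account']}"
    elif profile['type'] == 'bigquery':
        return 'bigquery'
    elif profile['type'] == 'sqlserver':  # hypothetical addition
        return f"mssql://{profile['server']}:{profile.get('port', 1433)}"
    else:
        raise NotImplementedError(
            f"Only 'snowflake' and 'bigquery' adapters are supported right now. "
            f"Passed {profile['type']}"
        )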

Yannick Endrion - (yannick.endrion@gmail.com)
2021-10-04 16:22:59

*Thread Reply:* Thank you @Maciej Obuchowski,
I tried to give it a try but without success yet. Not sure where I am supposed to add the sqlserver naming schema...
If you have any documentation that I could read I would be glad =)
Many thanks

Julien Le Dem - (julien@apache.org)
2021-10-07 15:13:43

*Thread Reply:* This would be adding a paragraph similar to this one: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md#snowflake

Julien Le Dem - (julien@apache.org)
2021-10-07 15:14:30

*Thread Reply:* Snowflake
See: Object Identifiers — Snowflake Documentation
Datasource hierarchy:
• account name
Naming hierarchy:
• Database: {database name} => unique across the account
• Schema: {schema name} => unique within the database
• Table: {table name} => unique within the schema
Identifier:
• Namespace: snowflake://{account name}
  ◦ Scheme = snowflake
  ◦ Authority = {account name}
• Name: {database}.{schema}.{table}
  ◦ URI = snowflake://{account name}/{database}.{schema}.{table}
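(A SQLServer paragraph would presumably mirror this pattern - a sketch, not the agreed spec:
Datasource hierarchy: host, port
Naming hierarchy: Database > Schema > Table
Identifier:
• Namespace: mssql://{host}:{port}
• Name: {database}.{schema}.{table})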

Marty Pitt - (martypitt@vyne.co)
2021-09-24 06:53:05

Hi all. I'm the Founder / CTO of a data discovery & transformation platform that captures very rich lineage information. We're interested in exposing / making our lineage data consumable via open standards, which is what led me to this project. A couple of questions:

A) Am I right in considering that's the goal of this project?
B) Are you also considering provenance as well as lineage?
C) What's a good starting point to understand the models we should be exposing our data in, to make it consumable?

Marty Pitt - (martypitt@vyne.co)
2021-09-24 07:06:20

*Thread Reply:* For clarity on the provenance vs lineage point (in case I'm using those terms incorrectly...)

Our platform performs automated enrichment and processing of data. In doing so, we often make calls to functions or out to other data services (such as APIs, or SELECTs against databases). We capture the inputs that pass to these, along with the outputs. (And, if the input is derived from other outputs, we capture the full chain, right back to the root).

That's the kind of stuff our customers are really interested in, and we feel like there's value in making it consumable.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-09-24 08:47:35

*Thread Reply:* Not sure I understand you right, but are you interested in tracking individual API calls, and for example, values of some parameters passed for one call?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-09-24 08:51:16

*Thread Reply:* I guess that's not in OpenLineage scope, as we're interested more in tracking metadata for whole datasets. But I might be wrong, some other people might chime in.

We could of course model this situation, but that would capture for example the schema of those parameters. Not their values.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-09-24 08:52:16

*Thread Reply:* I think this might be better suited for https://opentelemetry.io/

Marty Pitt - (martypitt@vyne.co)
2021-09-24 10:55:54

*Thread Reply:* Kinda, but not really. Telemetry data is metadata about the API calls. We have that, but it's not interesting to our customers. It's the metadata about the data that Vyne provides that we want to expose.

Our customers use Vyne to fetch data from lots of different sources. Eg:

> "Whenever a trade is booked, calculate its compliance against these regulations, to report to the regulators."
or

> "Whenever a customer buys a $thing, capture the transaction data, client data, and account data, and store it in this table."
Providing answers to those questions involves fetching and transforming data, before storing it, or outputting it. We capture all that data, on a per-attribute basis, so we can answer the question "how did we get this value?" That's the lineage information we want to publish.

Michael Collado - (collado.mike@gmail.com)
2021-09-30 15:10:51

*Thread Reply:* The core OpenLineage model is documented at https://github.com/OpenLineage/OpenLineage/#core-model . The model is really focused on Jobs and Datasets. Jobs have Runs which have start and end times (typically scheduled start/end times as well) and read from and/or write to the target datasets. If your transformation chain fits within that model, then I think you can definitely record and share the lineage information with your customers. The existing implementations are all focused on batch data access, though streaming should be possible to capture as well

Drew Bittenbender - (drew@salt.io)
2021-09-29 11:10:29

Hello. I am trying the openlineage-airflow integration with Marquez as the backend and have 3 questions.

  1. Does it only work for PostgresOperators?
  2. Which is the recommended integration: marquez-airflow or openlineage-airflow
  3. How do you enable more detailed logging? I tried OPENLINEAGE_LOG_LEVEL and MARQUEZ_LOG_LEVEL and neither seemed to affect logging. I assume this is logged to the airflow worker
Faouzi - (faouzi@dataroots.io)
2021-09-29 13:46:59

*Thread Reply:* Hello @Drew Bittenbender!

For your first two questions:

• Yes, right now only the PostgresOperator is integrated. I learnt it the hard way ^_^. Spent hours trying with MySQL. There were attempts to integrate with MySQL actually. If engineers do not integrate it I will allocate myself some time to try to implement other airflow db operators.
• Use the openlineage one. It is the recommended approach now.

Drew Bittenbender - (drew@salt.io)
2021-09-29 13:49:41

*Thread Reply:* Thank you @Faouzi. Is there any documentation/best practices to write your own extractor, or is it "read the code"? We use the Python, Docker and SSH operators a lot. Maybe those don't fit into the lineage paradigm well, but want to give it a shot

Faouzi - (faouzi@dataroots.io)
2021-09-29 13:52:16

*Thread Reply:* To the best of my knowledge there is no documentation to guide through the design of your own extractor. So yes we need to read the code. Here a link where you can see how they did for postgre extractor and others. https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors

👍 Drew Bittenbender
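(Until docs exist, a rough skeleton of what the bundled extractors look like - a sketch only; the base-class and metadata type names here follow later versions of the linked directory and may differ between releases:)

from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata


class SSHExtractor(BaseExtractor):  # hypothetical example for SSHOperator
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ['SSHOperator']

    def extract(self) -> Optional[TaskMetadata]:
        # Derive inputs/outputs from whatever the operator exposes (command, host, ...)
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[],
            outputs=[],
        )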
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-09-30 05:08:53

*Thread Reply:* I think in the case of "bring your own code" operators like the Python or Docker ones, it might be better to use the lineage_run_id macro and use the openlineage-python library inside, instead of implementing an extractor.
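(To make that concrete, a sketch of the suggested pattern - names like my-dag are placeholders, the exact macro signature may vary by version, and the ParentRunFacet wiring is the important part:)

import uuid

from openlineage.client import OpenLineageClient
from openlineage.client.facet import ParentRunFacet
from openlineage.client.run import Job, Run, RunEvent, RunState


def emit_start(parent_run_id: str, event_time: str) -> None:
    # parent_run_id comes from the {{ lineage_run_id(run_id, task) }} macro,
    # passed into the PythonOperator callable via op_kwargs
    client = OpenLineageClient(url="http://localhost:5000")
    client.emit(RunEvent(
        eventType=RunState.START,
        eventTime=event_time,
        run=Run(
            runId=str(uuid.uuid4()),
            facets={"parent": ParentRunFacet.create(parent_run_id, "my-namespace", "my-dag.my-task")},
        ),
        job=Job(namespace="my-namespace", name="my-dag.my-task.inner-step"),
        producer="https://example.com/my-scripts",
    ))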

Michael Collado - (collado.mike@gmail.com)
2021-09-30 15:14:47

*Thread Reply:* I think @Maciej Obuchowski is right here. The airflow integration will create the parent jobs, but to get the dataset input/output links, it's best to do that directly from the python/docker scripts. If you report the parent run id, Marquez will link the jobs together correctly

Julien Le Dem - (julien@apache.org)
2021-10-07 15:09:55

*Thread Reply:* To clarify on what airflow operators are supported out of the box:
• postgres
• bigquery
• snowflake
• Great Expectations (with extra config)
See: https://github.com/OpenLineage/OpenLineage/blob/3a1ccbd854bbf202bbe6437bf81786cb01[…]ntegration/airflow/openlineage/airflow/extractors/extractors.py
Mysql is not at the moment. We should track it as an issue

Yuki Tannai - (tannai-yuki@dmm.com)
2021-09-30 09:21:35

Hi there!
I’m trying to enhance the lineage functionality of a data infrastructure I’m working on.
All of the tools I found only visualize the relationships between tables before and after the transformation, but the DataHub RFC discusses Field Level Lineage, which I thought was close to the functionality I was looking for.
Does OpenLineage support the same functionality?
https://datahubproject.io/docs/rfc/active/1841-lineage/field_level_lineage/

Julien Le Dem - (julien@apache.org)
2021-10-07 15:03:40

*Thread Reply:* OpenLineage doesn’t have field level lineage yet. Here is the proposal for adding it: https://github.com/OpenLineage/OpenLineage/issues/148

👀 Yuki Tannai, Ricardo Gaspar

Julien Le Dem - (julien@apache.org)
2021-10-07 15:04:36

*Thread Reply:* Those two specs look compatible, so Datahub should be able to consume this lineage metadata in the future

👍 Yuki Tannai

павел клопотюк - (klopotuk@gmail.com)
2021-10-04 14:27:24

Hello, everyone. I'm trying to work with OL and Airflow 2.1.4 and it doesn't work. I found that OL is supported for Airflow 1.10.12++. Does it support Airflow 2.X.Y?

Ross Turk - (ross@datakin.com)
2021-10-04 15:38:47

*Thread Reply:* Hi! Airflow 2.x is currently in development - you can follow along with the progress here:
https://github.com/OpenLineage/OpenLineage/issues/205

павел клопотюк - (klopotuk@gmail.com)
2021-10-05 03:01:54

*Thread Reply:* Thank you for your reply!

Julien Le Dem - (julien@apache.org)
2021-10-07 15:02:23

*Thread Reply:* There should be a first version of Airflow 2.X support soon: https://github.com/OpenLineage/OpenLineage/pull/305
We’re labelling it experimental because the config step might change as discussions in the airflow github evolve. It will track successful jobs in its current state.

SAM - (skhettri@gmail.com)
2021-10-04 23:14:26

Hi All, I’m working on openlineage-dbt integration with Marquez as backend. I want to integrate OL with DBT cloud, would you please help to provide steps that I need to follow?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-05 04:18:42

*Thread Reply:* Take a look at this: https://docs.getdbt.com/docs/dbt-cloud/dbt-cloud-api/metadata/metadata-overview

✅ SAM

Julien Le Dem - (julien@apache.org)
2021-10-07 14:58:24

*Thread Reply:* @SAM Let us know of your progress.

👍 SAM

ale - (alessandro.lollo@gmail.com)
2021-10-05 16:23:41

Hey folks 😊
I’m trying to run dbt-ol with Redshift target, but I get the following error message:
Traceback (most recent call last):
  File "/usr/local/bin/dbt-ol", line 61, in <module>
    main()
  File "/usr/local/bin/dbt-ol", line 54, in main
    events = processor.parse().events()
  File "/usr/local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 97, in parse
    self.extract_dataset_namespace(profile)
  File "/usr/local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 368, in extract_dataset_namespace
    self.dataset_namespace = self.extract_namespace(profile)
  File "/usr/local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 382, in extract_namespace
    raise NotImplementedError(
NotImplementedError: Only 'snowflake' and 'bigquery' adapters are supported right now. Passed redshift
I know that Redshift is not the best cloud DWH we can use… 😅
But, still….do you have any plan to support it?
Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-05 16:41:30

*Thread Reply:* Hey, can you create a ticket in the OpenLineage repository? FWIW Redshift is very similar to postgres, so supporting it won't be hard.

ale - (alessandro.lollo@gmail.com)
2021-10-05 16:43:39

*Thread Reply:* Hey @Maciej Obuchowski 😊
Yep, will do now! Thanks!

ale - (alessandro.lollo@gmail.com)
2021-10-05 16:46:26

*Thread Reply:* Well...will do tomorrow morning 😅

ale - (alessandro.lollo@gmail.com)
2021-10-06 03:03:16

*Thread Reply:* Here’s the issue: https://github.com/OpenLineage/OpenLineage/issues/318

Julien Le Dem - (julien@apache.org)
2021-10-07 14:51:08

*Thread Reply:* Thanks a lot. I pulled it in the current project.

👍 ale

ale - (alessandro.lollo@gmail.com)
2021-10-08 05:48:28

*Thread Reply:* @Julien Le Dem @Maciej Obuchowski I’m not familiar with dbt-ol codebase, but I’m willing to help on this if you guys can give me a bit of guidance 😅

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 05:53:05

*Thread Reply:* @ale can you help us define naming schema for redshift, as we have for other databases? https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

ale - (alessandro.lollo@gmail.com)
2021-10-08 05:53:21

*Thread Reply:* Sure!

ale - (alessandro.lollo@gmail.com)
2021-10-08 05:54:21

*Thread Reply:* will work on this today and I’ll try to submit a PR by EOD

ale - (alessandro.lollo@gmail.com)
2021-10-08 06:36:12

*Thread Reply:* There you go https://github.com/OpenLineage/OpenLineage/pull/324

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 06:39:35

*Thread Reply:* Host would be something like
examplecluster.<XXXXXXXXXXXX>.us-west-2.redshift.amazonaws.com
right?
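(If it follows the pattern of the other warehouses, the namespace would then come out something like redshift://examplecluster.<XXXXXXXXXXXX>.us-west-2.redshift.amazonaws.com:5439 - a guess at the shape, pending the final Naming.md entry.)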

ale - (alessandro.lollo@gmail.com)
2021-10-08 07:13:51

*Thread Reply:* Yep, let me update the PR

ale - (alessandro.lollo@gmail.com)
2021-10-08 07:27:42

*Thread Reply:* Done

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 07:31:40

*Thread Reply:* 🙌

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 07:35:30

*Thread Reply:* If you want to look at the dbt integration itself, there are two things:

We need to determine how the Redshift adapter reports metrics https://github.com/OpenLineage/OpenLineage/blob/610a687bf69df2b52ec4ac4da80b4a05580e8d32/integration/common/openlineage/common/provider/dbt.py#L412

And how we can create the namespace and job name based on the job naming schema that you created:
https://github.com/OpenLineage/OpenLineage/blob/610a687bf69df2b52ec4ac4da80b4a05580e8d32/integration/common/openlineage/common/provider/dbt.py#L512

One way to get this info is to run dbt yourself and look at the resulting metadata files - in the target dir of the dbt directory

ale - (alessandro.lollo@gmail.com)
2021-10-08 08:33:31

*Thread Reply:* I figured out how to generate the namespace.
But I can’t understand which of the JSON files is inspected for metrics. Is it run_results.json ?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 09:48:50

*Thread Reply:* yes, run_results.json - it's different in bigquery and snowflake, so I presume it's different in redshift too
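(A quick way to eyeball that - a sketch assuming you run it from the dbt project directory; adapter_response is where dbt adapters put their vendor-specific metrics in run_results.json:)

import json

with open("target/run_results.json") as f:
    run_results = json.load(f)

for result in run_results["results"]:
    # e.g. for Redshift this should surface whatever the adapter reports per node
    print(result["unique_id"], result.get("adapter_response"))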

ale - (alessandro.lollo@gmail.com)
2021-10-08 11:02:32

*Thread Reply:* Ok thanks!

ale - (alessandro.lollo@gmail.com)
2021-10-08 11:11:57

*Thread Reply:* Should be stats:rows:value

ale - (alessandro.lollo@gmail.com)
2021-10-08 11:19:59

*Thread Reply:* Regarding namespace: if env_var is used in profiles.yml , how is this handled now?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 11:44:50

*Thread Reply:* Well, it isn't. This is relevant only if you passed cluster hostname this way, right?

ale - (alessandro.lollo@gmail.com)
2021-10-08 11:53:52

*Thread Reply:* Exactly

ale - (alessandro.lollo@gmail.com)
2021-10-11 07:10:38

*Thread Reply:* If you think it makes sense, I can submit a PR to handle dbt profiles with env_var

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:18:01

*Thread Reply:* Do you want to run jinja on the dbt profile?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:20:18

*Thread Reply:* Theoretically, we'd need to run it also on dbt_project.yml , but we only take target path and profile name from it.

ale - (alessandro.lollo@gmail.com)
2021-10-11 07:20:32

*Thread Reply:* The env_var syntax in the profile is quite simple, I was thinking of extracting the env var name using re and then retrieving the value from os

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:23:59

*Thread Reply:* It would work, but we can actually use jinja - if you're using dbt, it's already included.
The method is pretty simple:

@contextmember
@staticmethod
def env_var(var: str, default: Optional[str] = None) -> str:
    """The env_var() function. Return the environment variable named 'var'.
    If there is no such environment variable set, return the default.

    If the default is None, raise an exception for an undefined variable.
    """
    if var in os.environ:
        return os.environ[var]
    elif default is not None:
        return default
    else:
        msg = f"Env var required but not provided: '{var}'"
        undefined_error(msg)
ale - (alessandro.lollo@gmail.com)
2021-10-11 07:25:07

*Thread Reply:* Oh cool!
I will definitely use this one!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:25:09

*Thread Reply:* We'd be sure that our implementation matches dbt's one, right? Also, you'd support default method for free

ale - (alessandro.lollo@gmail.com)
2021-10-11 07:26:34

*Thread Reply:* So this env_var method is defined in dbt and not in the OpenLineage codebase, right?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:27:01

*Thread Reply:* yes

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:27:14

*Thread Reply:* dbt is on Apache license 🙂

ale - (alessandro.lollo@gmail.com)
2021-10-11 07:28:06

*Thread Reply:* Should we import dbt package and use the method or should we just copy/paste the method inside OpenLineage codebase?

ale - (alessandro.lollo@gmail.com)
2021-10-11 07:28:28

*Thread Reply:* I’m asking for guidance here 😊

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:34:44

*Thread Reply:* I think we should just do basic jinja template rendering in our code like in the quick example: https://realpython.com/primer-on-jinja-templating/#quick-examples

just with the env_var method passed to the render method 🙂

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 07:37:05

*Thread Reply:* basically, here in the code we should read the file, do the jinja render, and load yaml from string instead of straight from file -https://github.com/OpenLineage/OpenLineage/blob/610a687bf69df2b52ec4ac4da80b4a05580e8d32/integration/common/openlineage/common/provider/dbt.py#L176
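(i.e. roughly this shape - a sketch, reusing the env_var function defined above:)

import yaml
from jinja2 import Template

def load_yaml_with_jinja(path: str) -> dict:
    # Render jinja (exposing env_var) before parsing, instead of yaml-loading the file directly
    with open(path) as f:
        rendered = Template(f.read()).render(env_var=env_var)
    return yaml.safe_load(rendered)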

ale - (alessandro.lollo@gmail.com)
2021-10-11 07:38:53

*Thread Reply:* ok, got it.
Will try to implement following your suggestions.
Thanks @Maciej Obuchowski 🙌

🙌 Maciej Obuchowski

ale - (alessandro.lollo@gmail.com)
2021-10-11 08:36:13

*Thread Reply:* We need to:

  1. load the template profile from the profiles.yml
  2. replace any env vars we find
For the first step, we can use jinja2.Template
However, to replace the env vars we find, we have to actually search for those env vars… 🤔
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 08:43:06

*Thread Reply:* The dbt method implements that:

@contextmember
@staticmethod
def env_var(var: str, default: Optional[str] = None) -> str:
    """The env_var() function. Return the environment variable named 'var'.
    If there is no such environment variable set, return the default.

    If the default is None, raise an exception for an undefined variable.
    """
    if var in os.environ:
        return os.environ[var]
    elif default is not None:
        return default
    else:
        msg = f"Env var required but not provided: '{var}'"
        undefined_error(msg)
ale - (alessandro.lollo@gmail.com)
2021-10-11 08:45:54

*Thread Reply:* Ok, but I need to pass var to the env_var method.
And to pass the var value, I need to look into the loaded Template and search for env var names…

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 08:46:54

*Thread Reply:* that's what jinja does - you're passing function to jinja render, and it's calling it itself

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 08:47:45

*Thread Reply:* you can try the quick example from here, but just pass the env_var method (slightly adjusted - as a standalone function and without undefined error) and call it inside the template: https://realpython.com/primer-on-jinja-templating/#quick-examples

ale - (alessandro.lollo@gmail.com)
2021-10-11 08:51:19

*Thread Reply:* Ok, will try

ale - (alessandro.lollo@gmail.com)
2021-10-11 09:37:49

*Thread Reply:* I’m trying to run
pip install -e ".[dev]"
so that I can test my changes, but I get
ERROR: Could not find a version that satisfies the requirement openlineage-integration-common[dbt]==0.2.3 (from openlineage-dbt[dev]) (from versions: 0.0.1rc7, 0.0.1rc8, 0.0.1, 0.1.0rc5, 0.1.0, 0.2.0, 0.2.1, 0.2.2)
ERROR: No matching distribution found for openlineage-integration-common[dbt]==0.2.3
I don’t understand what I’m doing wrong…

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 09:41:47

*Thread Reply:* can you try installing it manually?

pip install openlineage-integration-common[dbt]==0.2.3

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 09:42:13

*Thread Reply:* I mean, it exists in pypi: https://pypi.org/project/openlineage-integration-common/#files

ale - (alessandro.lollo@gmail.com)
2021-10-11 09:44:57

*Thread Reply:* Yep, maybe it’s our internal PyPI repo which is not synced.
Installing from the public PyPI resolved the issue

ale - (alessandro.lollo@gmail.com)
2021-10-11 12:04:55

*Thread Reply:* Can’t seem to make env_var work as the render method of a Template 😅

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 12:57:07

*Thread Reply:* try this:

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 12:57:09

*Thread Reply:*
import os
from typing import Optional
from jinja2 import Template


def env_var(var: str, default: Optional[str] = None) -> str:
    """The env_var() function. Return the environment variable named 'var'.
    If there is no such environment variable set, return the default.

    If the default is None, raise an exception for an undefined variable.
    """
    if var in os.environ:
        return os.environ[var]
    elif default is not None:
        return default
    else:
        msg = f"Env var required but not provided: '{var}'"
        raise Exception(msg)


if __name__ == '__main__':
    t = Template("Hello {{ env_var('ENV_VAR') }}!")
    print(t.render(env_var=env_var))

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-11 12:57:42

*Thread Reply:* works for me:
mobuchowski@thinkpad [18:57:14] [~]
--> % ENV_VAR=world python jinja_example.py
Hello world!

ale - (alessandro.lollo@gmail.com)
2021-10-11 16:59:13

*Thread Reply:* Finally 😅
https://github.com/OpenLineage/OpenLineage/pull/328

There are minimal tests for Redshift and env vars.
Feedback and suggestions are welcome!

ale - (alessandro.lollo@gmail.com)
2021-10-12 03:10:45

*Thread Reply:* Hi @Maciej Obuchowski 😊
Regarding this comment https://github.com/OpenLineage/OpenLineage/pull/328#discussion_r726586564

How can we distinguish between snowflake, bigquery and redshift in this method?

A simple, but not very clean solution, would be to split this

bytes = get_from_multiple_chains(
    node.catalog_node,
    [
        ['stats', 'num_bytes', 'value'],  # bigquery
        ['stats', 'bytes', 'value'],      # snowflake
        ['stats', 'size', 'value']        # redshift (Note: size = count of 1MB blocks)
    ]
)

into two pieces, one checking for snowflake and bigquery and the other checking for redshift.

A better solution would be to have the profile type inside the method node_to_output_dataset , but I’m struggling understanding how to do that

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 05:35:00

*Thread Reply:* Well, why not do something like

bytes = get_from_multiple_chains(... rest of stuff)

if adapter == 'redshift':
    bytes = bytes * 1024 * 1024

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 05:36:49

*Thread Reply:* we can store adapter type in the class
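(i.e. something like this when parsing the profile - a sketch; the attribute name is illustrative:)

class DbtArtifactProcessor:
    ...
    def extract_adapter_type(self, profile: dict):
        # remember which adapter produced the artifacts ('redshift', 'snowflake', 'bigquery', ...)
        self.adapter_type = profile['type']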

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 05:38:47

*Thread Reply:* well, I've looked at last commit and that's exactly what you did 👍

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 05:40:35

*Thread Reply:* Now, have you tested your branch on real redshift cluster? I don't think we 100% need automated tests for that now, but would be nice to have confirmation that it works.

ale - (alessandro.lollo@gmail.com)
2021-10-12 06:35:04

*Thread Reply:* Not yet, but I'll try to do that this afternoon.
Need to figure out how to build the lib locally, then I can use it to test with Redshift

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 06:40:58

*Thread Reply:* I think pip install -e .[dbt] in common directory should be enough

ale - (alessandro.lollo@gmail.com)
2021-10-12 09:29:13

*Thread Reply:* I was able to run my local branch with my Redshift cluster and metadata is pushed to Marquez.
However, I’m not sure about the namespace .
I also see exceptions in Marquez logs

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 09:33:26

*Thread Reply:* namespace: well, if it matches what you put into your profile, there's not much we can do. I don't understand why you connect to redshift via host, maybe this is related to IAM?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 09:44:17

*Thread Reply:* I think the marquez error is because we don't send SourceCodeLocationJobFacet

ale - (alessandro.lollo@gmail.com)
2021-10-12 09:46:17

*Thread Reply:* Regarding the namespace, I will check it and figure it out 😊
Regarding the error: in the context of this PR, is it something I should worry about or not?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 09:54:17

*Thread Reply:* I think not in the context of the PR. It certainly deserves separate issue in Marquez repository.

ale - (alessandro.lollo@gmail.com)
2021-10-12 10:24:38

*Thread Reply:* 👍

ale - (alessandro.lollo@gmail.com)
2021-10-12 10:24:51

*Thread Reply:* Is there anything else I can do to improve the PR?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 10:27:44

*Thread Reply:* did you figure out the namespace stuff?
I think it's ready to be merged outside of that

ale - (alessandro.lollo@gmail.com)
2021-10-12 10:49:06

*Thread Reply:* Not yet

ale - (alessandro.lollo@gmail.com)
2021-10-12 10:58:07

*Thread Reply:* Ok, I figured it out.
When running dbt locally, we connect to Redshift using an SSH tunnel.
dbt runs on Docker, hence it can access the tunnel using host.docker.internal

ale - (alessandro.lollo@gmail.com)
2021-10-12 10:58:16

*Thread Reply:* So the namespace is correct

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 11:04:12

*Thread Reply:* Makes sense. So, let's merge it, after DCO bot gets up again.

ale - (alessandro.lollo@gmail.com)
2021-10-12 11:04:37

*Thread Reply:* 👍

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-13 05:29:48

*Thread Reply:* merged your PR 🙌

ale - (alessandro.lollo@gmail.com)
2021-10-13 10:54:09

*Thread Reply:* 🎉

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-13 12:01:20

*Thread Reply:* I think I'm going to change it up a bit.
The problem is that we can try to render jinja everywhere, including comments.
I tried to make it skip unknown methods and values here, but I think the right solution is to load the yaml, and then try to render jinja for values.

ale - (alessandro.lollo@gmail.com)
2021-10-13 14:27:37

*Thread Reply:* Ok sounds good to me!

SAM - (skhettri@gmail.com)
2021-10-06 10:50:43

Hey there, I’m not sure why I’m getting the below error after I ran OPENLINEAGE_URL=http://localhost:5000 dbt-ol run , although running the command dbt debug doesn’t show any error. Pls help.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-06 10:54:32

*Thread Reply:* Does it work with simply dbt run?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-06 10:55:51

*Thread Reply:* also, do you have dbt-snowflake installed?

SAM - (skhettri@gmail.com)
2021-10-06 11:00:42

*Thread Reply:* it works with dbt run

👀 Maciej Obuchowski

SAM - (skhettri@gmail.com)
2021-10-06 11:01:22

*Thread Reply:* no i haven’t installed dbt-snowflake

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-06 12:04:19

*Thread Reply:* what the dbt says - the snowflake profile with dev target - is that what you meant to run or was it something else?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-06 12:04:46

*Thread Reply:* it feels very weird to me, since the dbt-ol script just runs dbt run underneath

SAM - (skhettri@gmail.com)
2021-10-06 12:19:27

*Thread Reply:* this is my profiles.yml file:

snowflake:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: xxxxxxx

      # User/password auth
      user: xxxxxx
      password: xxxxx

      role: poc_db_temp_fullaccess
      database: POC_DB
      warehouse: poc_wh
      schema: temp
      threads: 2
      client_session_keep_alive: False
      query_tag: dbt_ol
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-06 12:26:39

*Thread Reply:* Yes, it looks like everything is okay on your side...

SAM - (skhettri@gmail.com)
2021-10-06 12:28:19

*Thread Reply:* may be I’ll restart my machine and try again

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-06 12:30:25

*Thread Reply:* can you try
OPENLINEAGE_URL=http://localhost:5000 dbt-ol debug

SAM - (skhettri@gmail.com)
2021-10-07 05:59:03

*Thread Reply:* Actually I had to use a venv, that fixed the above issue. However, I ran into another problem, which is no jobs / datasets found in Marquez:

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-07 06:00:28

*Thread Reply:* Good that you fixed that one 🙂 Regarding last one, I've found it independently yesterday and PR fixing it is already waiting for review: https://github.com/OpenLineage/OpenLineage/pull/322

SAM - (skhettri@gmail.com)
2021-10-07 06:00:46

*Thread Reply:* oh, thanks a lot

Julien Le Dem - (julien@apache.org)
2021-10-07 14:50:01

*Thread Reply:* There will be a release soon: https://openlineage.slack.com/archives/C01CK9T7HKR/p1633631825147900

👍 SAM

SAM - (skhettri@gmail.com)
2021-10-07 23:23:26

*Thread Reply:* Hi,
openlineage-dbt==0.2.3 worked, thanks a lot for the quick fix.

Alex P - (alexander.pelivan@scout24.com)
2021-10-07 07:46:16

Hi, I just started playing around with Marquez. When submitting some lineage data, after some experimenting, the visualisation becomes a bit cluttered with all the naive attempts of building a meaningful graph. Can I clear this up somehow? Or is there a tip on how to hide certain information?

Alex P - (alexander.pelivan@scout24.com)
2021-10-07 07:46:59

*Thread Reply:* [screenshot attachment]

Alex P - (alexander.pelivan@scout24.com)
2021-10-07 09:51:40

*Thread Reply:* So, as a quick fix, shutting down and re-starting the docker container resets everything.
./docker/up.sh

👍 Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-07 12:28:25

*Thread Reply:* I guess that it's the easiest way now. There should be API for that.

Willy Lulciuc - (willy@datakin.com)
2021-10-07 14:09:50

*Thread Reply:* @Alex P Yeah, we're realizing that being able to delete metadata is becoming very important. And, as @Maciej Obuchowski mentioned, dropping your entire database is the only way currently (not ideal!). We do have an issue in the Marquez backlog to expose delete APIs: https://github.com/MarquezProject/marquez/issues/754

Willy Lulciuc - (willy@datakin.com)
2021-10-07 14:10:36

*Thread Reply:* A bit more discussion is needed though. Like what if a dataset is deleted, but you still want to keep track that it existed at some point? (i.e. soft vs hard deletes). But, for the case that you just want to clear metadata because you were testing things out, then yeah, that's more obvious and requires little discussion of the API upfront.

Willy Lulciuc - (willy@datakin.com)
2021-10-07 14:12:52

*Thread Reply:* @Alex P I moved the delete APIs to the Marquez 0.20.0 release

Julien Le Dem - (julien@apache.org)
2021-10-07 14:39:03

*Thread Reply:* Thanks Willy.

🙌 Willy Lulciuc

Julien Le Dem - (julien@apache.org)
2021-10-07 14:48:37

*Thread Reply:* I have also updated a corresponding issue to track this in OpenLineage: https://github.com/OpenLineage/OpenLineage/issues/323

Julien Le Dem - (julien@apache.org)
2021-10-07 13:36:48

The next OpenLineage monthly meeting is on the 13th. https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting
Please chime in here if you’d like a topic to be added to the agenda

🙌 Willy Lulciuc, Maciej Obuchowski, Peter Hicks
❤️ Willy Lulciuc, Maciej Obuchowski, Peter Hicks

Julien Le Dem - (julien@apache.org)
2021-10-13 10:47:49

*Thread Reply:* Reminder that the meeting is today. See you soon

Julien Le Dem - (julien@apache.org)
2021-10-13 19:49:21

*Thread Reply:* The recording and notes of the meeting are now available:
https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting#MonthlyTSCmeeting-Oct13th2021

Willy Lulciuc - (willy@datakin.com)
2021-10-07 14:37:05

@channel: We’ve recently become aware that our integration with dbt no longer works with the latest dbt manifest version (v3), see original discussion. The manifest version change was introduced in dbt 0.21 , see diff. That said, we do have a fix: PR #322 contributed by @Maciej Obuchowski! Here’s our plan to roll out the openlineage-dbt hotfix for those using the latest version of dbt (NOTE: for those using an older dbt version, you will NOT be affected by this bug):

Releasing OpenLineage 0.2.3 with dbt v3 manifest support:

  1. Branch off the 0.2.2 tagged commit, and create an openlineage-0.2.x branch
  2. Cherry-pick the commit with the dbt manifest v3 fix
  3. Release the 0.2.3 batch release
We will be releasing 0.2.3 today. Please reach out to us with any questions!
🙌 Mario Measic, Minkyu Park, Peter Hicks

Julien Le Dem - (julien@apache.org)
2021-10-07 14:55:35

*Thread Reply:* For people following along, dbt changed the schema of its metadata, which broke the openlineage integration. However, we were a bit too stringent on validating the schema version (they increment it every time even if it’s backwards compatible, which it is in this case). We will fix that so that future compatible changes don’t prevent the OL integration from working.

Mario Measic - (mario.measic.gavran@gmail.com)
2021-10-07 16:44:28

*Thread Reply:* As one of the main integrations, it would be good to connect more with the dbt community for the next releases, by testing the release candidates 👍

Thanks for the PR

💯 Willy Lulciuc

Willy Lulciuc - (willy@datakin.com)
2021-10-07 16:46:40

*Thread Reply:* Yeah, I totally agree with you. We should also be more proactive and more aware of what’s coming in future dbt releases. Sorry if you were affected by this bug :ladybug:

Willy Lulciuc - (willy@datakin.com)
2021-10-07 18:12:22

*Thread Reply:* We’ve released OpenLineage 0.2.3 with the hotfix for adding dbt v3 manifest support, see https://github.com/OpenLineage/OpenLineage/releases/tag/0.2.3

You can download and install openlineage-dbt 0.2.3 with the fix using:

$ pip3 install openlineage-dbt==0.2.3

Drew Bittenbender - (drew@salt.io)
2021-10-07 19:02:37

Hello. I have a question about dbt-ol. I run dbt in a docker container and alias the dbt command to execute in that docker container. dbt-ol doesn't seem to use that alias. Do you know of a way to force it to use the alias?...or is there an alternative to getting the lineage into Marquez?

Julien Le Dem - (julien@apache.org)
2021-10-07 21:10:36

*Thread Reply:* @Maciej Obuchowski might know

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 04:23:17

*Thread Reply:* @Drew Bittenbender dbt-ol always calls the dbt command now, without spawning a shell - so it does not have access to bash aliases.

Can you elaborate on your use case? Do you mean that dbt in your path does docker run or something like this? It still might be a problem if we don't have access to the artifacts generated by dbt in the target directory.

Drew Bittenbender - (drew@salt.io)
2021-10-08 10:59:32

*Thread Reply:* I am running on a mac and I have aliased (.zshrc) dbt to execute docker run against the fishtownanalytics docker image rather than installing dbt natively (homebrew, etc). I am doing this so that the dbt configuration is portable and reusable by others.

It seems that by installing openlineage-dbt in a virtual environment, it pulls down its own version of dbt which it calls inline rather than shelling out and executing the dbt setup resident in the host system. I understand that opening a shell is a security risk so that is understandable.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 11:05:00

*Thread Reply:* It does not pull down, it just assumes that it's in the system. It would fail if it isn't.

For now I think you could build your own image based on the official one, and install openlineage-dbt inside, something like:

FROM fishtownanalytics/dbt:0.21.0
RUN pip install openlineage-dbt
ENTRYPOINT ["dbt-ol"]

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 11:05:15

*Thread Reply:* and then pass OPENLINEAGE_URL in env while doing docker run
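(So a run could look roughly like this - the image name and mounts are illustrative, not from the thread:)

docker build -t my-dbt-ol .
docker run --rm \
  -e OPENLINEAGE_URL=http://host.docker.internal:5000 \
  -v $(pwd):/usr/app \
  -v $HOME/.dbt:/root/.dbt \
  my-dbt-ol run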

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 11:06:55

*Thread Reply:* Also, to make sure that using shell would help in your case: do you bind mount your dbt directory to home? dbt-ol can't run without access to dbt's target directory, so if it's not visible in host, the only option is to have dbt-ol in container.

SAM - (skhettri@gmail.com)
2021-10-08 07:00:43

Hi, I found the below issues, not sure what the root cause is:

  1. Marquez UI does not show any jobs/datasets, but if I search my table name then only it shows in the search result section.
  2. After running dbt docs generate there is no schema information available in Marquez?
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 08:16:37

*Thread Reply:* Regarding 2), the data is only visible after the next dbt-ol run - dbt docs generate does not emit events itself, but generates data that the run takes into account.

SAM - (skhettri@gmail.com)
2021-10-08 08:24:57

*Thread Reply:* oh got it, since it’s in default, I need to click on it and choose my dbt profile’s account name. thnx

SAM - (skhettri@gmail.com)
2021-10-08 11:25:22

*Thread Reply:* May I know why these highlighted ones don’t have schema? FYI, I used sources in dbt.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-08 11:26:18

*Thread Reply:* Do they have it in dbt docs?

SAM - (skhettri@gmail.com)
2021-10-08 11:33:59

*Thread Reply:* I prepared this yaml file, not sure this is what u asked

ale - (alessandro.lollo@gmail.com)
2021-10-12 04:14:08

Hey folks 😊
DCO checks on this PR https://github.com/OpenLineage/OpenLineage/pull/328 seem to be stuck.
Any suggestions on how to unblock it?

Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2021-10-12 07:21:33

*Thread Reply:* I don't think anything is wrong with your branch. It's also not working on my one. Maybe it's globally stuck?

Mark Taylor - (marktayl@microsoft.com)
2021-10-12 15:17:02

We are working on the hackathon and have a couple of questions about generating lineage information. @Willy Lulciuc would you have time to help answer a couple of questions?

• Is there a way to generate OpenLineage output that contains a mapping between input and output fields?
• In Azure Databricks sources often map to ADB mount points. We are looking for a way to translate this into source metadata in the OL output. Is there some configuration that would make this possible, or any other suggestions?

👋 Willy Lulciuc

Willy Lulciuc - (willy@datakin.com)
2021-10-12 15:50:20

*Thread Reply:* > Is there a way to generate OpenLineage output that contains a mapping between input and output fields?
OpenLineage defines discrete classes for both OpenLineage.InputDataset and OpenLineage.OutputDataset datasets. But, for clarification, are you asking:

  1. If a job reads / writes to the same dataset, how can OpenLineage track which fields were used in the job’s logic as input and which fields were used to write back to the resulting output?
  2. Or, if a job reads / writes from two different datasets, how can OpenLineage track which input fields were used in the job’s logic for the resulting output dataset? (i.e. column-level lineage)
Willy Lulciuc - (willy@datakin.com)
2021-10-12 15:56:18

*Thread Reply:* > In Azure Databricks sources often map to ADB mount points.  We are looking for a way to translate this into source metadata in the OL output.  Is there some configuration that would make this possible, or any other suggestions?
I would look into our OutputDatasetVisitors class (as a starting point) that extracts metadata from the spark logical plan to construct a mapping between a logical plan and one or more OpenLineage.Dataset for the spark job. But I think @Michael Collado will have a more detailed suggestion / approach to what you’re asking

Michael Collado - (collado.mike@gmail.com)
2021-10-12 15:59:41

*Thread Reply:* are the sources mounted like local filesystem mounts? are you ending up with datasources that point to the local filesystem rather than some dbfs url? (sorry, I'm not familiar with databricks or azure at this point)

Mark Taylor - (marktayl@microsoft.com)
2021-10-12 16:59:38

*Thread Reply:* I think under the covers they are an os level fs mount, but it is using an ADB specific api, dbutils.fs.mount. It is using the ADB filesystem.

Michael Collado - (collado.mike@gmail.com)
2021-10-12 17:01:23

*Thread Reply:* Do you use the dbfs scheme to access the files from Spark as in the example on that page?
df = spark.read.text("dbfs:/mymount/my_file.txt")

Mark Taylor - (marktayl@microsoft.com)
2021-10-12 17:04:52

*Thread Reply:* @Willy Lulciuc In our project, @Will Johnson had generated some sample OL output from just reading in and writing out a dataset to blob storage. In the resulting output, I see the columns represented as fields under the schema element with a set represented for output and another for input. I would need the mapping of in and out columns to generate column level lineage so wondering if it is possible to get or am I just missing it somewhere? Thanks for your help!

Willy Lulciuc - (willy@datakin.com)
2021-10-12 17:26:35

*Thread Reply:* Ahh, well currently, no, but it has been discussed and is on the OpenLineage roadmap. Here’s a proposal opened by @Julien Le Dem, column level lineage facet, that starts the discussion to add the columnLineage facet to the datasets model in order to support column-level lineage. Would be great to get your thoughts!

Will Johnson - (will@willj.co)
2021-10-12 17:41:41

*Thread Reply:* @Michael Collado - Databricks allows you to reference a file called /mnt/someMount/some/file/path The way you have referenced it would let you hit the file with local file system stuff like pandas / local python.

Julien Le Dem - (julien@apache.org)
2021-10-12 17:49:37

*Thread Reply:* For column level lineage, you can add your own custom facets. Here’s an example in the Spark integration: (LogicalPlanFacet) https://github.com/OpenLineage/OpenLineage/blob/5f189a94990dad715745506c0282e16fd8[…]openlineage/spark/agent/lifecycle/SparkSQLExecutionContext.java
Here is the paragraph about this in the spec: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#custom-facet-naming
Julien Le Dem - (julien@apache.org)
2021-10-12 17:51:24

*Thread Reply:* This example adds facets to the run, but you can also add them to the job

Michael Collado - (collado.mike@gmail.com)
2021-10-12 17:52:46

*Thread Reply:* unfortunately, there's not yet a way to add your own custom facets to the spark integration - there's some work on extensibility to be done

Michael Collado - (collado.mike@gmail.com)
2021-10-12 17:54:07

*Thread Reply:* for the hackathon's sake, you can check out the package and just add in whatever you want

Will Johnson - (will@willj.co)
2021-10-12 18:26:44

*Thread Reply:* Thank you guys!!

🙌 Willy Lulciuc

Will Johnson - (will@willj.co)
2021-10-12 20:42:20

Question on the Spark Integration and its SPARK_CONF_URL_KEY configuration variable.

https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fe[…]rk/src/main/java/io/openlineage/spark/agent/ArgumentParser.java

It looks like I can pass in any url but I'm not sure if I can pass in query parameters along with that URL. For example, if I had https://localhost/myendpoint?secret_code=123 I THINK that is used for the endpoint and it does not append /lineage to the end of the url. Is that a fair assessment of what happens when the url is provided?

Thank you for any guidance!

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem (julien@apache.org)
2021-10-12 21:46:12
*Thread Reply:* You can also pass the settings independently if you want something more flexible: https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fe[…]n/java/io/openlineage/spark/agent/OpenLineageSparkListener.java
Julien Le Dem (julien@apache.org)
2021-10-12 21:47:36
*Thread Reply:*
SparkSession.builder()
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.2.+")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.host", "https://localhost")
    .config("spark.openlineage.apiKey", "your api key")
    .config("spark.openlineage.namespace", "<NAMESPACE_NAME>") // Replace with the name of your Spark cluster.
    .getOrCreate()
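The same settings from PySpark, for anyone wiring this up in Python - a sketch using the exact config keys above (the app name and namespace value are placeholders):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("openlineage_example")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.2.+")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.host", "https://localhost")
    .config("spark.openlineage.apiKey", "your api key")
    .config("spark.openlineage.namespace", "my_spark_cluster")  # name of your Spark cluster
    .getOrCreate()
)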
Julien Le Dem (julien@apache.org)
2021-10-12 21:48:57
*Thread Reply:* It is going to add /lineage at the end: https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fe[…]rc/main/java/io/openlineage/spark/agent/OpenLineageContext.java
Julien Le Dem (julien@apache.org)
2021-10-12 21:49:37
*Thread Reply:* the apiKey setting is sent in an "Authorization" header

Julien Le Dem (julien@apache.org)
2021-10-12 21:49:55
*Thread Reply:* "Bearer $KEY"
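In other words, the listener's HTTP call is roughly equivalent to this sketch (assuming a Marquez-style /api/v1/lineage endpoint; `event` stands in for a full OpenLineage run event):

import requests

api_key = "your api key"  # the value of spark.openlineage.apiKey
event = {"eventType": "START"}  # a complete RunEvent in practice

requests.post(
    "https://localhost/api/v1/lineage",
    json=event,
    headers={"Authorization": f"Bearer {api_key}"},
)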
Julien Le Dem (julien@apache.org)
2021-10-12 21:51:09
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/a6eea7a55fef444b6561005164869a9082[…]n/java/io/openlineage/spark/agent/client/OpenLineageClient.java
Will Johnson (will@willj.co)
2021-10-12 22:54:22
*Thread Reply:* Thank you @Julien Le Dem! It seems in both cases (defining the url endpoint with spark.openlineage.url, or with the components spark.openlineage.host / openlineage.version / openlineage.namespace / etc.) OpenLineage will strip out url parameters and rebuild the url endpoint with /lineage.

I think we might need to add a url parameter configuration for our hackathon. We're using a bit of serverless code to shuttle OpenLineage events to a queue so that another job and/or serverless application can read that queue at its leisure.

Using the apiKey that feeds into the Authorization header as a Bearer token is great and would suffice, but our services use OAuth tokens that expire after two hours AND most of our customers wouldn't want to generate an access token themselves and feed it to Spark. ☹️

Would you entertain a proposal to support a spark.openlineage.urlParams configuration variable that lets you add url parameters to the derived lineage url?

Thank you for the detailed replies and deep links!
Julien Le Dem (julien@apache.org)
2021-10-13 10:46:22
*Thread Reply:* Yes, please open an issue detailing the use case.
Will Johnson (will@willj.co)
2021-10-13 13:02:06
Quick question: is it expected, when using Spark SQL and the Spark integration for Spark 3, that we receive an INPUT but no OUTPUTS when doing a CREATE TABLE ... AS SELECT ...?

I'm reading from a Spark SQL table (underlying CSV) and then writing it to a Delta Lake table.

I get a COMPLETE event type with an INPUT but no OUTPUT, and then I get an exception from the AsyncEventQueue, but I'm guessing it's unrelated 😅

21/10/13 15:38:15 INFO OpenLineageContext: Lineage completed successfully: ResponseMessage(responseCode=200, body=null, error=null) {"eventType":"COMPLETE","eventTime":"2021-10-13T15:38:15.878Z","run":{"runId":"2cfe52b3-e08f-4888-8813-ffcdd2b27c89","facets":{"spark_unknown":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.2.3-SNAPSHOT/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","output":{"description":{"@class":"org.apache.spark.sql.catalyst.plans.logical.Project","traceEnabled":false,"streaming":false,"cacheId":null,"canonicalizedPlan":false},"inputAttributes":[{"name":"id","type":"long","metadata":{}}],"outputAttributes":[{"name":"id","type":"long","metadata":{}},{"name":"action_date","type":"date","metadata":{}}]},"inputs":[{"description":{"@class":"org.apache.spark.sql.catalyst.plans.logical.Range","streaming":false,"traceEnabled":false,"cacheId":null,"canonicalizedPlan":false},"inputAttributes":[],"outputAttributes":[{"name":"id","type":"long","metadata":{}}]}]},"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.2.3-SNAPSHOT/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.Project","num-children":1,"projectList":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"id","dataType":"long","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":111,"jvmId":"4bdfd808-97d5-455f-ad6a-a3b29855e85b"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.Alias","num-children":1,"child":0,"name":"action_date","exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":113,"jvmId":"4bdfd808-97d5-455f-ad6a-a3b29855e85b"},"qualifier":[],"explicitMetadata":{},"nonInheritableMetadataKeys":"[__dataset_id, __col_position]"},{"class":"org.apache.spark.sql.catalyst.expressions.CurrentDate","num-children":0,"timeZoneId":"Etc/UTC"}]],"child":0},{"class":"org.apache.spark.sql.catalyst.plans.logical.Range","num-children":0,"start":0,"end":5,"step":1,"numSlices":8,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"id","dataType":"long","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":111,"jvmId":"4bdfd808-97d5-455f-ad6a-a3b29855e85b"},"qualifier":[]}]],"isStreaming":false}]}}},"job":{"namespace":"sparknamespace","name":"databricks_shell.project"},"inputs":[],"outputs":[],"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.2.3-SNAPSHOT/integration/spark","schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"}
21/10/13 15:38:16 INFO FileSizeAutoTuner: File size tuning result: {"tuningType":"autoTuned","tunedConfs":{"spark.databricks.delta.optimize.minFileSize":"268435456","spark.databricks.delta.optimize.maxFileSize":"268435456"}}
21/10/13 15:38:16 INFO FileFormatWriter: Write Job e062f36c-8b9d-4252-8db9-73b58bd67b15 committed.
21/10/13 15:38:16 INFO FileFormatWriter: Finished processing stats for write job e062f36c-8b9d-4252-8db9-73b58bd67b15.
21/10/13 15:38:18 INFO CodeGenerator: Code generated in 253.294028 ms
21/10/13 15:38:18 INFO SparkContext: Starting job: collect at DataSkippingReader.scala:430
21/10/13 15:38:18 INFO DAGScheduler: Job 1 finished: collect at DataSkippingReader.scala:430, took 0.000333 s
21/10/13 15:38:18 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
    at io.openlineage.spark.agent.OpenLineageSparkListener.onJobEnd(OpenLineageSparkListener.java:167)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:39)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1547)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Julien Le Dem (julien@apache.org)
2021-10-13 17:54:22
*Thread Reply:* This is because this specific action is not covered yet. You can see the "spark_unknown" facet is describing things that are not understood yet:
"run": {
  ...
  "facets": {
    "spark_unknown": {
      ...
      "output": {
        "description": {
          "@class": "org.apache.spark.sql.catalyst.plans.logical.Project",
          "traceEnabled": false,
          "streaming": false,
          "cacheId": null,
          "canonicalizedPlan": false
        },
Julien Le Dem (julien@apache.org)
2021-10-13 17:54:43
*Thread Reply:* I think this is part of the Spark 3 gap

Julien Le Dem (julien@apache.org)
2021-10-13 17:55:46
*Thread Reply:* an unknown output will cause missing output lineage

Julien Le Dem (julien@apache.org)
2021-10-13 18:05:57
*Thread Reply:* Output handling is here: https://github.com/OpenLineage/OpenLineage/blob/e0f1852422f325dc019b0eab0e466dc905[…]io/openlineage/spark/agent/lifecycle/OutputDatasetVisitors.java
🙌 Will Johnson

Will Johnson (will@willj.co)
2021-10-13 22:49:08
*Thread Reply:* Ah! Thank you so much, Julien! This is very helpful for understanding where that is set. This is a big gap that we want to help address after our hackathon. Thank you!
Julien Le Dem (julien@apache.org)
2021-10-13 20:09:17
Following up on the meeting this morning, I have created an issue to formalize a design doc review process: https://github.com/OpenLineage/OpenLineage/issues/336
If that sounds good I'll create the first doc to describe this as a PR. (how meta!)
Julien Le Dem (julien@apache.org)
2021-10-13 20:13:02
*Thread Reply:* the github wiki is backed by a git repo but it does not allow PRs. (people do hacks but I'd rather avoid those)
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-10-18 10:24:25
We're discussing creating a Transport abstraction for the OpenLineage clients, which would allow us to create a better experience for people who expect to be able to emit their events using something other than the HTTP interface. Please tell us what you think of the proposed mechanism - encouraging emojis are helpful too 😉
https://github.com/OpenLineage/OpenLineage/pull/344
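For a sense of what such an abstraction could look like, a rough sketch follows. The class and method names here are hypothetical, not the API proposed in PR 344:

import json
from abc import ABC, abstractmethod

import requests


class Transport(ABC):
    """A pluggable destination for OpenLineage events."""

    @abstractmethod
    def emit(self, event: dict) -> None:
        ...


class ConsoleTransport(Transport):
    """Handy when no lineage service is running: just print the event."""

    def emit(self, event: dict) -> None:
        print(json.dumps(event))


class HttpTransport(Transport):
    def __init__(self, url: str):
        self.url = url

    def emit(self, event: dict) -> None:
        requests.post(f"{self.url}/api/v1/lineage", json=event)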
Julien Le Dem (julien@apache.org)
2021-10-18 20:57:04
OpenLineage release 0.3 is coming. Please chime in if there's any blocker that should go in the release: https://github.com/OpenLineage/OpenLineage/projects/4
❤️ Willy Lulciuc
Carlos Quintas (cdquintas@gmail.com)
2021-10-19 06:36:05
👋 Hi everyone!
👋 Ross Turk, Willy Lulciuc, Michael Collado
Carlos Quintas (cdquintas@gmail.com)
2021-10-22 05:38:14
openlineage with dbt and Trino, is there any forecast?
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-10-22 05:44:17
*Thread Reply:* Maybe you want to contribute it?
It's not that hard - mostly testing, figuring out what the naming of the openlineage namespace for Trino should be, and how some additional statistics work.

For example, recently we had support for Redshift added by community member @ale:

https://github.com/OpenLineage/OpenLineage/pull/328
Carlos Quintas (cdquintas@gmail.com)
2021-10-22 05:42:52
Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
Traceback (most recent call last):
  File "/home/labuser/.local/bin/dbt-ol", line 61, in <module>
    main()
  File "/home/labuser/.local/bin/dbt-ol", line 54, in main
    events = processor.parse().events()
  File "/home/labuser/.local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 98, in parse
    self.extract_dataset_namespace(profile)
  File "/home/labuser/.local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 377, in extract_dataset_namespace
    self.dataset_namespace = self.extract_namespace(profile)
  File "/home/labuser/.local/lib/python3.8/site-packages/openlineage/common/provider/dbt.py", line 391, in extract_namespace
    raise NotImplementedError(
NotImplementedError: Only 'snowflake' and 'bigquery' adapters are supported right now. Passed trino
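For anyone picking this up: the failure above is in extract_namespace in openlineage/common/provider/dbt.py, which maps a dbt profile to a dataset namespace. A hypothetical sketch of what a Trino branch might look like (the existing branches are paraphrased, and the trino://host:port scheme is an assumption that would need to be agreed on, as Maciej notes above):

def extract_namespace(self, profile: dict) -> str:
    """Map a dbt profile to an OpenLineage dataset namespace (sketch)."""
    adapter_type = profile["type"]
    if adapter_type == "snowflake":
        return f"snowflake://{profile['account']}"
    elif adapter_type == "bigquery":
        return "bigquery"
    elif adapter_type == "trino":
        # hypothetical: namespace the datasets by the Trino coordinator
        return f"trino://{profile['host']}:{profile['port']}"
    else:
        raise NotImplementedError(
            f"Only 'snowflake', 'bigquery', and 'trino' adapters are "
            f"supported right now. Passed {adapter_type}"
        )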
Michael Collado (collado.mike@gmail.com)
2021-10-22 12:41:08
Hey folks, we've released OpenLineage 0.3.1. There are quite a few changes, including doc improvements, Redshift support in dbt, bugfixes, and a new server-side client code base, but the real highlights are:
1. Official Spark 3 support - this is still a work in progress (the whole Spark integration is), but the big deal is we've split the source tree to support both Spark 2 and Spark 3 specific plan visitors. This will enable us to work with the Spark 3 API explicitly and to add support for those interfaces and classes that didn't exist in Spark 2. We're also running all integration tests against both Spark 2.4.7 and Spark 3.1.0.
2. Airflow 2 support - also a work in progress, but we have a new LineageBackend implementation that allows us to begin tracking lineage for successful Airflow 2 DAGs. We're working to support failure notifications so we can also trace failed jobs. The LineageBackend can also be enabled in Airflow 1.10.X to improve the reporting of task completion times.
Check the READMEs for more details and to get started with the new features. Thanks to @Maciej Obuchowski, @Oleksandr Dvornik, @ale, and @Willy Lulciuc for their contributions. See the full changelog
🎉 Willy Lulciuc, Maciej Obuchowski, Minkyu Park, Ross Turk, Peter Hicks, RamanD, Ry Walker
🙌 Willy Lulciuc, Maciej Obuchowski, Minkyu Park, Will Johnson, Ross Turk, Peter Hicks, Ry Walker
🔥 Ry Walker
David Virgil (david.virgil.naranjo@googlemail.com)
2021-10-28 07:27:12
Hello community. I am starting to use Marquez. I tried to connect dbt with Marquez, but the spark adapter is not yet available:

NotImplementedError: Only 'snowflake', 'bigquery', and 'redshift' adapters are supported right now. Passed spark

Are you planning to implement this dbt spark adapter in upcoming openlineage versions? In my company we are also starting to use the athena dbt adapter. Are you planning to implement that integration as well? Thanks a lot, community.
Julien Le Dem (julien@apache.org)
2021-10-28 12:20:27
*Thread Reply:* That would make sense. I think you are the first person to request this. Is this something you would want to contribute to the project?

David Virgil (david.virgil.naranjo@googlemail.com)
2021-10-28 17:37:53
*Thread Reply:* I would like to, Julien, but I'm not sure how to do it. Could you guide me on how to start, or show me another integration?

Matthew Mullins (mmullins@aginity.com)
2021-10-31 07:57:55
*Thread Reply:* @David Virgil look at the pull request for the addition of Redshift as a starting guide: https://github.com/OpenLineage/OpenLineage/pull/328

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 12:01:41
*Thread Reply:* Thanks @Matthew Mullins, I'll try to add the dbt spark integration
Mario Measic (mario.measic.gavran@gmail.com)
2021-10-28 09:31:01
Hey folks, quick question: can we run dbt-ol without providing OPENLINEAGE_URL? I find it quite limiting that I need to have a service set up in order to emit/generate OL events/messages. Is there a way to just output them to the console?
Mario Measic (mario.measic.gavran@gmail.com)
2021-10-28 10:05:09
*Thread Reply:* OK, it was changed here: https://github.com/OpenLineage/OpenLineage/pull/286

Did you think about this?

Julien Le Dem (julien@apache.org)
2021-10-28 12:19:27
*Thread Reply:* In Marquez there was a mechanism to do that. Something like OPENLINEAGE_BACKEND=HTTP|LOG

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-10-28 13:56:42
*Thread Reply:* @Mario Measic We're going to add a Transport mechanism that will address use cases like yours. Please comment on this PR with what you would expect: https://github.com/OpenLineage/OpenLineage/pull/344
👀 Mario Measic

Mario Measic (mario.measic.gavran@gmail.com)
2021-10-28 15:29:50
*Thread Reply:* Nice, thanks @Julien Le Dem and @Maciej Obuchowski.
Mario Measic (mario.measic.gavran@gmail.com)
2021-10-28 15:46:45
*Thread Reply:* Also, dbt build is not working, which is kind of the biggest feature of version 0.21.0. I will try testing the code with modifications to https://github.com/OpenLineage/OpenLineage/blob/c3aa70e161244091969951d0da4f37619bcbe36f/integration/dbt/scripts/dbt-ol#L141

I guess there's a reason for it that I didn't see, since you support v3 of the manifest.
Mario Measic (mario.measic.gavran@gmail.com)
2021-10-29 03:45:27
*Thread Reply:* Also, is it normal not to see the column descriptions for the model/table, even though these are provided in the YAML file, persisted in Redshift, and dbt docs generate has been run before dbt-ol run?

Mario Measic (mario.measic.gavran@gmail.com)
2021-10-29 04:26:22
*Thread Reply:* Tried with dbt versions 0.20.2 and 0.21.0, openlineage-dbt==0.3.1

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-10-29 10:39:10
*Thread Reply:* I'll take a look at that. Supporting descriptions might be simple, but dbt build might be a larger task.

Julien Le Dem (julien@apache.org)
2021-11-01 19:12:01
*Thread Reply:* I opened a ticket to track this: https://github.com/OpenLineage/OpenLineage/issues/376
👀 Mario Measic

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 05:48:06
*Thread Reply:* The column description issue should be fixed here: https://github.com/OpenLineage/OpenLineage/pull/383
Julien Le Dem (julien@apache.org)
2021-10-28 12:27:17
I'm looking for feedback on my proposal to improve the proposal process! https://github.com/OpenLineage/OpenLineage/issues/336
Brad (bradley.mcelroy@live.com)
2021-10-28 18:49:12
Hey guys - just an update on my prefect PR (https://github.com/OpenLineage/OpenLineage/pull/293) - there's a little spiel on the ticket, but I've closed that PR in favour of opening a new one. Prefect have just released a 2.0a technical preview, which they would like to make stable near the start of next year. I think it makes sense to target this release, and one of the prefect team has reached out and is keen to get some sort of lineage implemented in prefect.
👍 Kevin Kho, Maciej Obuchowski, Willy Lulciuc, Michael Collado, Julien Le Dem, Thomas Fredriksen

Brad (bradley.mcelroy@live.com)
2021-10-28 18:51:10
*Thread Reply:* If anyone has any questions or comments - happy to discuss here

Brad (bradley.mcelroy@live.com)
2021-10-28 18:51:15
*Thread Reply:* @davzucky

Willy Lulciuc (willy@datakin.com)
2021-10-28 23:01:29
*Thread Reply:* Thanks for updating the community, Brad!

davzucky (davzucky@hotmail.com)
2021-10-28 23:47:02
*Thread Reply:* Thank you, Brad. Looking forward to seeing how to integrate that with v2

Kevin Kho (kdykho@gmail.com)
2021-10-28 18:53:23
Hello, joining here from Prefect. Because of community requests from users like Brad above, we are looking to implement lineage for Prefect this quarter. Good to meet you all!
❤️ Minkyu Park, Faouzi, John Thomas, Maciej Obuchowski, Kevin Mellott, Thomas Fredriksen
👍 Minkyu Park, Faouzi, John Thomas
🙌 Michael Collado, Faouzi, John Thomas

Willy Lulciuc (willy@datakin.com)
2021-10-28 18:54:56
*Thread Reply:* Welcome, @Kevin Kho 👋. Really excited to see this integration kick off! 💯🚀
👍 Kevin Kho, Maciej Obuchowski, Peter Hicks, Faouzi
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 12:03:14
Hello, I am integrating OpenLineage with Airflow 2.2.0.

Do you plan to support Airflow's manually configured inlets and outlets in the future? From the documentation I can see that it is not possible today:

"OpenLineageBackend does not take into account manually configured inlets and outlets."

Thanks

John Thomas (john@datakin.com)
2021-11-01 12:23:11
*Thread Reply:* While it's not something we're supporting at the moment, it's definitely something that we're considering!

If you can give me a little more detail on what your system infrastructure is like, it'll help us set priority and design
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 13:57:34
*Thread Reply:* So, the basic architecture of a data lake. We are using Airflow to trigger jobs. Every job is a pipeline that runs a Spark job (in our case it spins up an EMR cluster). So the idea would be to define inlets and outlets in the DAGs based on the Airflow lineage support (see the sketch below):

https://airflow.apache.org/docs/apache-airflow/stable/lineage.html

I think you need to be able to include these inlets and outlets in the OpenLineage picture.
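For context, a minimal sketch of what those manually declared inlets/outlets look like in an Airflow DAG, based on the lineage doc linked above (the paths and tasks are illustrative):

from datetime import datetime

from airflow import DAG
from airflow.lineage import AUTO
from airflow.lineage.entities import File
from airflow.operators.bash import BashOperator

raw = File(url="s3://my-bucket/raw/events.json")
curated = File(url="s3://my-bucket/curated/events/")

with DAG(dag_id="lineage_example",
         start_date=datetime(2021, 11, 1),
         schedule_interval=None) as dag:
    transform = BashOperator(
        task_id="transform",
        bash_command="echo run the EMR spark job here",
        inlets=[raw],        # manually declared upstream dataset
        outlets=[curated],   # manually declared downstream dataset
    )
    report = BashOperator(
        task_id="report",
        bash_command="echo build the report",
        inlets=AUTO,         # inherit outlets from upstream tasks
    )
    transform >> report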
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-01 14:01:24
*Thread Reply:* Why not use the Spark integration? https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 14:05:02
*Thread Reply:* because there are some other jobs that are not Spark - some jobs run in dbt, other jobs run in Redshift @Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-01 14:08:58
*Thread Reply:* So, a combo of https://github.com/OpenLineage/OpenLineage/tree/main/integration/dbt and the PostgresExtractor from the Airflow integration should cover Redshift if you're using it from PostgresOperator 🙂

It's definitely an interesting use case - you'd be using most of the existing integrations we have.

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 15:04:44
*Thread Reply:* @Maciej Obuchowski Do I need to define any extractor in the Airflow startup?
Francis McGregor-Macdonald (francis@mc-mac.com)
2021-11-05 23:48:21
*Thread Reply:* I am using Redshift with PostgresOperator and it is returning…

[2021-11-06 03:43:06,541] {{__init__.py:92}} ERROR - Failed to extract metadata 'NoneType' object has no attribute 'host' task_type=PostgresOperator airflow_dag_id=counter task_id=inc airflow_run_id=scheduled__2021-11-06T03:42:00+00:00
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/lineage_backend/__init__.py", line 83, in _extract_metadata
    task_metadata = self._extract(extractor, task_instance)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/lineage_backend/__init__.py", line 104, in _extract
    task_metadata = extractor.extract_on_complete(task_instance)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/base.py", line 61, in extract_on_complete
    return self.extract()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/postgres_extractor.py", line 65, in extract
    authority=self._get_authority(),
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/postgres_extractor.py", line 120, in _get_authority
    if self.conn.host and self.conn.port:
AttributeError: 'NoneType' object has no attribute 'host'

I can't see this raised as an issue.
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 13:57:54
Hello, I am trying to integrate Airflow with OpenLineage.

It is not working for me.

What I tried:
1. Adding openlineage-airflow to requirements.txt
2. Adding AIRFLOW__LINEAGE__BACKEND=openlineage.airflow.backend.OpenLineageBackend

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 47, in command
    func = import_string(import_path)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/module_loading.py", line 32, in import_string
    module = import_module(module_path)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/db_command.py", line 24, in <module>
    from airflow.utils import cli as cli_utils, db
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/db.py", line 26, in <module>
    from airflow.jobs.base_job import BaseJob  # noqa: F401
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/__init__.py", line 19, in <module>
    import airflow.jobs.backfill_job
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/backfill_job.py", line 29, in <module>
    from airflow import models
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/__init__.py", line 20, in <module>
    from airflow.models.baseoperator import BaseOperator, BaseOperatorLink
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 196, in <module>
    class BaseOperator(Operator, LoggingMixin, TaskMixin, metaclass=BaseOperatorMeta):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 941, in BaseOperator
    def post_execute(self, context: Any, result: Any = None):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/lineage/__init__.py", line 103, in apply_lineage
    _backend = get_backend()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/lineage/__init__.py", line 52, in get_backend
    clazz = conf.getimport("lineage", "backend", fallback=None)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py", line 469, in getimport
    raise AirflowConfigException(
airflow.exceptions.AirflowConfigException: The object could not be loaded. Please check "backend" key in "lineage" section. Current value: "openlineage.airflow.backend.OpenLineageBackend".
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-01 14:06:12
*Thread Reply:*
1. Please use openlineage.lineage_backend.OpenLineageBackend as AIRFLOW__LINEAGE__BACKEND
2. Please tell us where you've seen openlineage.airflow.backend.OpenLineageBackend, so we can fix the documentation 🙂

Julien Le Dem (julien@apache.org)
2021-11-01 19:07:21
*Thread Reply:* https://pypi.org/project/openlineage-airflow/

Julien Le Dem (julien@apache.org)
2021-11-01 19:08:03
*Thread Reply:* (I googled it and found that page, which seems to have outdated docs)

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 02:38:59
*Thread Reply:* @Maciej Obuchowski @Julien Le Dem that's the page I followed. Please revise the documentation, as it is very important

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 04:34:14
*Thread Reply:* It should just copy the actual README

John Thomas (john@datakin.com)
2021-11-03 16:30:00
*Thread Reply:* PyPI is using the README at the time of the release, 0.3.1, rather than the current README, which is 0.4.0. If we send the new release to PyPI it should also update the README
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 15:09:54
Related to the Airflow integration: is it required to install openlineage-airflow and set the environment variables in both the scheduler and the webserver, or just in the scheduler?

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 15:19:18
*Thread Reply:* I set it up in the scheduler and it starts to log data to Marquez. But it fails with this error:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/openlineage/client/client.py", line 49, in __init__
    raise ValueError(f"Need valid url for OpenLineageClient, passed {url}")
ValueError: Need valid url for OpenLineageClient, passed "http://marquez-internal-eks.eu-west-1.dev.hbi.systems"

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-01 15:19:26
*Thread Reply:* why is it not a valid URL?

John Thomas (john@datakin.com)
2021-11-01 18:39:58
*Thread Reply:* Which version of the OpenLineage client are you using? On first check it should be fine

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 05:14:30
*Thread Reply:* @John Thomas I was appending double quotes as part of the url. Forget about this error

John Thomas (john@datakin.com)
2021-11-02 10:35:28
*Thread Reply:* aaaah, gotcha, good catch!
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 05:15:52
Hello, I am receiving this error today when I deployed OpenLineage in the development environment (not using docker-compose locally).

I am running with KubernetesExecutor

airflow.exceptions.AirflowConfigException: The object could not be loaded. Please check "backend" key in "lineage" section. Current value: "openlineage.lineage_backend.OpenLineageBackend".

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 05:18:18
*Thread Reply:* Are you sure that openlineage-airflow is present in the container?

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 05:23:09
So in this case in my template I am adding:

env:
  ADDITIONAL_PYTHON_DEPS: "openpyxl==3.0.3 smart_open==2.0.0 apache-airflow-providers-http apache-airflow-providers-cncf-kubernetes apache-airflow-providers-amazon openlineage-airflow"
  OPENLINEAGE_URL: https://marquez-internal-eks.eu-west-1.dev.hbi.systems
  OPENLINEAGE_NAMESPACE: dns_airflow
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_URL: https://marquez-internal-eks.eu-west-1.dev.hbi.systems
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_NAMESPACE: dns_airflow

configmap:
  mountPath: /var/airflow/config # mount path of the configmap
  data:
    airflow.cfg: |
      [lineage]
      backend = openlineage.lineage_backend.OpenLineageBackend

pod_template_file.yaml: |
  containers:
    - args: []
      command: []
      env:
        - name: AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_URL
          value: https://marquez-internal-eks.eu-west-1.dev.hbi.systems
        - name: AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__OPENLINEAGE_NAMESPACE
          value: dns_airflow
        - name: AIRFLOW__LINEAGE__BACKEND
          value: openlineage.lineage_backend.OpenLineageBackend

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 05:23:31
I am installing openlineage in the ADDITIONAL_PYTHON_DEPS

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 05:25:43
*Thread Reply:* Maybe ADDITIONAL_PYTHON_DEPS are dependencies needed by the tasks, and are installed after Airflow tries to initialize the LineageBackend?

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 06:34:11
*Thread Reply:* I am checking this by accessing the Kubernetes pod
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 06:34:54
I have a question related to Airflow and OpenLineage. I have a dag that contains 2 tasks: [screenshot]

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 06:35:34
I see that every task is displayed as a different job. I was expecting to see one job per dag.

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 07:29:43
Is this the expected behaviour?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 07:34:47
*Thread Reply:* Yes

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 07:35:53
*Thread Reply:* Probably what you want is job hierarchy: https://github.com/MarquezProject/marquez/issues/1737
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 07:46:02
*Thread Reply:* I do not see any benefit in just having some Airflow task metadata. I do not see the relationship between tasks. Every task is a job. When I started working on my company's integration with OpenLineage, I thought OpenLineage would give me relationships between tasks or datasets, and the only thing I see is some metadata on the history of Airflow runs that is already provided by Airflow.

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 07:46:20
*Thread Reply:* I was expecting to see a nice graph. I think it is missing some features

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 07:46:25
*Thread Reply:* at this early stage

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 07:50:10
*Thread Reply:* It probably depends on whether those tasks are covered by the extractors: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 07:55:50
*Thread Reply:* We are not using any of those operators: bigquery, postgres or snowflake.

And what does the GreatExpectations extractor do?

It would be good if there were one extractor that relies on the inlets and outlets you can define in any Airflow task, so that that could be the general way to make relationships between datasets

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 07:56:30
*Thread Reply:* And so that the same dag graph could be seen in Marquez, and not one job per task.
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 08:07:06
*Thread Reply:* > It would be good if there were one extractor that relies on the inlets and outlets you can define in any Airflow task
I think this is a good idea. Overall, OpenLineage strongly focuses on automatic metadata collection. However, using them would be a nice fallback for not-covered-yet cases.

> And so that the same dag graph could be seen in Marquez, and not one job per task.
This currently depends on dataset hierarchy. If you're not using any of the covered extractors, then Marquez can't build a dataset graph like in the demo: https://raw.githubusercontent.com/MarquezProject/marquez/main/web/docs/demo.gif

With the job hierarchy ticket, probably some graph could be generated using just the job data, though.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 08:09:55
*Thread Reply:* Created an issue for the manual fallback: https://github.com/OpenLineage/OpenLineage/issues/384

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 08:28:29
*Thread Reply:* @Maciej Obuchowski how many people are working full time on this library? I really would like to adopt it in my company, as we use Airflow and Spark, but I see that it does not yet have the features we would like.

At the moment, the same info we have in Marquez about the tasks is available in the Airflow UI or via the Airflow API.

The game changer for us would be features/metadata that we cannot query directly from Airflow. That's why, if the Airflow inlets/outlets could be used, it would make much more sense for us to adopt it.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 09:33:31
*Thread Reply:* > how many people are working full time on this library?
On the Airflow integration or on OpenLineage overall? 🙂

> The game changer for us would be features/metadata that we cannot query directly from Airflow.
I think there are three options there:
1. Contribute relevant extractors for the Airflow operators that you use
2. Use those extractors as custom extractors: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#custom-extractors
3. Create that manual fallback mechanism with Airflow inlets/outlets: https://github.com/OpenLineage/OpenLineage/issues/384

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-02 09:35:10
*Thread Reply:* But first, before implementing the last option, I'd like to get consensus about it - so feel free to comment there about your use case

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-02 09:19:14
@Maciej Obuchowski I can even contribute or help with my ideas (about what I consider lineage should look like from a client's side)
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-03 07:58:56
@Maciej Obuchowski I was able to get Airflow in Kubernetes working, pointing to Marquez using the openlineage library. I found a few problems that would be good to discuss.

I see a warning:
[2021-11-03 11:47:04,309] {great_expectations_extractor.py:27} WARNING - Did not find great_expectations_provider library or failed to import it
I couldn't find any information about the GreatExpectationsExtractor. Could you tell me what this extractor is about?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-03 08:00:34
*Thread Reply:* It should only affect you if you're using https://greatexpectations.io/

Francis McGregor-Macdonald (francis@mc-mac.com)
2021-11-03 15:57:02
*Thread Reply:* I have a similar message after installing openlineage into Amazon MWAA, from the scheduler logs:

WARNING:/usr/local/airflow/.local/lib/python3.7/site-packages/openlineage/airflow/extractors/great_expectations_extractor.py:Did not find great_expectations_provider library or failed to import it

I am not using great expectations in the DAG.
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-03 08:00:52
I see a few priorities for the Airflow integration:
1. A direct 1-1 relationship between DAG and job. At the moment every task is a different job in Marquez, which I consider wrong.
2. Airflow inlets/outlets integration with Marquez.
When do you think you can have this? If you need any help I can happily contribute, but I would need some guidance.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-03 08:08:21
*Thread Reply:* I don't think 1) is a good idea. You can have multiple tasks in one dag, processing different datasets and producing different datasets. If you want visual linking of jobs that produce disjoint datasets, then I think you want this, which will affect the visual layer: https://github.com/MarquezProject/marquez/issues/1737

Regarding 2), I think we need to get along with the Airflow maintainers regarding the long-term mechanism on which OL will work: https://github.com/apache/airflow/issues/17984

I think using inlets/outlets as a fallback mechanism when we're not doing automatic metadata extraction is a good idea, but we don't know if the hypothetical future mechanism will have access to them. It's hard to commit to a mechanism which might disappear soon.
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-03 08:13:28
Another option is that I build my own extractor. Do you have any example of how to create a custom extractor? How can I apply that custom extractor to specific operators? Is there a way to link an extractor with an operator, so that at runtime Airflow knows which extractor to run?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-03 08:19:00
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#custom-extractors

I think you can base your code on any existing extractor, like PostgresExtractor: https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/postgres_extractor.py#L53

Custom extractors work just like built-in ones; you just need to add a bit of mapping between operator and extractor, like OPENLINEAGE_EXTRACTOR_PostgresOperator=openlineage.airflow.extractors.postgres_extractor.PostgresExtractor
👍 Francis McGregor-Macdonald
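A rough sketch of what such a custom extractor might look like, modeled on PostgresExtractor. The import paths follow the openlineage-airflow layout of the time, but treat every name here as an assumption to check against the README; MyOperator and the empty dataset lists are placeholders:

from typing import Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata


class MyOperatorExtractor(BaseExtractor):
    """Hypothetical extractor for a custom MyOperator, registered via:
    OPENLINEAGE_EXTRACTOR_MyOperator=my_package.extractors.MyOperatorExtractor
    """

    def extract(self) -> Optional[TaskMetadata]:
        # self.operator is the operator instance this extractor was matched to;
        # pull whatever attributes describe what it reads and writes
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[],   # OpenLineage Dataset objects the task reads
            outputs=[],  # OpenLineage Dataset objects the task writes
        )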
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-03 08:35:59
*Thread Reply:* Thank you very much @Maciej Obuchowski

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-03 08:36:52
Last question of the morning. After running a task that failed, I could see that no information appeared in Marquez. Is this expected? I would like to see the whole history of runs in Marquez, successful and unsuccessful.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2021-11-03 08:41:14
*Thread Reply:* It worked like that in Airflow 1.10.

This is an unfortunate limitation of the LineageBackend API that we're using for Airflow 2. We're trying to work out a solution for this with the Airflow maintainers: https://github.com/apache/airflow/issues/17984
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-04 03:41:38
Hello openlineage community.

Yesterday I tried the integration with Spark.

The result was not satisfactory. This is what I did:
1. Add the openlineage-spark dependency
2. Add these lines:
.config("spark.jars.packages", "io.openlineage:openlineage-spark:0.3.1")
.config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
.config("spark.openlineage.url", "https://marquez-internal-eks.eu-west-1.dev.hbi.systems/api/v1/namespaces/spark_integration/")
This job was doing spark.read from 2 different json locations.
It was doing spark.write to 5 different parquet locations in s3.
The job finished successfully, and the result in Marquez is: [screenshot]
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-04 03:43:40
It created 3 namespaces. One was the one I pointed to in the spark config property. The other 2 are the bucket that we are writing to () and the bucket where we are reading from (). [screenshot]

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-04 03:44:00
If I enter the bucket namespaces I see nothing inside

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-04 03:48:35
If I enter one of the weird jobs generated, I can see this: [screenshot]

Julien Le Dem (julien@apache.org)
2021-11-04 18:47:41
*Thread Reply:* This job with no output is a symptom of the output not being understood. You should be able to see the facets for that job. There will be a spark_unknown facet with more information about the problem. If you put that into an issue with some more details about this job, we should be able to help.

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-05 04:36:30
*Thread Reply:* I'll try to put all the info in a ticket, as it is not working as I would expect
David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-04 03:52:24
And I am seeing this as well. If I check the logs of marquez-web and marquez I can't see any error there. [screenshot]

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-04 03:54:38
When I try to open the job fulfilments.execute_insert_into_hadoop_fs_relation_command I see this window: [screenshot]

David Virgil (david.virgil.naranjo@googlemail.com)
2021-11-04 04:06:29
The page froze and no link from the menu works. Apart from that, I see that there are no messages in the logs

Julien Le Dem (julien@apache.org)
2021-11-04 18:49:31
*Thread Reply:* Is there an error in the browser javascript console? (example on Chrome: View -> Developer -> Javascript console)
Alessandro Rizzo (l.alessandrorizzo@gmail.com)
2021-11-04 17:22:29
Hi #general, I'm a data engineer for a UK-based insurtech (part of one of the biggest UK retail insurers). We run a series of tech meetups and we'd love to have someone from the OpenLineage project give us a demo of the tool. Would anyone be interested? (DM if so 🙂)
👍 Ross Turk
Taleb Zeghmi (talebz@zillowgroup.com)
2021-11-04 21:30:24
Hi! Is there an example of tracking lineage when using Pandas to read/write and transform data?

John Thomas (john@datakin.com)
2021-11-04 21:35:16
*Thread Reply:* Hi Taleb - I don't know of a generalized example of lineage tracking with Pandas, but you should be able to accomplish this by sending the runEvents manually to the OpenLineage API in your code:
https://openlineage.io/docs/openapi/
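For example, a minimal hand-rolled sketch of that idea, assuming a Marquez-style /api/v1/lineage endpoint (the file paths, job name, and producer URI are illustrative):

import uuid
from datetime import datetime, timezone

import pandas as pd
import requests

OL_URL = "http://localhost:5000/api/v1/lineage"   # illustrative endpoint
PRODUCER = "https://example.com/pandas-wrapper"   # hypothetical producer URI

def emit(event_type: str, run_id: str, inputs: list, outputs: list) -> None:
    requests.post(OL_URL, json={
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": "pandas_jobs", "name": "clean_events"},
        "inputs": inputs,
        "outputs": outputs,
        "producer": PRODUCER,
    })

run_id = str(uuid.uuid4())
inputs = [{"namespace": "file", "name": "/data/raw/events.csv"}]
outputs = [{"namespace": "file", "name": "/data/clean/events.parquet"}]

emit("START", run_id, inputs, outputs)
df = pd.read_csv("/data/raw/events.csv")
df = df.dropna()
df.to_parquet("/data/clean/events.parquet")
emit("COMPLETE", run_id, inputs, outputs)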
Taleb Zeghmi (talebz@zillowgroup.com)
2021-11-04 21:38:25
*Thread Reply:* Is this a work in progress that we can investigate? Because I see it in this image: https://github.com/OpenLineage/OpenLineage/blob/main/doc/Scope.png

John Thomas (john@datakin.com)
2021-11-04 21:54:51
*Thread Reply:* To my knowledge, while there are a few proposals around adding a wrapper on some Pandas methods to output runEvents, it's not something that's had work started on it yet

John Thomas (john@datakin.com)
2021-11-04 21:56:26
*Thread Reply:* I sent some feelers out to get a little more context from folks who are more informed about this than I am, so I'll get you more info about potential future plans and the considerations around them when I know more

John Thomas (john@datakin.com)
2021-11-04 23:04:47
*Thread Reply:* So, Pandas is tricky because unlike Airflow, dbt, or Spark, Pandas doesn't own the whole flow, and you might dip in and out of it to use other Python packages (at least I did when I was doing more data science).

We have this issue open in OpenLineage that you should go +1 to help with our planning 🙂

Taleb Zeghmi (talebz@zillowgroup.com)
2021-11-05 15:08:09
*Thread Reply:* interesting... what if it were instead on all the read_** and to_** functions?
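That wrapper idea could be prototyped without touching Pandas itself, e.g. by monkey-patching the I/O entry points. A hypothetical sketch, not an existing proposal in the project (error handling and the full event payload are omitted):

import functools

import pandas as pd

_lineage: dict = {"inputs": [], "outputs": []}

def _track(kind: str, func):
    @functools.wraps(func)
    def wrapper(path, *args, **kwargs):
        # record the dataset, then delegate to the real pandas function
        _lineage[kind].append({"namespace": "file", "name": str(path)})
        return func(path, *args, **kwargs)
    return wrapper

# wrap the read_* entry points; to_* methods could be wrapped the same way
pd.read_csv = _track("inputs", pd.read_csv)
pd.read_parquet = _track("inputs", pd.read_parquet)

df = pd.read_csv("/data/raw/events.csv")  # now recorded as an input
print(_lineage["inputs"])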
Lyndon Armitage (lyndon.armitage@gmail.com)
2021-11-05 12:00:57
Hi! I am working alongside David at integrating OpenLineage into our data pipelines. I have a question about Marquez's and OpenLineage's divergent APIs. These 2 APIs differ:
https://openlineage.io/docs/openapi/
https://marquezproject.github.io/marquez/openapi.html
This makes sense, since they are at different layers of abstraction, but Marquez requires a few things that are absent from OpenLineage's API, for example the type in a data source, or the distinction between physicalName and sourceName in Datasets. Is that intentional? And can these be set using the OpenLineage API, as some additional facets or keys? I noticed that the DatasourceDatasetFacet has a map of additionalProperties.

John Thomas (john@datakin.com)
2021-11-05 12:59:49
*Thread Reply:* The Marquez write APIs are artifacts from before OpenLineage existed, and they're already slated for deprecation soon.

If you POST an OpenLineage runEvent to the /lineage endpoint in Marquez, it'll create any missing jobs or datasets that are relevant.

Lyndon Armitage (lyndon.armitage@gmail.com)
2021-11-05 13:06:06
*Thread Reply:* Thanks for the response. That sounds good. Does this include the query interface, e.g.
http://localhost:5000/api/v1/namespaces/testing_java/datasets/incremental_data
as that currently returns the Marquez version of a dataset, including default-set fields for type and the above-mentioned properties?

Michael Collado (collado.mike@gmail.com)
2021-11-05 17:01:55
*Thread Reply:* I believe the intention for type is to support a new facet - TBH, it hasn't been the most pressing concern for most users, as most people are only recording tables, not streams. However, there's been some recent work to support Kafka in Spark - maybe it's time to address that deficiency.

I don't actually know what happened to the datasource type field - maybe @Julien Le Dem can comment on whether that field was dropped intentionally or whether it was an oversight.

Julien Le Dem (julien@apache.org)
2021-11-05 18:18:06
*Thread Reply:* It looks like an oversight; currently Marquez hard codes it to POSTGRESQL: https://github.com/MarquezProject/marquez/blob/734bfd691636cb00212d7d22b1a489bd4870fb04/api/src/main/java/marquez/db/OpenLineageDao.java#L438

Julien Le Dem (julien@apache.org)
2021-11-05 18:18:25
*Thread Reply:* https://github.com/MarquezProject/marquez/blob/734bfd691636cb00212d7d22b1a489bd4870fb04/api/src/main/java/marquez/db/OpenLineageDao.java#L438-L440

Julien Le Dem (julien@apache.org)
2021-11-05 18:20:25
*Thread Reply:* The source has a name though: https://github.com/OpenLineage/OpenLineage/blob/8afc4ff88b8dd8090cd9c45061a9f669fea2151e/spec/facets/DatasourceDatasetFacet.json#L12
Julien Le Dem - (julien@apache.org) -
-
2021-11-05 18:07:16
-
-

The next OpenLineage monthly meeting is this coming Wednesday at 9am PT -The tentative agenda is: -• OL Client use cases for Apache Iceberg [Ryan] -• OpenLineage and Azure Purview [Shrikanth] -• Proxy Backend and Egeria integration progress update (Issue #152) [Mandy] -• OpenLineage last release overview (0.3.1) - ◦ Facet versioning - ◦ Airflow 2 / Spark 3 support, dbt improvements -• OpenLineage 0.4 scope review - ◦ Proxy Backend (Issue #152) - ◦ Spark, Airflow, dbt improvements (documentation, coverage, ...) - ◦ improvements to the OpenLineage model -• Open discussion 

-
- - - - - - - -
-
Assignees
- mandy-chessell -
- -
-
Comments
- 3 -
- - - - - - - - - - - - -
- - - -
- 🙌 Maciej Obuchowski, Peter Hicks -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2021-11-05 18:07:57
-
-

*Thread Reply:* If you want to add something please chime in this thread

Julien Le Dem (julien@apache.org) - 2021-11-05 19:27:44

Julien Le Dem (julien@apache.org) - 2021-11-09 19:47:26
*Thread Reply:* The monthly meeting is happening tomorrow.
The Purview team will present at the December meeting instead.
See the full agenda here: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting
You are welcome to contribute.

Julien Le Dem (julien@apache.org) - 2021-11-10 11:10:17
*Thread Reply:* The slides for the meeting later today: https://docs.google.com/presentation/d/1z2NTkkL8hg_2typHRYhcFPyD5az-5-tl/edit#slide=id.ge7d4b64ef4_0_0

Julien Le Dem (julien@apache.org) - 2021-11-10 12:02:23
*Thread Reply:* It's happening now ^

Julien Le Dem (julien@apache.org) - 2021-11-16 19:57:23
*Thread Reply:* I have posted the notes and the recording from the last instance of our monthly meeting: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting#MonthlyTSCmeeting-Nov10th2021(9amPT)
I have a few TODOs to follow up on tickets.

Julien Le Dem (julien@apache.org) - 2021-11-05 18:09:10
The next release of OpenLineage is being scoped: https://github.com/OpenLineage/OpenLineage/projects/6
Please chime in if you want to raise the priority of something or are planning to contribute.

Anthony Ivanov (anthvt@gmail.com) - 2021-11-09 08:18:11
Hi, I have been looking at OpenLineage for some time, and I really like it. It is a very simple specification that covers a lot of use cases. You can create any provider or consumer in a very simple way, so that's pretty powerful.
I have some questions about things that are not clear to me. I am not sure if this is the best place to ask - please refer me elsewhere if this is not appropriate.

Anthony Ivanov (anthvt@gmail.com) - 2021-11-09 08:18:58
*Thread Reply:* How do you model a continuous process (not a batch process)? For example, a Flume or Spark job that does some real-time processing on data.

Maybe it's simply a "Job". But then what is a run?

Anthony Ivanov (anthvt@gmail.com) - 2021-11-09 08:19:44
*Thread Reply:* How do you model consumers at the end - they can be reports, data applications, ML model deployments, APIs, GUIs consumed by end users?

Have you considered having some examples of different use cases like those?

Anthony Ivanov (anthvt@gmail.com) - 2021-11-09 08:21:43
*Thread Reply:* By definition, a Job is a process definition that consumes and produces datasets. It is a many-to-many relation? I've been wondering about that. Shouldn't it be more restrictive?
An important use case for lineage is troubleshooting or error notifications (e.g. mark a report or job as temporarily in a bad state if an upstream data integration is broken).
To be able to do that, you need to be able to traverse the graph to find the original error. So having multiple inputs produce a single output makes sense (e.g. insert into output_1 select * from x,y group by a,b). But what are the cases where you'd want to see multiple outputs? You can have a single process produce multiple tables (as in the above example) but they'd always be separate queries. The actual inputs for each output would be different.

But having multiple outputs creates ambiguity: if x or y is broken and there are multiple outputs, I do not know which is really impacted.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-09 08:34:01
*Thread Reply:* > How do you model a continuous process (not a batch process)? For example, a Flume or Spark job that does some real-time processing on data.
>
> Maybe it's simply a "Job". But then what is a run?
Every continuous process eventually has an end - for example, you can deploy a new version of your Flink pipeline. The new version would be the next Run for the same Job.

Moreover, the OTHER event type is useful to update metadata like the amount of processed records. In this Flink example, it could be emitted per checkpoint.

I think more attention for streaming use cases will be given soon.
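A sketch of the pattern Maciej describes: one long-lived run per pipeline deployment, with periodic OTHER events carrying updated run facets. The processedRecords facet below is hypothetical, for illustration only; job names and producer URI are also made up.

import uuid
from datetime import datetime, timezone

run_id = str(uuid.uuid4())  # stable for the lifetime of this deployment

def make_event(event_type: str, records_so_far: int) -> dict:
    return {
        "eventType": event_type,  # START | COMPLETE | ABORT | FAIL | OTHER
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {
            "runId": run_id,
            "facets": {
                "processedRecords": {  # hypothetical facet, not in the spec
                    "_producer": "https://example.com/my-flink-wrapper",
                    "_schemaURL": "https://example.com/schemas/processedRecords.json",
                    "count": records_so_far,
                }
            },
        },
        "job": {"namespace": "streaming", "name": "clickstream_enrichment"},
        "producer": "https://example.com/my-flink-wrapper",
    }

# START once at deploy, OTHER per checkpoint, COMPLETE when this version ends
events = [make_event("START", 0), make_event("OTHER", 10_000), make_event("OTHER", 20_000)]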

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-09 08:43:09
*Thread Reply:* > How do you model consumers at the end - they can be reports, data applications, ML model deployments, APIs, GUIs consumed by end users?
Our reference implementation is a web application: https://marquezproject.github.io/marquez/

We definitely do not exclude any of the things you're talking about - and it would make a lot of sense to talk more about potential usages.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-09 08:45:47
*Thread Reply:* > By definition, a Job is a process definition that consumes and produces datasets. It is a many-to-many relation? I've been wondering about that. Shouldn't it be more restrictive?
I think this is too SQL-centric a view 🙂

Not everything is a query. For example, those Flink streaming jobs can produce side outputs, or even push data to multiple sinks. We need to model those types of jobs too.

If your application does not do multiple outputs, then I don't see how a specification allowing those would impact you.

Anthony Ivanov (anthvt@gmail.com) - 2021-11-17 12:11:37
*Thread Reply:* > We definitely do not exclude any of the things you're talking about - and it would make a lot of sense to talk more about potential usages.
Yes, I think it would be great if we expanded on potential usages - if the OpenLineage documentation (perhaps) had all kinds of examples for different use cases, or case studies: a finance or healthcare industry case study and how someone would do an integration with OpenLineage. It would be easier to understand the concepts and make sure things are modeled consistently.

Anthony Ivanov (anthvt@gmail.com) - 2021-11-17 14:19:19
*Thread Reply:* > I think this is too SQL-centric a view 🙂
>
> Not everything is a query. For example, those Flink streaming jobs can produce side outputs, or even push data to multiple sinks. We need to model those types of jobs too.
Thanks for answering @Maciej Obuchowski

Even in SQL you can have multiple outputs if you look at things at the transaction level. I was simply using it as an example.

Maybe it will be clearer what I mean with another example. Let's say we have these phases:
1. Ingest from sources
2. Process/transform
3. Export to somewhere
(image/diagram)
https://mermaid.ink/img/eyJjb2RlIjoiXG5ncmFwaCBMUlxuICAgIHN1YmdyYXBoIFNvdXJjZXNcbi[…]yIjpmYWxzZSwiYXV0b1N5bmMiOnRydWUsInVwZGF0ZURpYWdyYW0iOmZhbHNlfQ

Let's look at two cases:
1. Within a single Flink job, and even a single task: Inventory & UI are both written to both S3 and the DB
2. Within a single Flink job, and even a single task: Inventory is written only to S3, UI is written only to the DB

In case 1, an OpenLineage run event could look like {inputs: [ui, inventory], outputs: [s3, db]}

In case 2, the user can either do the same as in case 1 (because data changes, or copy-paste), which would be an error since both do not go to both. The likely accurate form would be:
{inputs: [ui], outputs: [s3]} {inputs: [ui], outputs: [db]}

If the specification standard required a single output, then:
1. would be modelled like run events {inputs: [ui, inventory], outputs: [s3]}; {inputs: [ui, inventory], outputs: [db]}, which is still correct, if more verbose.
2. could only be modelled this way:
{inputs: [ui], outputs: [s3]}; {inputs: [ui], outputs: [db]}

The more restrictive specification seems to lower the chance of an error, doesn't it?

Also, if tools knew the spec guaranteed a single output, they'd be able to build tracing capabilities that are more precise, because the structure would allow for less ambiguity. Storage backends that implement the spec could perhaps also be written in more optimal ways. I have not looked into the accuracy of those hypotheses though.

Those were the thoughts I was thinking when asking about that. I'd be curious if there's a document on the research of pros/cons and alternatives for the design of the current specification.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-23 05:38:11
*Thread Reply:* @Anthony Ivanov I see what you're trying to model. I think this could be solved by column-level lineage though - when we'll have it. An OL consumer could look at particular columns and derive which table contained the particular error.

> 2. Within a single Flink job, and even a single task: Inventory is written only to S3, UI is written only to the DB
Does that actually happen? I understand this in the case of a job, but having a single operator write to two different systems seems like bad design. Wouldn't that leave the possibility of breaking exactly-once unless you're going full into two-phase commit?

Anthony Ivanov (anthvt@gmail.com) - 2021-11-23 17:02:36
*Thread Reply:* > Does that actually happen? I understand this in the case of a job, but having a single operator write to two different systems seems like bad design
In a Spark or Flink job it is less likely, now that you mention it. But in a batch job (an Airflow Python or Kubernetes operator, for example) users could do anything, and then they'd need lineage to figure out what is wrong, even if what they did is suboptimal 🙂

> I see what you're trying to model.
I am not trying to model something specific. I am trying to understand how OpenLineage would be used in different organisations/companies and use cases.

> I think this could be solved by column-level lineage though
Is there something specific planned? I could not find a ticket in GitHub. I thought you could use Dataset Facets - Schema, for example, could be a subset of columns for a table…

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-24 04:55:41
*Thread Reply:* @Anthony Ivanov take a look at this: https://github.com/OpenLineage/OpenLineage/issues/148

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-10 13:21:23
How do you delete jobs/runs from Marquez/OpenLineage?

Willy Lulciuc (willy@datakin.com) - 2021-11-10 16:17:10
*Thread Reply:* We're adding APIs to delete metadata in Marquez 0.20.0. Here's the related issue: https://github.com/MarquezProject/marquez/issues/1736

Willy Lulciuc (willy@datakin.com) - 2021-11-10 16:17:37
*Thread Reply:* Until then, you can connect to the DB directly and drop the rows from both the datasets and jobs tables (I know, not ideal)
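A rough sketch of that workaround, assuming direct Postgres access to the Marquez database. The datasets and jobs table names come from Willy's message; everything else (connection details, filter values, whether plain deletes satisfy the foreign keys to run/version tables in your Marquez version) is an assumption to verify first.

import psycopg2

conn = psycopg2.connect("dbname=marquez user=marquez password=marquez host=localhost")
with conn, conn.cursor() as cur:
    # Dropping rows directly; related run/version rows may need cleanup too.
    cur.execute("DELETE FROM datasets WHERE name = %s", ("my_old_dataset",))
    cur.execute("DELETE FROM jobs WHERE name = %s", ("my_old_job",))
conn.close()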

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 05:03:50
*Thread Reply:* Thanks! I assume deleting information will remain a Marquez-only feature rather than becoming part of OpenLineage itself?

Willy Lulciuc (willy@datakin.com) - 2021-12-10 14:07:57
*Thread Reply:* Yes! Delete operations will be an action supported by consumers of OpenLineage events

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 05:13:31
Am I understanding namespaces correctly? A job namespace is different from a Dataset namespace.
And job namespaces define a job environment, like Airflow, Spark, or some other system that executes jobs. But Dataset namespaces define data locations, like an S3 bucket, a local file system, or a schema in a database?

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 05:14:39
*Thread Reply:* I've been skimming this page: https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-11 05:46:06
*Thread Reply:* Yes!
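To make the distinction concrete, a sketch of identifiers following the Naming.md conventions as I read them (all values below are illustrative, not from this conversation):

# Dataset namespaces identify where the data physically lives;
# job namespaces identify the environment that runs the job.
dataset_identifiers = [
    {"namespace": "postgres://db.foo.com:5432", "name": "mydb.public.orders"},
    {"namespace": "s3://my-bucket", "name": "raw/orders/2021-11-11"},
]
job_identifier = {"namespace": "my-airflow-prod", "name": "orders_dag.load_orders"}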

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 06:17:01
*Thread Reply:* Excellent, I think I had mistakenly conflated the two originally. This document makes it a little clearer.
As an additional question:
When viewing a Dataset in Marquez, will it cross the job namespace bounds? As in, will I see jobs from different job namespaces?

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 09:20:14
*Thread Reply:* In this example I have 1 job namespace and 2 dataset namespaces:
sql-runner-dev is the job namespace.
I cannot see a graph of my job now. Is this something to do with the namespace names?

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 09:21:46
*Thread Reply:* The above document seems to have implied a namespace could be like a connection string for a database

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 09:22:25
*Thread Reply:* Wait, it does work? Marquez was being temperamental

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 09:24:01
*Thread Reply:* Yes, Marquez is unable to fetch lineage for either dataset

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 09:32:19
*Thread Reply:* Here's what I mean:

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-11 09:59:24
*Thread Reply:* I think you might have hit this issue: https://github.com/MarquezProject/marquez/issues/1744

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-11 10:00:29
*Thread Reply:* Or maybe not? It was released already.

Can you create an issue on GitHub with those helpful GIFs? @Lyndon Armitage

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 10:58:25
*Thread Reply:* I think you are right Maciej

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 10:58:52
*Thread Reply:* Was that patched in 0.19.1?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-11 11:06:06
*Thread Reply:* As far as I can see, yes: https://github.com/MarquezProject/marquez/releases/tag/0.19.1

Haven't tested this myself unfortunately.

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 11:07:07
*Thread Reply:* Perhaps not. It is urlencoding them:
http://localhost:3000/lineage/dataset/jdbc%3Ah2%3Amem%3Asql_tests_like/HBMOFA.ORDDETP
But the error seems to be in Marquez getting them.

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 11:09:23
*Thread Reply:* This is an example Lineage event JSON I am sending.
👀 Maciej Obuchowski

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 11:11:29
*Thread Reply:* I did run into another issue with really long names not being supported, due to Marquez's DB using a fixed-size string for a column, but that is understandable and probably a non-issue (my test code was generating temporary folders with long names).

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 11:22:00

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-11 11:36:01
*Thread Reply:* @Lyndon Armitage can you create an issue on the Marquez repo? https://github.com/MarquezProject/marquez/issues

Lyndon Armitage (lyndon.armitage@gmail.com) - 2021-11-11 11:52:36
*Thread Reply:* https://github.com/MarquezProject/marquez/issues/1761 Is this sufficient?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-11 11:54:41
*Thread Reply:* Yup, thanks!

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-15 13:00:39
I am looking at an AWS Glue Crawler lineage event. The Glue crawler creates or updates a table schema, and I have a few questions on aligning to best practice.
1. Is this a dataset create/update, or…
2. …a job with no dataset inputs and only dataset outputs, or
3. …is the path in S3 the input and the Glue table the output?
4. Is there an example of the lineage event here I can clone or work from?
Thanks.
🚀 Willy Lulciuc

John Thomas (john@datakin.com) - 2021-11-15 13:04:19
*Thread Reply:* Hi Francis, for the event: is it creating a new table with new data in Glue / adding new data to an existing one, or is it simply reformatting an existing table or making an empty one?

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-15 13:35:00
*Thread Reply:* The table does not exist in the Glue catalog until…

A Glue crawler connects to one or more data stores (in this case S3), determines the data structures, and writes tables into the Data Catalog.

The data/objects are in S3; the Glue catalog is a metadata representation (HIVE) as a table.

John Thomas (john@datakin.com) - 2021-11-15 13:41:14
*Thread Reply:* Hmm, interesting, so the lineage of interest here would be of the metadata flow, not of the data itself?

In that case I'd say that the Glue Crawler is a job that outputs a dataset.

Michael Collado (collado.mike@gmail.com) - 2021-11-15 15:03:36
*Thread Reply:* The crawler is a job that discovers a dataset. It doesn't create it. If you're posting lineage yourself, I'd post it as an input event, not an output. The thing that actually wrote the data - generated the records and stored them in S3 - is the thing that would be outputting the dataset.
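A sketch of Michael's suggestion as a runEvent: the crawler is a run that reads the S3 dataset (inputs only), and whatever actually wrote the S3 data claims it as an output. Names and IDs below are illustrative.

crawler_event = {
    "eventType": "COMPLETE",
    "eventTime": "2021-11-15T20:00:00Z",
    "run": {"runId": "9c9c9c9c-0000-0000-0000-000000000000"},
    "job": {"namespace": "aws-glue", "name": "my_crawler"},
    # The crawler discovered (read) this dataset; it did not produce it.
    "inputs": [{"namespace": "s3://my-bucket", "name": "raw/orders"}],
    "outputs": [],
    "producer": "https://example.com/glue-crawler-wrapper",
}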

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-15 15:23:23
*Thread Reply:* @Michael Collado I agree the crawler discovers the S3 dataset. It also creates an event which creates/updates the HIVE/Glue table.

If the Glue table isn't a distinct dataset from the S3 data, how does this compare to a view in a database on top of a table? Are they 2 datasets or just one?

Glue can discover data in remote databases too; in those cases does it make sense to have only the source dataset?

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-15 15:24:39
*Thread Reply:* @John Thomas yes, it's the metadata flow.

Michael Collado (collado.mike@gmail.com) - 2021-11-15 15:24:52
*Thread Reply:* That's how the Spark integration currently treats Hive datasets - I'd like to add a facet that indicates it is being read as a Hive table, and include all the appropriate metadata, but it uses the dataset's location in S3 as the canonical dataset identifier.

John Thomas (john@datakin.com) - 2021-11-15 15:29:22
*Thread Reply:* @Francis McGregor-Macdonald I think the way to represent this is predicated on what you're looking to accomplish by sending a runEvent for the Glue crawler. What are your broader objectives in adding this?

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-15 15:50:37
*Thread Reply:* I am working through AWS native services, seeing how they could, can, or do best integrate with OpenLineage (I'm an AWS SA). Hence the questions on best practice.

Aligning with the Spark integration sounds like it might make sense then. Is there an example I could build from?

Michael Collado (collado.mike@gmail.com) - 2021-11-15 17:56:17
*Thread Reply:* An example of reporting lineage? You can look at the Spark integration here: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/

John Thomas (john@datakin.com) - 2021-11-15 17:59:14
*Thread Reply:* Ahh, in that case I would have to agree with Michael's approach to things!
✅ Diogo

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-19 03:30:03
*Thread Reply:* @Michael Collado I am following the Spark integration you recommended (for a Glue job), and while everything appears to be set up correctly, I am getting no lineage appearing in Marquez (a requests.get from the PySpark script can reach the endpoint). Is there a way to enable a debug log so I can look to identify where the issue is?
Is there a specific place to look in the regular logs?

Michael Collado (collado.mike@gmail.com) - 2021-11-19 13:39:01
*Thread Reply:* listener output should be present in the driver logs. You can turn on debug logging in your log4j config (or whatever logging tool you use) for the package io.openlineage.spark.agent
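One way to flip that logger to DEBUG from inside a PySpark/Glue script, without editing log4j.properties - a sketch via the py4j JVM gateway, assuming the runtime uses log4j 1.x (as the Spark 3.1 Glue runtime mentioned later in this thread does):

# `spark` is the active SparkSession in the Glue/PySpark script
log4j = spark._jvm.org.apache.log4j
log4j.LogManager.getLogger("io.openlineage.spark.agent").setLevel(log4j.Level.DEBUG)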
✅ Francis McGregor-Macdonald

Michael Collado (collado.mike@gmail.com) - 2021-11-19 19:44:06
Woo hoo! Initial Spark <-> Kafka support has been merged 🙂 https://github.com/OpenLineage/OpenLineage/pull/387
🎉 Willy Lulciuc, John Thomas, Peter Hicks, Maciej Obuchowski
🙌 Willy Lulciuc, John Thomas, Francis McGregor-Macdonald, Peter Hicks, Maciej Obuchowski
🚀 Willy Lulciuc, John Thomas, Peter Hicks, Francis McGregor-Macdonald, Maciej Obuchowski

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 13:32:57
I am "successfully" exporting lineage to OpenLineage from AWS Glue using the listener. Only the source load is showing, not the transforms or the sink.

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 13:34:15
*Thread Reply:* Output event:

2021-11-22 08:12:15,513 INFO [spark-listener-group-shared] agent.OpenLineageContext (OpenLineageContext.java:emit(50)): Lineage completed successfully: ResponseMessage(responseCode=201, body=, error=null)
{
  "eventType": "COMPLETE",
  "eventTime": "2021-11-22T08:12:15.478Z",
  "run": {
    "runId": "03bfc770-2151-499e-9265-8457a38ceec3",
    "facets": {
      "spark_version": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet",
        "spark-version": "3.1.1-amzn-0",
        "openlineage-spark-version": "0.3.1"
      }
    }
  },
  "job": {
    "namespace": "spark_integration",
    "name": "nyc_taxi_raw_stage.map_partitions_union_map_partitions_new_hadoop"
  },
  "inputs": [
    {
      "namespace": "s3.cdkdl-dev-foundationstoragef3787fa8-raw1d6fb60a-171gwxf2sixt9",
      "name": ""
    }
  ],
  "outputs": [],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark",
  "schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"
}

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 13:34:59
*Thread Reply:* This sink record is missing details…

2021-11-22 08:12:15,481 INFO [Thread-7] sinks.HadoopDataSink (HadoopDataSink.scala:$anonfun$writeDynamicFrame$1(275)): nameSpace: , table:

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 13:40:30
*Thread Reply:* I can also see multiple history events (presumably one per transform, each as above) emitted for the same Glue Job, with different RunIds, with the same inputs and the same (null) output.

John Thomas (john@datakin.com) - 2021-11-22 14:31:06
*Thread Reply:* Are you using the existing Spark integration for the Spark lineage?

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 14:46:47
*Thread Reply:* I followed: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark
In the Glue context I was not clear on the correct settings for "spark.openlineage.parentJobName" and "spark.openlineage.parentRunId", so I put in static values (which may be incorrect)?
I injected these via: "--conf": "spark.openlineage.parentJobName=nyc-taxi-raw-stage",
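For context, a sketch of the full set of listener settings for a plain Spark session, inferred from the 0.3.x docs and the ArgumentParser startup log further down this thread (host/version/namespace/jobName/parentRunId); values are placeholders, and in Glue these would be passed through the job's --conf arguments rather than in code:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.3.1")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.host", "http://marquez-host:5000")   # where to POST events
    .config("spark.openlineage.version", "v1")                      # API version, per the log below
    .config("spark.openlineage.namespace", "spark_integration")
    .config("spark.openlineage.parentJobName", "nyc-taxi-raw-stage")
    .getOrCreate()
)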

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 14:47:54
*Thread Reply:* Happy to share what is working when I am done; I can't seem to find an AWS Glue specific example to walk me through.

John Thomas (john@datakin.com) - 2021-11-22 15:03:31
*Thread Reply:* Yeah, we haven't spent any significant time with AWS Glue, but we just released the Databricks integration, which might help guide the way you're working a little bit more.

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 15:12:15
*Thread Reply:* From what I can see in the DBX integration (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/databricks), all of what is being done there I am doing in Glue (upload the jar, embed the settings into the Glue Spark job).
It is emitting the above for each transform in the Glue job, but does not seem to capture the output…

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 15:13:54
*Thread Reply:* Is there a standard Spark test script in use with OpenLineage I could put into Glue, to test without using any Glue-specific functionality (without, for example, the GlueContext or Glue dynamic frames)?

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 15:25:30
*Thread Reply:* The initialisation does appear to be working if I compare it to the DBX README.
Mine from AWS Glue…
21/11/22 18:48:48 INFO SparkContext: Registered listener io.openlineage.spark.agent.OpenLineageSparkListener
21/11/22 18:48:49 INFO OpenLineageContext: Init OpenLineageContext: Args: ArgumentParser(host=http://ec2-….compute-1.amazonaws.com:5000, version=v1, namespace=spark_integration, jobName=default, parentRunId=null, apiKey=Optional.empty) URI: http://ec2-….compute-1.amazonaws.com:5000/api/v1/lineage
21/11/22 18:48:49 INFO AsyncEventQueue: Process of event SparkListenerApplicationStart(nyc-taxi-raw-stage,Some(spark-application-1637606927106),1637606926281,spark,None,None,None) by listener OpenLineageSparkListener took 1.092252643s.

John Thomas (john@datakin.com) - 2021-11-22 16:12:40
*Thread Reply:* We don't have a test run, unfortunately, but you could follow this blog post's process in each and see what the differences are: https://openlineage.io/blog/openlineage-spark/

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-22 16:43:23
*Thread Reply:* Thanks, I have been looking at that. I will create a Glue job aligned with that. What is the best way to pass feedback? Keep it here?

John Thomas (john@datakin.com) - 2021-11-22 16:49:50
*Thread Reply:* Yeah, this thread will work great 🙂

Ilya Davidov (idavidov@marpaihealth.com) - 2022-07-18 11:37:02
*Thread Reply:* @Francis McGregor-Macdonald did you manage to enable it?

Francis McGregor-Macdonald (francis@mc-mac.com) - 2022-07-18 15:14:47
*Thread Reply:* Just DM'd you the code I used a while back (app.py + CDK code). I haven't used it in a while, and there is some duplication in it. I had OpenLineage enabled, but dynamic frames were not yet working with lineage. Let me know how you go.
I haven't had the space to look at it in a while, but happy to support if you are looking at it.

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-11-23 08:48:51
How do we use OpenLineage with Amundsen?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-23 09:01:11
*Thread Reply:* You can use this: https://github.com/amundsen-io/amundsen/pull/1444

John Thomas (john@datakin.com) - 2021-11-23 09:38:44
*Thread Reply:* You can also check out this section from the Amundsen Community Meeting in October: https://www.youtube.com/watch?v=7WgECcmLSRk

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-11-23 08:49:16
Do we need to use Marquez?

Willy Lulciuc (willy@datakin.com) - 2021-11-23 12:45:34
*Thread Reply:* No, I believe the databuilder OpenLineage extractor for Amundsen will continue to store lineage metadata in Atlas

Willy Lulciuc (willy@datakin.com) - 2021-11-23 12:47:01
*Thread Reply:* We've spoken to the Amundsen team, and though using Marquez to store lineage metadata isn't an option, it's an integration that makes sense but hasn't yet been prioritized

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-11-23 13:51:00
*Thread Reply:* Thanks. Right now Amundsen has no support for lineage extraction from Spark or Airflow. In that case, do we need to use Marquez for the OpenLineage implementation to capture the lineage from Airflow & Spark?

Willy Lulciuc (willy@datakin.com) - 2021-11-23 13:57:13
*Thread Reply:* Maybe, but that would mean running the full Amundsen stack as well as the Marquez stack alongside each other (not ideal). The OpenLineage integration for Amundsen is very recent, so I haven't had a chance to look deeply into the implementation. But, briefly looking over the config for Openlineagetablelineageextractor, you can only send metadata to Atlas

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-11-24 00:36:56
*Thread Reply:* @Willy Lulciuc that's our real concern: running the two stacks will make a messy environment. Let me explain our Amundsen setup: we have neo4j as the backend (front end, search service, metadata service, elasticsearch & neo4j). Our requirement is to capture lineage from Spark and Airflow, imported into Amundsen

Vinith Krishnan US (vinithk@nvidia.com) - 2022-03-11 22:33:39
*Thread Reply:* We are running into a similar issue. @Dinakar Sundar were you able to get the Amundsen OpenLineage integration to work with a neo4j backend?

bitsofinfo (bitsofinfo.g@gmail.com) - 2021-11-24 11:41:31
Hi all - I just watched the presentation on this and Marquez from the Airflow '21 summit. I was pretty impressed with this. My question is: what other open source players are in this space, or are people pretty much consolidating around this? (which would be great). I was looking at the available datasource extractors for the Airflow side and would hope to see more here; looking at the code it doesn't seem like too huge of a deal. Is there a roadmap available?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-24 11:49:14
*Thread Reply:* You can take a look at https://github.com/OpenLineage/OpenLineage/projects

Martin Fiser (fisa@keboola.com) - 2021-11-24 19:24:48
Hi all, I was wondering what the status of native support of OpenLineage for DataHub or Amundsen is. Re: https://openlineage.slack.com/archives/C01CK9T7HKR/p1633633476151000?thread_ts=1633008095.115900&cid=C01CK9T7HKR
Many thanks!

Martin Fiser (fisa@keboola.com) - 2021-12-01 16:35:17
*Thread Reply:* Anyone? Thanks!

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-11-25 01:42:26
Our Amundsen setup: we have neo4j as the backend (front end, search service, metadata service, elasticsearch & neo4j). Our requirement is to capture lineage from Spark and Airflow, imported into Amundsen?

Will Johnson (will@willj.co) - 2021-11-29 23:30:12
Hello, OpenLineage folks - I'm curious if anyone here has run into an issue like the one we're running into as we look to extend OpenLineage's Spark integration into Databricks.

Has anyone run into an issue where a Scala class should exist (based on a decompiled jar, I see that it's a public class) but you keep getting an error like object SqlDWRelation in package sqldw cannot be accessed in package com.databricks.spark.sqldw?

Databricks has a Synapse SQL DW connector: https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html

I want to extract the database URL, table, and schema from the logical plan.

I execute something like the command below, which runs a SELECT * on the given tableName ("borrower" in this case) in the Azure Synapse database:

val df = spark.read.format("com.databricks.spark.sqldw")
  .option("url", sqlDwUrl)
  .option("tempDir", tempDir)
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", tableName)
  .load()
val logicalPlan = df.queryExecution.logical
val logicalRelation = logicalPlan.asInstanceOf[LogicalRelation]
val sqlBaseRelation = logicalRelation.relation

I end up with something like this, all good so far:

logicalPlan: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Relation[memberId#97,residentialState#98,yearsEmployment#99,homeOwnership#100,annualIncome#101,incomeVerified#102,dtiRatio#103,lengthCreditHistory#104,numTotalCreditLines#105,numOpenCreditLines#106,numOpenCreditLines1Year#107,revolvingBalance#108,revolvingUtilizationRate#109,numDerogatoryRec#110,numDelinquency2Years#111,numChargeoff1year#112,numInquiries6Mon#113] SqlDWRelation("borrower")

logicalRelation: org.apache.spark.sql.execution.datasources.LogicalRelation =
Relation[memberId#97,residentialState#98,yearsEmployment#99,homeOwnership#100,annualIncome#101,incomeVerified#102,dtiRatio#103,lengthCreditHistory#104,numTotalCreditLines#105,numOpenCreditLines#106,numOpenCreditLines1Year#107,revolvingBalance#108,revolvingUtilizationRate#109,numDerogatoryRec#110,numDelinquency2Years#111,numChargeoff1year#112,numInquiries6Mon#113] SqlDWRelation("borrower")

sqlBaseRelation: org.apache.spark.sql.sources.BaseRelation = SqlDWRelation("borrower")

The schema I can easily get with sqlBaseRelation.schema, but I cannot figure out:
1. How I can get the database name from the logical relation
2. How I can get the table name from the logical relation ("borrower" is the table name, so I can always parse the string if necessary)

I know that Databricks has the SqlDWRelation class, which I think I need to cast the BaseRelation to, BUT it appears to be in a jar / package that is inaccessible during the execution of a notebook. Specifically, import com.databricks.spark.sqldw.SqlDWRelation is the relation, and it appears to have a few accessors that would help me answer some of these questions: params and JDBCWrapper

Of course this is undocumented on the Databricks side 😰

If I could cast the BaseRelation into this SqlDWRelation, I'd be able to get this info. However, whenever I attempt to use the imported SqlDWRelation, I get an error: object SqlDWRelation in package sqldw cannot be accessed in package com.databricks.spark.sqldw. I'm hoping someone has run into something similar in the past on the Spark / Databricks / Scala side and might share some advice. Thank you for any guidance!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-30 07:03:30

Will Johnson (will@willj.co) - 2021-11-30 11:21:34
*Thread Reply:* I have not! Will give it a try, Maciej! Thank you for the reply!
🙌 Maciej Obuchowski

Will Johnson (will@willj.co) - 2021-11-30 15:20:18
*Thread Reply:* 🙏 @Maciej Obuchowski we're not worthy! That was the magic we needed. It seems like a hack since we're snooping in on private classes, but if it works...

Thank you so much for pointing to those utilities!
❤️ Julien Le Dem

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-11-30 15:48:25
*Thread Reply:* Glad I could help!

Francis McGregor-Macdonald (francis@mc-mac.com) - 2021-11-30 19:43:03
A colleague pointed me at https://open-metadata.org/. Is there anywhere a view or comparison of this and OpenLineage?

Mario Measic (mario.measic.gavran@gmail.com) - 2021-12-01 08:51:28
*Thread Reply:* Different concepts. OL is focused on describing the lineage and metadata of running jobs, so it keeps track of all the metadata (schema, ...) of inputs and outputs at the time a transformation occurs, plus transformation metadata (code version, cost, etc.)

OM I am not an expert on, but it's a metadata model with clients and an API around it.

RamanD (romantanzar@gmail.com) - 2021-12-01 12:33:51
Hey! OpenLineage is a beautiful initiative, to be honest! We are also trying to adopt it. One question, maybe it's already described somewhere (then many apologies): if we need to propagate the run id from Airflow to a child task (an AWS Batch job, for instance), what would be the best way to do it in the current implementation (as we get the run id only at the post-execute phase)?.. We use the Airflow 2+ integration.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-01 12:40:53
*Thread Reply:* Hey. For technical reasons, we can't automatically register a macro that does this job, as we could in the Airflow 1 integration. You could add it yourself:

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-01 12:41:02
*Thread Reply:*
def lineage_parent_id(run_id, task):
    """
    Macro function which returns the generated job and run id for a given task. This
    can be used to forward the ids from a task to a child run so the job
    hierarchy is preserved. Child run can create ParentRunFacet from those ids.
    Invoke as a jinja template, e.g.

    PythonOperator(
        task_id='render_template',
        python_callable=my_task_function,
        op_args=['{{ lineage_parent_id(run_id, task) }}'],  # lineage_run_id macro invoked
        provide_context=False,
        dag=dag
    )

    :param run_id:
    :param task:
    :return:
    """
    with create_session() as session:
        job_name = openlineage_job_name(task.dag_id, task.task_id)
        ids = JobIdMapping.get(job_name, run_id, session)
        if ids is None:
            return ""
        elif isinstance(ids, list):
            run_id = "" if len(ids) == 0 else ids[0]
        else:
            run_id = str(ids)
        return f"{_DAG_NAMESPACE}/{job_name}/{run_id}"


def openlineage_job_name(dag_id: str, task_id: str) -> str:
    return f'{dag_id}.{task_id}'

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-01 12:41:13
*Thread Reply:* from here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/dag.py#L77
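A sketch of the child side of this handoff (e.g. inside the AWS Batch job): split the "namespace/job_name/run_id" string the macro returns and attach it as a ParentRunFacet on the child's own runEvent. The schema URL below is illustrative; function and variable names are made up.

def parent_run_facet(lineage_parent_id: str) -> dict:
    # lineage_parent_id is the "{namespace}/{job_name}/{run_id}" string from the macro
    namespace, job_name, run_id = lineage_parent_id.split("/")
    return {
        "parent": {
            "_producer": "https://example.com/my-batch-wrapper",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/ParentRunFacet.json",  # illustrative
            "run": {"runId": run_id},
            "job": {"namespace": namespace, "name": job_name},
        }
    }

# child_event["run"]["facets"].update(parent_run_facet(sys.argv[1]))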

RamanD (romantanzar@gmail.com) - 2021-12-01 12:53:27
*Thread Reply:* The quickest response ever! And that works like a charm 🙌
👍 Michael Collado

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-01 13:21:16
*Thread Reply:* Glad I could help!

Will Johnson (will@willj.co) - 2021-12-01 14:14:23
@Maciej Obuchowski and @Michael Collado, given your work on the Spark integration, what's the right way to explore the write operations' logical plans? When doing a read, it's easy! In Scala, df.queryExecution.logical gives you exactly what you need, but how do you interactively explore what sort of commands are being used during a write? We are exploring some of the DataSourceV2 data sources and are hoping to learn from you a bit more, please 😃

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-01 14:18:00
*Thread Reply:* For SQL, EXPLAIN EXTENDED and show() in the scala-shell are helpful:

spark.sql("EXPLAIN EXTENDED CREATE TABLE tbl USING delta LOCATION '/tmp/delta' AS SELECT * FROM tmp").show(false)

== Parsed Logical Plan ==
'CreateTableAsSelectStatement [tbl], delta, /tmp/delta, false, false
+- 'Project [*]
   +- 'UnresolvedRelation [tmp], [], false

== Analyzed Logical Plan ==
CreateTableAsSelect org.apache.spark.sql.delta.catalog.DeltaCatalog@63c5b63a, default.tbl, [provider=delta, location=/tmp/delta], false
+- Project [x#12, y#13]
   +- SubqueryAlias tmp
      +- LocalRelation [x#12, y#13]

== Optimized Logical Plan ==
CreateTableAsSelect org.apache.spark.sql.delta.catalog.DeltaCatalog@63c5b63a, default.tbl, [provider=delta, location=/tmp/delta], false
+- LocalRelation [x#12, y#13]

== Physical Plan ==
AtomicCreateTableAsSelect org.apache.spark.sql.delta.catalog.DeltaCatalog@63c5b63a, default.tbl, LocalRelation [x#12, y#13], [provider=delta, location=/tmp/delta, owner=mobuchowski], [], false
+- LocalTableScan [x#12, y#13]

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-01 14:27:25
*Thread Reply:* For the dataframe API, I'm usually just either logging the plan to console from the OpenLineage listener, or looking at the spark.logicalPlan or spark_unknown facets sent by the listener - even when the particular write operation isn't supported by the integration, those facets should have some relevant info.
🙌 Will Johnson

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-01 14:27:40
*Thread Reply:* For example, for the query I've sent in the comment above, the spark.logicalPlan facet looks like this:

"spark.logicalPlan": {
  "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.4.0-SNAPSHOT/integration/spark",
  "_schemaURL": "https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet",
  "plan": [
    {
      "allowExisting": false,
      "child": [
        {
          "class": "org.apache.spark.sql.catalyst.plans.logical.LocalRelation",
          "data": null,
          "isStreaming": false,
          "num-children": 0,
          "output": [
            [
              {
                "class": "org.apache.spark.sql.catalyst.expressions.AttributeReference",
                "dataType": "integer",
                "exprId": {
                  "id": 2,
                  "jvmId": "e03e2860-a24b-41f5-addb-c35226173f7c",
                  "product-class": "org.apache.spark.sql.catalyst.expressions.ExprId"
                },
                "metadata": {},
                "name": "x",
                "nullable": false,
                "num-children": 0,
                "qualifier": []
              }
            ],
            [
              {
                "class": "org.apache.spark.sql.catalyst.expressions.AttributeReference",
                "dataType": "integer",
                "exprId": {
                  "id": 3,
                  "jvmId": "e03e2860-a24b-41f5-addb-c35226173f7c",
                  "product-class": "org.apache.spark.sql.catalyst.expressions.ExprId"
                },
                "metadata": {},
                "name": "y",
                "nullable": false,
                "num-children": 0,
                "qualifier": []
              }
            ]
          ]
        }
      ],
      "class": "org.apache.spark.sql.execution.command.CreateViewCommand",
      "name": {
        "product-class": "org.apache.spark.sql.catalyst.TableIdentifier",
        "table": "tmp"
      },
      "num-children": 0,
      "properties": null,
      "replace": true,
      "userSpecifiedColumns": [],
      "viewType": {
        "object": "org.apache.spark.sql.catalyst.analysis.LocalTempView$"
      }
    }
  ]
},

Will Johnson (will@willj.co) - 2021-12-01 14:38:55
*Thread Reply:* Okay! That is very helpful! I wasn't sure if there was a fancier trick, but I can definitely do logging 🙂 Our challenge was that our proprietary packages were resulting in NullPointerExceptions when it tried to push to OpenLineage 😞

Will Johnson (will@willj.co) - 2021-12-01 14:39:02
*Thread Reply:* Thank you as usual!!

Michael Collado (collado.mike@gmail.com) - 2021-12-01 14:40:25
*Thread Reply:* You can always add test cases and add breakpoints to debug in your IDE. That doesn't work for the container tests, but it does work for the other ones

Will Johnson (will@willj.co) - 2021-12-01 14:47:20
*Thread Reply:* Ah! That's a great point! I definitely would appreciate being able to poke at the objects interactively in a debug mode. Thank you for the guidance as well!

Ricardo Gaspar (ricardogaspar2@gmail.com) - 2021-12-03 11:49:10
hi everyone! 👋
Very noob question here: I've been wanting to play with Marquez and OpenLineage for my company's projects. I use mostly Scala & Spark, but also Airflow.
I've been reading and watching talks about OpenLineage and Marquez.
So far I didn't quite discover whether Marquez or OpenLineage does field-level lineage (with Spark), like Spline tries to.

Any idea?

Other sources about this topic:
• https://medium.com/cdapio/data-integration-with-field-level-lineage-5d9986524316
• https://medium.com/cdapio/field-level-lineage-part-1-3cc5c9e1d8c6
• https://medium.com/cdapio/designing-field-level-lineage-part-2-b6c7e6af5bf4
• https://www.youtube.com/playlist?list=PL897MHVe_nHeEQC8UnCfXecmZdF0vka_T
• https://www.youtube.com/watch?v=gKYGKXIBcZ0
• https://www.youtube.com/watch?v=eBep6rRh7ic
🙌 Francis McGregor-Macdonald

John Thomas (john@datakin.com) - 2021-12-03 11:55:17
*Thread Reply:* Hi Ricardo - OpenLineage doesn't currently have support for field-level lineage, but it's definitely something we've been looking into. This is a great collection of resources 🙂

To date, we've been working on our integrations library, making it as easy to set up as possible.

Ricardo Gaspar (ricardogaspar2@gmail.com) - 2021-12-03 12:01:25
*Thread Reply:* Thanks John! I was checking the issues on GitHub and other posts here. Just wanted to clarify that.
I'll keep an eye on it.

Julien Le Dem (julien@apache.org) - 2021-12-06 20:25:19
The next OpenLineage monthly meeting is this Wednesday at 9am PT. (Everybody is welcome to join.)
The slides are here: https://docs.google.com/presentation/d/1q2Be7WTKlIhjLPgvH-eXAnf5p4w7To9v/edit#slide=id.ge4b57c6942_0_75
Tentative agenda:
• SPDX headers [Mandy Chessel]
• Azure Purview + OpenLineage [Will Johnson, Mark Taylor]
• Logging backend (OpenTelemetry, ...) [Julien Le Dem]
• Open discussion
Please chime in in this thread if you'd want to add something

Julien Le Dem (julien@apache.org) - 2021-12-06 20:28:09
*Thread Reply:* The link to join the meeting is on the wiki: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

Julien Le Dem (julien@apache.org) - 2021-12-06 20:28:25
*Thread Reply:* Please reach out to me if you'd like to be added to a gcal invite

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-12-06 22:37:29
@John Thomas we at Condenast are currently exploring the features of OpenLineage to integrate with Databricks (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/databricks); the Spark configuration is not working.

Michael Collado (collado.mike@gmail.com) - 2021-12-08 02:03:37
*Thread Reply:* Hi Dinakar. Can you give some specifics regarding what kind of problem you're running into?

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-12-09 10:15:50
*Thread Reply:* Hi @Michael Collado, we were able to set the Spark configuration for the extra listener and placed the jars as well. When I ran the Spark job, lineage did not get tracked into Marquez.

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-12-09 10:34:39
*Thread Reply:*
{"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark","schemaURL":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark/facets/spark/v1/output-statistics-facet.json","rowCount":0,"size":-1,"status":"DEPRECATED"}},"outputFacets":{"outputStatistics":{"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark","schemaURL":"https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet","rowCount":0,"size":-1}}}],"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.3.1/integration/spark","schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"}
OpenLineageHttpException(code=0, message=java.lang.IllegalArgumentException: Cannot construct instance of io.openlineage.spark.agent.client.HttpError (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('{"code":404,"message":"HTTP 404 Not Found"}') at [Source: UNKNOWN; line: -1, column: -1], details=java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: Cannot construct instance of io.openlineage.spark.agent.client.HttpError (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('{"code":404,"message":"HTTP 404 Not Found"}') at [Source: UNKNOWN; line: -1, column: -1])
    at io.openlineage.spark.agent.OpenLineageContext.emit(OpenLineageContext.java:48)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:122)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$3(OpenLineageSparkListener.java:159)
    at java.util.Optional.ifPresent(Optional.java:159)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:148)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1585)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-12-09 13:29:42
*Thread Reply:* Issue solved - I had mentioned the version wrongly, as 1 instead of v1
🙌 Michael Collado

Jitendra Sharma (jitendra_sharma@condenast.com) - 2021-12-07 02:07:06
👋 Hi everyone!
👋 Willy Lulciuc, Maciej Obuchowski

kavuri raghavendra (kavuri.raghavendra@gmail.com) - 2021-12-08 05:37:44
Hello everyone, we are exploring OpenLineage for capturing Spark lineage, but from the GitHub (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark) I see that the output is sent to an API (Marquez). How can I send it to a Kafka topic? Can somebody please guide me on this.

Minkyu Park (minkyu@datakin.com) - 2021-12-08 12:15:38
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/400/files

There's an ongoing PR for a proxy backend, which opens an HTTP API and redirects events to Kafka.

John Thomas (john@datakin.com) - 2021-12-08 12:17:38
*Thread Reply:* Hi Kavuri, as Minkyu said, there's currently work going on to simplify this process.

For now, you'll need to make something to capture the HTTP API events and send them to the Kafka topic. Changing the spark.openlineage.url parameter will send the runEvents wherever you like, but obviously you can't directly produce HTTP events to a topic.
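A sketch of the kind of shim John describes (not an official component): accept the OpenLineage HTTP POSTs and forward each body to a Kafka topic, then point spark.openlineage.url at this proxy. Uses flask and kafka-python; the endpoint path mirrors what the Spark integration posts to, and the topic name is made up.

from flask import Flask, request
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(bootstrap_servers="localhost:9092")

@app.route("/api/v1/lineage", methods=["POST"])
def lineage():
    # Forward the raw runEvent JSON bytes unchanged
    producer.send("openlineage-events", request.get_data())
    return "", 201

if __name__ == "__main__":
    app.run(port=5000)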

kavuri raghavendra (kavuri.raghavendra@gmail.com) - 2021-12-08 22:13:09
*Thread Reply:* Many thanks for the reply. As I understand it, pushing lineage to a Kafka topic is not there yet; it is under implementation. If you can help me understand which version it is going to be present in, that would help me a lot. Thanks in advance.

Minkyu Park (minkyu@datakin.com) - 2021-12-09 12:57:10
*Thread Reply:* Not sure about the release plan, but the HTTP endpoint is just a regular RESTful API, and you will be able to write a super simple proxy for your own use case if you want.
🙌 Will Johnson

Will Johnson (will@willj.co) - 2021-12-12 00:13:54
Hi, OpenLineage team - for the Spark integration, I'm looking to extract information from a DataSourceV2 data source.

I'm working on the WRITE side of the data source, and right now I'm touching the AppendData logical plan (I can't find the JavaDoc): https://github.com/rdblue/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L446

I was able to extract the table name (from the named relation), but I'm struggling with getting out the schema next.

I noticed that AppendData offers inputSet, schema, and outputSet:
• inputSet gives me an AttributeSet which does contain the names of my columns (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala#L69)
• schema returns an empty StructType
• outputSet is an empty AttributeSet

I thought I read in the Spark Internals book that outputSet would only be populated if there was some sort of change to the DataFrame columns, but I cannot find that page, and searching for "spark outputSet" turns up few relevant results.

Has anyone else worked with the AppendData plan and gotten the schema out of it? Am I going down the wrong path with the snippet of code below? Thank you for any guidance!

if (logical instanceof AppendData) {
    AppendData appendOp = (AppendData) logical;
    NamedRelation namedRel = appendOp.table();
    log.info(namedRel.name()); // Works great!
    log.info(appendOp.inputSet().toString()); // This will get you a rough schema
    StructType schema = appendOp.schema(); // This is an empty StructType
    log.info(schema.json()); // Nothing useful here
}

- -
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-12 07:34:13
*Thread Reply:* One thing: you're looking at Ryan's fork of Spark, which is a few thousand commits behind head 🙂

This one should be good:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala#L72

About schema: looking at AppendData's query schema should work if there's no change to columns, because to pass analysis, the data being inserted has to match the table's schema. I would test that though 🙂

On the other hand, the current AppendDataVisitor just looks at AppendData's table and tries to extract a dataset from it using the list of common output visitors:

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/co[…]o/openlineage/spark/agent/lifecycle/plan/AppendDataVisitor.java

In this case, the DataSourceV2RelationVisitor would look at it, provided we're using Spark 3:

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/sp[…]ge/spark3/agent/lifecycle/plan/DataSourceV2RelationVisitor.java

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-12 07:37:04
*Thread Reply:* In this case, we basically need more info about the nature of this DataSourceV2Relation, because this is provider-dependent. We have Iceberg in the main branch and Delta here: https://github.com/OpenLineage/OpenLineage/pull/393/files#diff-7b66a9bd5905f4ba42914b73a87d834c1321ebcf75137c1e2a2413c0d85d9db6
Will Johnson (will@willj.co) - 2021-12-13 14:54:13
*Thread Reply:* Ah! Maciej! As always, thank you! Looking through the DataSourceV2RelationVisitor you provided, it looks like the connector (Azure Cosmos DB) doesn't provide that provider property 😞 😞 😞

Is there any other method for determining the type of a DataSourceV2Relation?

Will Johnson (will@willj.co) - 2021-12-13 14:57:06
*Thread Reply:* And, to make sure I close out on my original question, it was as simple as the code that Maciej was using:

I merely needed to use DataSourceV2Relation rather than NamedRelation!

```
DataSourceV2Relation relation = (DataSourceV2Relation) appendOp.table();
log.info(relation.schema().toString());
log.info(relation.name());
```
Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-15 06:20:31
*Thread Reply:* Are we talking about this connector? https://github.com/Azure/azure-sdk-for-java/blob/934200f63dc5bc7d5502a95f8daeb8142[…]/src/main/scala/com/azure/cosmos/spark/ItemsReadOnlyTable.scala

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-15 06:22:05
*Thread Reply:* I guess you can use object.getClass.getCanonicalName() to find if the passed class matches the one that the Cosmos provider uses.
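A sketch of that check (the table class name comes from the connector file linked above; treat the exact matching strategy as illustrative):

```
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation;

// Sketch: fall back to the table's class name when a DataSourceV2Relation
// does not expose a "provider" property.
class CosmosDetector {
    private static final String COSMOS_ITEMS_TABLE =
        "com.azure.cosmos.spark.ItemsReadOnlyTable";

    static boolean isCosmosRelation(DataSourceV2Relation relation) {
        String tableClass = relation.table().getClass().getCanonicalName();
        return COSMOS_ITEMS_TABLE.equals(tableClass);
    }
}
```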
Will Johnson (will@willj.co) - 2021-12-15 09:53:24
*Thread Reply:* Yes! That's the one, Maciej! I will give getCanonicalName a try, but I'll also make a PR into that repo to get the provider property set up correctly 🙂

Will Johnson (will@willj.co) - 2021-12-15 09:53:28
*Thread Reply:* Thank you so much!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-15 10:09:39
*Thread Reply:* Glad to help 😄

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-15 10:22:58
*Thread Reply:* @Will Johnson could you tell us which commands from https://github.com/OpenLineage/OpenLineage/issues/368#issue-1038510649 you'll be working on?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-15 10:24:14
*Thread Reply:* If any, of course 🙂
Will Johnson (will@willj.co) - 2021-12-15 10:49:31
*Thread Reply:* From all of our tests on that Cosmos connector, it looks like it strictly uses the AppendData operation. However, @Harish Sune is looking at more of these commands from a Delta data source.
  👍 Maciej Obuchowski

Will Johnson (will@willj.co) - 2021-12-22 22:43:34
*Thread Reply:* Just to close the loop on this one, I submitted a PR for the work we've been doing. Looking forward to any feedback! https://github.com/OpenLineage/OpenLineage/pull/450
Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-23 05:04:36
*Thread Reply:* Thanks @Will Johnson! I added one question about dataset naming.

Michael Collado (collado.mike@gmail.com) - 2021-12-14 19:45:59
Finally got this doc posted - https://github.com/OpenLineage/OpenLineage/pull/437 (see the readable version here)
Looking for feedback, @Willy Lulciuc @Maciej Obuchowski @Will Johnson
Will Johnson (will@willj.co) - 2021-12-15 10:54:41
*Thread Reply:* Yes! This is awesome!! How might this work for an existing command like the DataSourceV2Visitor?

Right now, OpenLineage checks, based on the provider property, whether it's an Iceberg or Delta provider.

Ideally, we'd be able to extend the list of providers or have a custom "CosmosDbDataSourceV2Visitor" that knew how to work with a custom DataSourceV2.

Would that cause any conflicts if the base class is already accounted for in OpenLineage?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-15 11:13:20
*Thread Reply:* Resolving this would be a nice addition to the doc (and to the implementation) - currently, we're just returning the result of the first function for which isDefinedAt is satisfied.

This means that we depend on the order of the visitors...

Michael Collado (collado.mike@gmail.com) - 2021-12-15 13:59:12
*Thread Reply:* Great question. For posterity, I'd like to move this to the PR discussion. I'll address the question there.
Michael Collado (collado.mike@gmail.com) - 2021-12-14 19:50:57
Oh, and I forgot to post yesterday:
OpenLineage 0.4.0 was released 🥳

This was a big one.
• Split tests for Spark 2 and Spark 3
• Spark output metrics
• Databricks support with init scripts
• Initial Iceberg support for Spark
• Initial Kafka support for Spark
• dbt build support
• forward compatibility for dbt versions
• lots of bug fixes 🙂
Check the full changelog for details.
  🙌 Maciej Obuchowski, Will Johnson, Peter Hicks, Manuel, Peter Hanssens
Dinakar Sundar (dinakar_sundar@condenast.com) - 2021-12-14 21:42:40
Hi @Michael Collado, is there any documentation on using Great Expectations with OpenLineage?

Michael Collado (collado.mike@gmail.com) - 2021-12-15 11:50:47
*Thread Reply:* Hmm, actually the only documentation we have right now is on the demo.datakin.com site: https://demo.datakin.com/onboarding . The Great Expectations tab should be enough to get you started.

Michael Collado (collado.mike@gmail.com) - 2021-12-15 11:51:04
*Thread Reply:* I'll open a ticket to copy that documentation to the OpenLineage site repo.
  👍 Madhu Maddikera, Dinakar Sundar
Carlos Meza (omar.m.8x@gmail.com) - 2021-12-15 09:52:51
Hello! I am new to OpenLineage - awesome project!! Does anybody know about integration with Deequ? Or a way to capture dataset stats with OpenLineage? Thanks! Appreciate the help!

Michael Collado (collado.mike@gmail.com) - 2021-12-15 19:01:50
*Thread Reply:* Hi! We don't have any integration with Deequ yet. We have a structure for recording data quality assertions and statistics, though - see https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DataQualityAssertionsDatasetFacet.json and https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DataQualityMetricsInputDatasetFacet.json for the specs.

Check the Great Expectations integration to see how those facets are being used.

Bruno González (brugms2@gmail.com) - 2022-05-24 06:20:50
*Thread Reply:* This is great. Thanks @Michael Collado!
Anatoliy Zhyzhkevych (Anatoliy.Zhyzhkevych@franklintempleton.com) - 2021-12-19 22:40:33
Hi,

I am testing OpenLineage/Marquez 0.4.0 with dbt 1.0.0 using dbt-ol build.
It seems 12 events were generated, but the UI shows only a history of runs, with "Nothing to show here" in the detail section about dataset/test failures in the dbt namespace. The warehouse namespace shows lineage but no details about dataset/test failures.

Please advise.

```
02:57:54  Done. PASS=4 WARN=0 ERROR=3 SKIP=2 TOTAL=9
02:57:54  Error sending message, disabling tracking
Emitting OpenLineage events: 100%|██████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12.50it/s]
```

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-20 04:15:51
*Thread Reply:* This is "nothing to show here" when you click on a test node, right? What about a run node?
Anatoliy Zhyzhkevych (Anatoliy.Zhyzhkevych@franklintempleton.com) - 2021-12-20 12:28:21
*Thread Reply:* There are no details about the failure.

```
dbt-ol build -t DEV --profile cdp --profiles-dir /c/Work/dbt/cdp100/profiles --project-dir /c/Work/dbt/cdp100 --select +riskrawmastersharedshareclass
Running OpenLineage dbt wrapper version 0.4.0
This wrapper will send OpenLineage events at the end of dbt execution.
02:57:21  Running with dbt=1.0.0
02:57:23  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
• models.cdp.risk.raw.liquidity.shared

02:57:23  Found 158 models, 181 tests, 0 snapshots, 0 analyses, 574 macros, 0 operations, 2 seed files, 56 sources, 1 exposure, 0 metrics
02:57:23
02:57:35  Concurrency: 10 threads (target='DEV')
02:57:35
02:57:35  1 of 9 START test dbtexpectationssourceexpectcompoundcolumnstobeuniquebsesharedpbshareclassEDMPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [RUN]
02:57:37  1 of 9 PASS dbtexpectationssourceexpectcompoundcolumnstobeuniquebsesharedpbshareclassEDMPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [PASS in 2.67s]
02:57:37  2 of 9 START view model REPL.SHARECLASSDIM [RUN]
02:57:39  2 of 9 OK created view model REPL.SHARECLASSDIM [SUCCESS 1 in 2.12s]
02:57:39  3 of 9 START test dbtexpectationsexpectcompoundcolumnstobeuniquerawreplpbsharedshareclassRISKPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [RUN]
02:57:43  3 of 9 PASS dbtexpectationsexpectcompoundcolumnstobeuniquerawreplpbsharedshareclassRISKPORTFOLIOIDSHARECLASSCODEanyvalueismissingDELETEDFLAGFalse [PASS in 3.42s]
02:57:43  4 of 9 START view model RAWRISKDEV.STG.SHARECLASSDIM [RUN]
02:57:46  4 of 9 OK created view model RAWRISKDEV.STG.SHARECLASSDIM [SUCCESS 1 in 3.44s]
02:57:46  5 of 9 START view model RAWRISKDEV.MASTER.SHARECLASSDIM [RUN]
02:57:46  6 of 9 START test relationshipsriskrawstgsharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTIDrefriskrawstgsharedsecurity_ [RUN]
02:57:46  7 of 9 START test relationshipsriskrawstgsharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawstgsharedportfolio_ [RUN]
02:57:51  5 of 9 ERROR creating view model RAWRISKDEV.MASTER.SHARECLASSDIM [ERROR in 4.31s]
02:57:51  8 of 9 SKIP test relationshipsriskrawmastersharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTIDrefriskrawmastersharedsecurity_ [SKIP]
02:57:51  9 of 9 SKIP test relationshipsriskrawmastersharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawmastersharedportfolio_ [SKIP]
02:57:52  7 of 9 FAIL 7282 relationshipsriskrawstgsharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawstgsharedportfolio_ [FAIL 7282 in 5.41s]
02:57:54  6 of 9 FAIL 6520 relationshipsriskrawstgsharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTIDrefriskrawstgsharedsecurity_ [FAIL 6520 in 7.23s]
02:57:54
02:57:54  Finished running 6 tests, 3 view models in 30.71s.
02:57:54
02:57:54  Completed with 3 errors and 0 warnings:
02:57:54
02:57:54  Database Error in model riskrawmastersharedshareclass (models/risk/raw/master/shared/riskrawmastersharedshareclass.sql)
02:57:54    002003 (42S02): SQL compilation error:
02:57:54    Object 'RAWRISKDEV.AUDIT.STGSHARECLASSDIMRELATIONSHIPRISKINSTRUMENTID' does not exist or not authorized.
02:57:54    compiled SQL at target/run/cdp/models/risk/raw/master/shared/riskrawmastersharedshareclass.sql
02:57:54
02:57:54  Failure in test relationshipsriskrawstgsharedshareclassRISKPORTFOLIOIDRISKPORTFOLIOIDrefriskrawstgsharedportfolio (models/risk/raw/stg/shared/riskrawstgsharedschema.yml)
02:57:54    Got 7282 results, configured to fail if != 0
02:57:54    compiled SQL at target/compiled/cdp/models/risk/raw/stg/shared/riskrawstgsharedschema.yml/relationshipsriskrawstgsha19e10fb324f7d0cccf2aab512683f693.sql
02:57:54
02:57:54  Failure in test relationshipsriskrawstgsharedshareclassRISKINSTRUMENTIDRISKINSTRUMENTID_refriskrawstgsharedsecurity_ (models/risk/raw/stg/shared/riskrawstgsharedschema.yml)
02:57:54    Got 6520 results, configured to fail if != 0
02:57:54    compiled SQL at target/compiled/cdp/models/risk/raw/stg/shared/riskrawstgsharedschema.yml/relationshipsriskrawstgsha_e3148a1627817f17f7f5a9eb841ef16f.sql
02:57:54
02:57:54  See test failures:

  select * from RAWRISKDEV.AUDIT.STGSHARECLASSDIMrelationship_RISKINSTRUMENT_ID

02:57:54
02:57:54  Done. PASS=4 WARN=0 ERROR=3 SKIP=2 TOTAL=9
02:57:54  Error sending message, disabling tracking
Emitting OpenLineage events: 100%|██████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12.50it/s]
Emitted 14 openlineage events
```
Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-20 12:30:20
*Thread Reply:* I'm talking about clicking on a non-test node in the Marquez UI - the screenshots shared show you clicked on the one ending in "test".

Anatoliy Zhyzhkevych (Anatoliy.Zhyzhkevych@franklintempleton.com) - 2021-12-20 16:46:11
*Thread Reply:* There are two types of failures: tests failed on the stage model (relationships) and a physical error in the master model (no table with such a name). The stage test node in Marquez does not show any indication of failures, and the dataset node indicates failure but without the number of failed records or the table name for persistent test storage. The failed master model shows in red but no details of the failure. The master model tests were skipped because of the model failure, but the UI reports "Complete".
Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-20 18:11:50
*Thread Reply:* If I understood correctly, for a model you would like OpenLineage to capture the error message, like this one:

```
22:52:07  Database Error in model customers (models/customers.sql)
22:52:07    Syntax error: Expected "(" or keyword SELECT or keyword WITH but got identifier "PLEASE_REMOVE" at [56:12]
22:52:07    compiled SQL at target/run/jaffle_shop/models/customers.sql
```

And for dbt test failures, to better visualize that an error is happening, for example like that:

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-20 18:23:12
*Thread Reply:* We actually do the first one for Airflow and Spark; I've missed it for dbt 😞

Created an issue to add it to the spec in a generic way:
https://github.com/OpenLineage/OpenLineage/issues/446

Anatoliy Zhyzhkevych (Anatoliy.Zhyzhkevych@franklintempleton.com) - 2021-12-20 22:49:54
*Thread Reply:* Sounds great. Failed/skipped tests/models could be color-coded as well. Thanks.
Jorge Reyes (Zenta Group) (jorge.reyes@zentagroup.com) - 2021-12-22 12:37:00
Hello everyone, I'm learning OpenLineage. I am trying to connect it with Airflow 2 - is that possible, or is that version not yet supported? This is what Airflow is currently throwing at me.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-22 12:38:26
*Thread Reply:* Hey. If you're using Airflow 2, you should use the LineageBackend method described here: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#airflow-21-experimental
  🙌 Jorge Reyes (Zenta Group)

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2021-12-22 12:39:06
*Thread Reply:* You don't need to do anything with the DAG import then.

Jorge Reyes (Zenta Group) (jorge.reyes@zentagroup.com) - 2021-12-22 12:40:30
*Thread Reply:* Thanks!!!!! I'll try.
Michael Collado (collado.mike@gmail.com) - 2021-12-27 16:49:20
The PR at https://github.com/OpenLineage/OpenLineage/pull/451 should be everything needed to complete the implementation for https://github.com/OpenLineage/OpenLineage/pull/437 . The PR is in draft mode, as I still need ~1 day to update the integration test expectations to match the refactoring (there are some new events, but from my cursory look, the old events still match expected contents). But I think it's in a state that can be reviewed before the tests are updated.

There are two other PRs that this one is based on, broken up for easier reviewing:
• https://github.com/OpenLineage/OpenLineage/pull/447
• https://github.com/OpenLineage/OpenLineage/pull/448

Michael Collado (collado.mike@gmail.com) - 2021-12-27 16:49:56
*Thread Reply:* @Will Johnson @Maciej Obuchowski FYI 👆
  🙌 Will Johnson, Maciej Obuchowski
Michael Robinson (michael.robinson@astronomer.io) - 2022-01-07 15:25:11
The next OpenLineage Technical Steering Committee meeting is Wednesday, January 12! Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT.
Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
All are welcome.
Agenda:
• OpenLineage 0.4 and 0.5 releases
• Egeria version 3.4 support for OpenLineage
• Airflow TaskListener to simplify OpenLineage integration [Maciej]
• Open Discussion
Notes: https://tinyurl.com/openlineagetsc
  🙌 Maciej Obuchowski, Ross Turk, John Thomas, Minkyu Park, Joshua Wankowski, Dalin Kim
David Virgil (david.virgil.naranjo@googlemail.com) - 2022-01-11 12:16:09
Hello community,

We are able to post this datasource to Marquez, but the information about the facet with the datasource is not displayed in the UI.

We want to display the S3 location (URI) this datasource points to:

```
{
  "id": { "namespace": "s3://hbi-dns-staging", "name": "PCHG" },
  "type": "DB_TABLE",
  "name": "PCHG",
  "physicalName": "PCHG",
  "createdAt": "2022-01-11T16:15:54.887Z",
  "updatedAt": "2022-01-11T16:56:04.093153Z",
  "namespace": "s3://hbi-dns-staging",
  "sourceName": "s3://hbi-dns-staging",
  "fields": [],
  "tags": [],
  "lastModifiedAt": null,
  "description": null,
  "currentVersion": "c565864d-1a66-4cff-a5d9-2e43175cbf88",
  "facets": {
    "dataSource": {
      "uri": "s3://hbi-dns-staging/sql-runner/2022-01-11/PCHG.avro",
      "name": "s3://hbi-dns-staging",
      "_producer": "ip-172-25-23-163.dir.prod.aws.hollandandbarrett.comeu-west-1.com/172.25.23.163",
      "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet"
    }
  }
}
```

David Virgil (david.virgil.naranjo@googlemail.com) - 2022-01-11 12:24:00
As you can see, there is not much info in the OpenLineage UI.
Michael Robinson (michael.robinson@astronomer.io) - 2022-01-11 13:02:16
The OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1641587111000700

Julien Le Dem (julien@apache.org) - 2022-01-12 11:59:44
*Thread Reply:* ^ It's happening now!
David Virgil (david.virgil.naranjo@googlemail.com) - 2022-01-14 06:46:44
Any ideas, guys, about the previous question?

Minkyu Park (minkyu@datakin.com) - 2022-01-18 14:19:39
*Thread Reply:* Just to be clear, were you able to get the datasource information from the API and it's just not showing up in the UI? Or were you not able to get it from the API either?
SAM (skhettri@gmail.com) - 2022-01-17 03:41:56
Hi everyone!! I am doing a POC of OpenLineage with Airflow version 2.1. Before that, I would like to know whether this version is supported by OpenLineage?

Conor Beverland (conorbev@gmail.com) - 2022-01-18 11:40:00
*Thread Reply:* It does generally work, but there's a known limitation in that only successful task runs are reported to the lineage backend. This is planned to be fixed in Airflow 2.3.
  ✅ SAM
  ❤️ Julien Le Dem

SAM (skhettri@gmail.com) - 2022-01-18 20:35:52
*Thread Reply:* thank you. 🙂
SAM (skhettri@gmail.com) - 2022-01-17 06:47:54
Hello there, I'm using the Airflow docker image, version 2.1.0. Below were the steps I performed, but I encountered an error - please help:
1. Inside the requirements.txt file I added openlineage-airflow, then ran pip install -r requirements.txt.
2. Added an environment variable using this command:
   export AIRFLOW__LINEAGE__BACKEND = openlineage.lineage_backend.OpenLineageBackend
3. Then configured the HTTP backend environment variable inside the "airflow" folder:
   export OPENLINEAGE_URL=http://marquez:5000
4. Ran Marquez using ./docker/up.sh, opened the web frontend UI, and saw the error message below.
Conor Beverland (conorbev@gmail.com) - 2022-01-18 11:30:38
*Thread Reply:* Hey, I'm aware of one small bug (which will be fixed in the upcoming OpenLineage 0.5.0) which means you would also have to include google-cloud-bigquery in your requirements.txt. This is the bug: https://github.com/OpenLineage/OpenLineage/issues/438
  ✅ SAM

Conor Beverland (conorbev@gmail.com) - 2022-01-18 11:31:51
*Thread Reply:* The other thing I think you should check is: did you definitely define the AIRFLOW__LINEAGE__BACKEND variable correctly? What you pasted above looks a little odd with the 2 = signs.
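For reference, shell assignments must not have spaces around the = sign; a minimal sketch of the two variables discussed in this thread (host and port are whatever your Marquez instance uses):

```
# no spaces around "=" in shell assignments
export AIRFLOW__LINEAGE__BACKEND=openlineage.lineage_backend.OpenLineageBackend
export OPENLINEAGE_URL=http://marquez:5000
```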
Conor Beverland (conorbev@gmail.com) - 2022-01-18 11:34:25
*Thread Reply:* I'm looking at a task log inside my own Airflow and I see messages like:
INFO - Constructing openlineage client to send events to

Conor Beverland (conorbev@gmail.com) - 2022-01-18 11:34:47
*Thread Reply:* ^ i.e., by checking the task logs you can see if it's at least attempting to send data.

Conor Beverland (conorbev@gmail.com) - 2022-01-18 11:34:52
*Thread Reply:* hope this helps!

SAM (skhettri@gmail.com) - 2022-01-18 20:40:37
*Thread Reply:* Thank you, will try again.
Michael Collado (collado.mike@gmail.com) - 2022-01-18 20:10:25
Just published OpenLineage 0.5.0. Big items here are:
• dbt-spark support
• New proxy message broker for forwarding OpenLineage messages to Kafka
• New extensibility API for Spark integration
Accompanying tweet thread on the latter two items here: https://twitter.com/PeladoCollado/status/1483607050953232385
  🙌 Maciej Obuchowski, Kevin Mellott

Michael Collado (collado.mike@gmail.com) - 2022-01-19 12:39:30
*Thread Reply:* BTW, this was actually the 0.5.1 release. Because, pypi... 🤷‍♂️

Mario Measic (mario.measic.gavran@gmail.com) - 2022-01-27 06:45:08
*Thread Reply:* nice on the dbt-spark support 👍
Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:12:14
HELLO everyone. I've been reading and watching talks about OpenLineage and Marquez; this solution is exactly what we've been looking for to trace the lineage of our ETLs. GREAT WORK. Our ETLs are based on Postgres, Redshift, and Airflow.

I tried to implement the example, respecting all the required steps. Everything runs successfully (the two DAGs on Airflow) on http://localhost:3000/, but nothing appeared in the Marquez UI. Am I missing something?

I'm thinking about creating a simple pandas-to-pandas ETL with some transformation, to have a POC to show my team. I REALLY NEED SOME HELP.
Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-01-19 11:13:35
*Thread Reply:* Are you using Docker on Mac with "Use Docker Compose V2" enabled?

We just found yesterday that it somehow breaks our example...
  ✅ Mohamed El IBRAHIMI

Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:14:51
*Thread Reply:* yes, I just installed Docker on Mac

Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:15:02
*Thread Reply:* and docker-compose version 1.29.2

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-01-19 11:20:24
*Thread Reply:* What you can do is uncheck this, do docker system prune -a, and try again.

Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:21:56
*Thread Reply:* done, but I get this: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-01-19 11:22:15
*Thread Reply:* Try to restart Docker for Mac

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-01-19 11:23:00
*Thread Reply:* It needs to show "Docker Desktop is running":

Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:24:01
*Thread Reply:* yeah, done. I will try to implement the example again and see. Thank you very much.
Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:32:55
*Thread Reply:* I don't know why I am getting this when I run docker-compose up:

```
WARNING: The TAG variable is not set. Defaulting to a blank string.
WARNING: The API_PORT variable is not set. Defaulting to a blank string.
WARNING: The API_ADMIN_PORT variable is not set. Defaulting to a blank string.
WARNING: The WEB_PORT variable is not set. Defaulting to a blank string.
ERROR: The Compose file './../docker-compose.yml' is invalid because:
services.api.ports contains an invalid type, it should be a number, or an object
services.api.ports contains an invalid type, it should be a number, or an object
services.web.ports contains an invalid type, it should be a number, or an object
services.api.ports value [':', ':'] has non-unique elements
```

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-01-19 11:46:12
*Thread Reply:* Are you running it exactly like here, with respect to directories, etc.?

https://github.com/MarquezProject/marquez/tree/main/examples/airflow

Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:59:36
*Thread Reply:* yeah, my bad. Everything works fine now - I see the graph in the UI.

Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 12:04:01
*Thread Reply:* One more question please: as I said, our ETLs are based on Postgres, Redshift, and Airflow. Any advice for us on integrating OL into our pipeline?
Mohamed El IBRAHIMI (mohamedelibrahimi700@gmail.com) - 2022-01-19 11:12:17
thank you very much
Kevin Mellott (kevin.r.mellott@gmail.com) - 2022-01-19 17:29:51
I'm upgrading our OL Java client from an older version (0.2.3) and noticed that the ol.newCustomFacetBuilder() method to create custom facets no longer exists. I can see in this code diff that it might be replaced by simply adding to the additional properties of the standard element you are extending.

Can you please let me know if I'm understanding this change correctly? In other words, is the code in the diff functionally equivalent, or is there a larger change I should be understanding better?

https://github.com/OpenLineage/OpenLineage/compare/0.2.3...0.4.0#diff-f0381d7e68797d9ec60551c96897809072582350e1657d23425747358ec6e471L196

John Thomas (john@datakin.com) - 2022-01-19 17:50:39
*Thread Reply:* Hi Kevin - to my understanding that's correct. Do you guys have a custom extractor using this?
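For later readers, a sketch of the direction of that change; the builder and accessor names below are assumptions from the generated model classes (see the diff linked above), so verify them against your client version:

```
import io.openlineage.client.OpenLineage;
import java.net.URI;

// Sketch only: a custom facet now rides in the additional-properties map of a
// standard facets container instead of coming from ol.newCustomFacetBuilder().
// Method names here are assumptions based on the generated model classes.
OpenLineage ol = new OpenLineage(URI.create("https://example.com/my-producer"));

OpenLineage.RunFacets facets = ol.newRunFacetsBuilder().build();
facets.getAdditionalProperties().put("myTeamFacet", ol.newRunFacet()); // "myTeamFacet" is an illustrative key
```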
Kevin Mellott (kevin.r.mellott@gmail.com) - 2022-01-19 20:49:49
*Thread Reply:* Thanks John! We have custom code emitting OL events within our ingestion pipeline, and it includes a custom facet. I'll refactor the code to the new format and should be good to go.

Kevin Mellott (kevin.r.mellott@gmail.com) - 2022-01-21 00:34:37
*Thread Reply:* Just to follow up: this code update worked as expected and we are all good on the upgrade.
  👍 Minkyu Park, John Thomas, Julien Le Dem
SAM (skhettri@gmail.com) - 2022-01-21 02:13:51
I'm not sure what went wrong. With the Airflow docker image, version 2.1.0, below were the steps I performed, but the Marquez UI is showing no jobs - please help:
1. In requirements.txt I added openlineage-airflow==0.5.1, then ran pip install -r requirements.txt.
2. Added an environment variable inside my Airflow docker folder using this command:
   export AIRFLOW__LINEAGE__BACKEND = openlineage.lineage_backend.OpenLineageBackend
3. Then configured the HTTP backend environment variable inside the same Airflow docker folder:
   export OPENLINEAGE_URL=http://localhost:5000
4. Ran Marquez using ./docker/up.sh (which is in another folder); the frontend UI is not showing any job, it's empty.
5. Attached is the Airflow DAG log.
Ross Turk (ross@datakin.com) - 2022-01-25 14:46:58
*Thread Reply:* Hm, that is odd. Usually there are a few lines in the DAG log from the OpenLineage bits. I'd expect to see something about not having an extractor for the operator you are using.

Ross Turk (ross@datakin.com) - 2022-01-25 14:47:53
*Thread Reply:* If you open a shell in your Airflow scheduler container and check for the presence of AIRFLOW__LINEAGE__BACKEND, is it properly set? It's possible the env isn't making it all the way there.
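One quick way to run that check (the container name is a placeholder):

```
# confirm the lineage variables actually reached the scheduler container
docker exec -it <scheduler-container> env | grep -E 'AIRFLOW__LINEAGE|OPENLINEAGE'
```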
Lena Kullab (Lena.Kullab@storable.com) - 2022-01-21 13:38:37
Hi All,

I am working on a POC of the OpenLineage-Airflow integration and was attempting to get it configured with Amundsen (also working on a POC). Reading through the tutorial here, https://openlineage.io/integration/apache-airflow/, under the Prerequisites section it says:
To use the OpenLineage Airflow integration, you'll need a running Airflow instance. You'll also need an OpenLineage-compatible HTTP backend.
The example uses Marquez, but I was trying to figure out how to get it to send metadata to the Amundsen graph DB backend. Does the Airflow integration only support configuration with an HTTP-compatible backend?

John Thomas (john@datakin.com) - 2022-01-21 14:03:29
*Thread Reply:* Hi Lena! That's correct, OpenLineage is designed to send events to an HTTP backend. There's a ticket in the future section of the roadmap to support pushing to Amundsen, but it's not yet been worked on (Ref: Roadmap Issue #86).

Lena Kullab (Lena.Kullab@storable.com) - 2022-01-21 14:08:35
*Thread Reply:* Thank you for the info!
naman shaundik (namanshaundik@gmail.com) - 2022-01-30 11:01:42
Hi, I am completely new to OpenLineage and Marquez. I have to integrate OpenLineage into my existing Java project, but I am completely confused about where to start. I have gone through the documentation, but I am not able to understand how to integrate OpenLineage using the Marquez HTTP backend in my existing project. Please, someone help me. I may sound naive here, but I am in dire need of help.

John Thomas (john@datakin.com) - 2022-01-30 12:37:39
*Thread Reply:* What do you mean by "integrate OpenLineage"?

Can you give a little more information on what you're trying to accomplish and what the existing project is?
naman shaundik (namanshaundik@gmail.com) - 2022-01-31 03:49:22
*Thread Reply:* I work on a datalake team, and we are trying to implement the data lineage property in our project using OpenLineage. Our project basically keeps track of datasets coming from different sources (Hive, Redshift, Elasticsearch, etc.) and jobs.

John Thomas (john@datakin.com) - 2022-01-31 15:01:31
*Thread Reply:* Gotcha!

Broadly speaking, all an integration needs to do is to send runEvents to Marquez.

I'd start by understanding the OpenLineage data model, and then looking at your system to identify when and where runEvents should be sent from, and what information needs to be included.
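A minimal sketch of what "sending a runEvent" amounts to at the HTTP level (the endpoint path is Marquez's standard OpenLineage endpoint; the namespace, job name, and runId are placeholders):

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: POST one minimal OpenLineage START event straight to Marquez.
public class EmitRunEvent {
    public static void main(String[] args) throws Exception {
        String event = "{"
            + "\"eventType\": \"START\","
            + "\"eventTime\": \"2022-01-31T15:00:00.000Z\","
            + "\"run\": {\"runId\": \"d46e465b-d358-4d32-83d4-df660ff614dd\"}," // placeholder UUID
            + "\"job\": {\"namespace\": \"my-namespace\", \"name\": \"my-job\"}," // placeholders
            + "\"inputs\": [], \"outputs\": [],"
            + "\"producer\": \"https://example.com/my-producer\""
            + "}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:5000/api/v1/lineage")) // Marquez's OpenLineage endpoint
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(event))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // expect a 2xx on success
    }
}
```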
TJ Tang (tj@tapdata.io) - 2022-02-15 15:28:03
*Thread Reply:* I suppose OpenLineage itself only defines the standard/protocol to design your data model. To be able to visualize/trace the lineage, you either have to implement it yourself with the standard data models or include Marquez in your project. You would need to use the HTTP API to send lineage events from your Java project to Marquez in this case.

John Thomas (john@datakin.com) - 2022-02-16 11:17:13
*Thread Reply:* Exactly! This project also includes connectors for more common data tools (Airflow, dbt, Spark, etc.), but at its core OpenLineage is a standard and protocol.
Michael Robinson (michael.robinson@astronomer.io) - 2022-02-02 19:55:13
The next OpenLineage Technical Steering Committee meeting is Wednesday, February 9. Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT.
Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
All are welcome. Agenda items are always welcome, as well. Reply in thread with yours.
Current agenda:
• OpenLineage 0.5.1 release
• Apache Flink effort
• Dagster integration
• Open Discussion
Notes: https://tinyurl.com/openlineagetsc
Jensen Yap (jensen@contxts.io) - 2022-02-03 00:33:45
Hi everybody!
  👋 Maciej Obuchowski, John Thomas

John Thomas (john@datakin.com) - 2022-02-03 12:39:57
*Thread Reply:* Hello!
Albert Bikeev (albert.bikeev@gmail.com) - 2022-02-04 09:36:46
Hi everybody!
Very cool initiative, thank you! Is there any traction on an Apache Atlas integration? Is there some way to help you there?

John Thomas (john@datakin.com) - 2022-02-04 15:07:07
*Thread Reply:* Hey Albert! There aren't yet any issues or proposals around Apache Atlas, but that's definitely something you can help with!

I'm not super familiar with Atlas; were you thinking in terms of enabling Atlas to receive runEvents from OpenLineage connectors?
Albert Bikeev (albert.bikeev@gmail.com) - 2022-02-07 05:49:16
*Thread Reply:* Hi John!
Yes, exactly, it'd be nice to see Atlas as a receiver side of the OpenLineage events. Are there some guidelines on how to implement it? I guess we need an OpenLineage-compatible server implementation so we could receive events and send them to Atlas, right?

John Thomas (john@datakin.com) - 2022-02-07 11:30:14
*Thread Reply:* Exactly - this would be a change on the Atlas side. I'd start by opening an issue in the Atlas repo about making an API endpoint that can receive OpenLineage events.
Marquez is our reference implementation of OpenLineage, so I'd look around in that repo to see how it's been implemented :)

Albert Bikeev (albert.bikeev@gmail.com) - 2022-02-07 11:50:27
*Thread Reply:* Got it, thanks! Did that: https://issues.apache.org/jira/browse/ATLAS-4550
If it doesn't get any traction, we at New Work might contribute as well.
John Thomas (john@datakin.com) - 2022-02-07 11:56:09
*Thread Reply:* Awesome! If you guys have any questions, reach out and I can get you in touch with some of the engineers on our end.
  👍 Albert Bikeev

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-02-08 11:20:47
*Thread Reply:* @Albert Bikeev one minor thing that could be helpful: the Java OpenLineage library contains server model classes: https://github.com/OpenLineage/OpenLineage/pull/300#issuecomment-923489097

Albert Bikeev (albert.bikeev@gmail.com) - 2022-02-08 11:32:12
*Thread Reply:* Got it, thank you!
Juan Carlos Fernández Rodríguez (jcfernandez@keedio.com) - 2022-05-04 11:12:23
*Thread Reply:* This is a quite old discussion, but isn't it possible to use the OpenLineage proxy to send the JSON to a Kafka topic and let Atlas read that JSON without any modification?
A new model for Spark would need to be created, other than https://github.com/apache/atlas/blob/release-2.1.0-rc3/addons/models/1000-Hadoop/1100-spark_model.json, and uploaded to Atlas (which could be done with a call to the Atlas API).
Does that make sense?
  👍 Albert Bikeev

Will Johnson (will@willj.co) - 2022-05-04 11:24:02
*Thread Reply:* @Juan Carlos Fernández Rodríguez - You still need to build a bridge between the OpenLineage spec and the Apache Atlas entity JSON. So far, no one has contributed something like that to the open source community... yet!

Juan Carlos Fernández Rodríguez (jcfernandez@keedio.com) - 2022-05-04 14:24:28
*Thread Reply:* Sorry for the ignorance, but what is the purpose of the bridge? The communication with Atlas should be done through Kafka, and those messages can be sent by the proxy. What am I missing?

John Thomas (john@datakin.com) - 2022-05-04 16:37:33
*Thread Reply:* "Bridge" in this case refers to a service of some sort that converts from an OpenLineage run event to Atlas entity JSON, since there's currently nothing that will do that.
xiang chen (cdmikechen@hotmail.com) - 2022-05-19 09:08:23
*Thread Reply:* If OpenLineage sends an event to Kafka, I think we can use Kafka Streams or Kafka Connect to rebuild the message into an Atlas event.

xiang chen (cdmikechen@hotmail.com) - 2022-05-19 09:11:37
*Thread Reply:* @John Thomas Our company used to use Atlas as a metadata service. I just came to know this project. After I learn how OpenLineage works, I think I can create an issue to describe my design first.

xiang chen (cdmikechen@hotmail.com) - 2022-05-19 09:13:36
*Thread Reply:* @Juan Carlos Fernández Rodríguez If you already have some experience and a design, can you directly create an issue so that we can discuss it in more detail?

Juan Carlos Fernández Rodríguez (jcfernandez@keedio.com) - 2022-05-19 12:42:31
*Thread Reply:* Hi @xiang chen, we are discussing internally at my company whether to rewrite to Atlas or another alternative. If we do this, we will share it and could involve you in some way.
Michael Robinson (michael.robinson@astronomer.io) - 2022-02-04 15:02:29
Who here is working with OpenLineage at Dagster or Flink? We would love to hear about your work at the next meeting, on February 9 at 9 a.m. PT. Please reply here or message me to coordinate. @Ziyoiddin Yusupov
  👍 Ziyoiddin Yusupov

Luca Soato (lucasoato@gmail.com) - 2022-02-04 19:18:24
Hi everyone,
OpenLineage is wonderful; we really needed something like this!
Has anyone else used it with Databricks, Delta tables, or Spark? If someone is interested in these technologies, we can work together to get a POC and share some thoughts.
Thanks, and have a nice weekend! :)
Julius Rentergent (julius.rentergent@thetradedesk.com) - 2022-02-25 13:06:16
*Thread Reply:* Hi Luca, I agree this looks really promising. I'm working on getting it to run on Databricks, but I'm only just starting out 🙂

Michael Robinson (michael.robinson@astronomer.io) - 2022-02-08 12:00:02
Friendly reminder: this month's OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1643849713216459
  ❤️ Kevin Mellott, John Thomas
Albert Bikeev (albert.bikeev@gmail.com) - 2022-02-10 08:22:28
Hi people,
One question regarding error reporting: what is the mechanism for that? E.g., if I send a duplicated job to OpenLineage, is there a way to notify me about that?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-02-10 09:05:39
*Thread Reply:* By duplicated, you mean with the same runId?

Albert Bikeev (albert.bikeev@gmail.com) - 2022-02-10 11:40:55
*Thread Reply:* That's only one example; it could also be a duplicated job name or anything else. The question is whether there is a mechanism to report that.
Will Johnson (will@willj.co) - 2022-02-14 17:21:20
Reducing the Logging of the Spark Integration

Hey, OpenLineage community! I'm curious if there are any quick tricks or fixes to reduce the amount of logging happening in the OpenLineage Spark integration. Each job seems to print out the logical plan with INFO-level logging. The default behavior of Databricks is to print out INFO-level logs, so it gets pretty cluttered and noisy.

I'm hoping there's a feature flag that would help me shut off those kinds of logs in OpenLineage's Spark integration 🤞

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-02-15 05:15:12
*Thread Reply:* I think this log should be dropped to debug: https://github.com/OpenLineage/OpenLineage/blob/d66c41872f3cc7f7cd5c99664d401e070e[…]c/main/common/java/io/openlineage/spark/agent/EventEmitter.java
Will Johnson (will@willj.co) - 2022-02-15 23:27:07
*Thread Reply:* @Maciej Obuchowski that is a good one! It would be nice to still have SOME logging at info level, to know that the event completed successfully, but that response and event are very verbose.

I was also thinking about here:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/lifecycle/OpenLineageRunEventBuilder.java#L337-L340

and here:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/lifecycle/OpenLineageRunEventBuilder.java#L405-L408

These spots are where it's printing out the full logical plan for some reason.

Can I just open up a PR and switch these to log.debug instead?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-02-16 04:59:17
*Thread Reply:* Yes, that would be a good solution for now. Later it would be nice to have some option to raise the log level - OL logs are absolutely drowning in logs from the rest of the Spark cluster when set to debug.
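In the meantime, a standard log4j override can quiet the integration per cluster; a minimal sketch (the logger name assumes the io.openlineage.spark.agent package visible in the integration's stack traces):

```
# log4j.properties: keep Spark at INFO but quiet the OpenLineage listener
log4j.logger.io.openlineage.spark.agent=WARN
```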
Will Johnson (will@willj.co) - 2022-02-16 13:35:15
[SPARK][INTEGRATION] Need Brainstorming Ideas - How to Persist / Access Spark Configs in JobEnd

Hey, OL community! I'm working on PR #490 and I finally have all tests passing, but now my desired behavior - displaying environment properties during COMPLETE / JobEnd events - is not happening 😭

The previous approach stored the Spark properties in the OpenLineageContext with a properties attribute, but I believe that was part of all of the test failures.

What are some other ways to store the jobStart's properties and make them accessible to the corresponding jobEnd? Hopefully it's okay to tag @Maciej Obuchowski, @Michael Collado, and @Paweł Leszczyński, who have been extremely helpful in the past and brought great ideas to the table.

Michael Collado (collado.mike@gmail.com) - 2022-02-16 13:44:30
*Thread Reply:* Hey, I responded on the issue, but just to make it clear for everyone: the OL events for a run are not expected to be an accumulation of all past events. Events should be treated as additive by the backend - each event can post what information it has about the run, and the backend is responsible for constructing a holistic picture of the run.

Michael Collado (collado.mike@gmail.com) - 2022-02-16 13:47:18
*Thread Reply:* E.g., here is the Marquez code that fetches the facets for a run. Note that all of the facets are included from all events with the requested run_uuid. If the env facet is present on any event, it will be returned by the API.
Will Johnson (will@willj.co) - 2022-02-16 13:51:30
*Thread Reply:* Ah! Thanks for that, @Michael Collado, it's good to understand the OpenLineage perspective.

So, we do need to maintain some state. That makes total sense, Mike.

How does Marquez handle failed jobs currently? Based on this issue (https://github.com/OpenLineage/OpenLineage/issues/436), I think Marquez would show a START but no COMPLETE event, right?

Michael Collado (collado.mike@gmail.com) - 2022-02-16 14:00:03
*Thread Reply:* If I were building the backend, I would store events, then calculate the end state later, rather than trying to "maintain some state" (maybe we mean the same thing, but are using different words here 😀).
Re: the failure events, I think job failures will currently result in one FAIL event and one COMPLETE event. The SparkListenerJobEnd event will trigger a FAIL event, but the SparkListenerSQLExecutionEnd event will trigger the COMPLETE event.

Will Johnson (will@willj.co) - 2022-02-16 15:16:27
*Thread Reply:* Oooh! I did not know we already could get a FAIL event! That is super helpful to know, Mike! Thank you so much!
Will Johnson (will@willj.co) - 2022-02-21 10:04:18
[SPARK] Connecting SparkListenerSQLExecutionStart to the various SparkListenerJobStarts

TL;DR: How can I connect the SparkListenerSQLExecutionStart to the SparkListenerJobStart events coming out of OpenLineage? The events appear to have two separate run ids and no link to indicate that the ExecutionStart event owns the subsequent JobStart events.

More Context:

Recently, I implemented a connector for Azure Synapse (a data warehouse on the Microsoft cloud) for the Spark integration, and now with https://github.com/OpenLineage/OpenLineage/pull/490, I realize that the SparkListenerSQLExecutionStart event carries with it the necessary inputs and outputs to tell the "real" lineage. The way Synapse in Databricks works is:

• SparkListenerSQLExecutionStart fires off an event with the end-to-end input and output (e.g. S3 as input and SQL table as output).
• SparkListenerJobStart events fire off that move content from one S3 location to a "staging" location controlled by Azure Synapse. OpenLineage records this event with INPUT S3, and the output is a WASB "tempfolder" (which is a temporary location and not really useful for lineage, since it will be destroyed at the end of the job).
• The final operation actually happens ALL in Synapse, and OpenLineage does not fire off an event, it seems. The Synapse database has a "COPY" command which moves the data from the "tempfolder" into the database.
• Finally, a SparkListenerSQLExecutionEnd event happens and the query is complete.

Ideally, I could connect the SQLExecutionStart or SQLExecutionEnd with the SparkListenerJobStart so that I can get the JobStart properties. I see that ExecutionStart has an execution id, and JobStart should have the same execution id, BUT I think by the time I reach the ExecutionEnd, all the JobStart events would have been removed from the HashMap that contains all of the events in OpenLineage.

Any guidance on how to reach JobStart properties from an ExecutionStart or ExecutionEnd would be greatly appreciated!
  🤔 Maciej Obuchowski
Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-02-22 09:02:48
*Thread Reply:* I think this scenario only happens when a Spark job spawns another "sub-job", right?

I think that maybe you can check sparkContext.getLocalProperty("spark.sql.execution.id")

> I see that ExecutionStart has an execution id, and JobStart should have the same execution id, BUT I think by the time I reach the ExecutionEnd, all the JobStart events would have been removed from the HashMap that contains all of the events in OpenLineage.
But pairwise, those starts and ends should at least have the same runId, as they were created with the same OpenLineageContext, right?

Anyway, what @Michael Collado wrote on the issue is true: https://github.com/OpenLineage/OpenLineage/pull/490#issuecomment-1042011803 - you should not assume that we hold all the metadata somewhere in memory during the whole execution of the run. The backend should be able to take care of it.
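For the JobStart side specifically, the execution id also rides along on the event's own properties, so a listener can correlate without holding extra state; a small sketch (the listener class here is illustrative, not the OpenLineage one):

```
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobStart;

// Sketch: correlate JobStart events back to the SQL execution that spawned them.
public class ExecutionIdListener extends SparkListener {
    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        // Same key Maciej mentions; null for non-SQL jobs.
        String executionId = jobStart.properties().getProperty("spark.sql.execution.id");
        System.out.println("job " + jobStart.jobId() + " belongs to SQL execution " + executionId);
    }
}
```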
Will Johnson (will@willj.co) - 2022-02-22 10:53:09
*Thread Reply:* @Maciej Obuchowski - I was hoping they'd have the same run id as well, but they do not 😞

But that is the expectation? A SparkSQLExecutionStart and a JobStart SHOULD have the same execution ID, right?

I will take a look at sparkContext.getLocalProperty. Thank you so much for the reply, Maciej!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2022-02-22 10:57:24
*Thread Reply:* SparkSQLExecutionStart and SparkSQLExecutionEnd should have the same runId, as should JobStart and JobEnd events. Beyond those it can get wild. For example, some jobs don't emit JobStart/JobEnd events. Some jobs, like Delta, emit multiple that aren't easily tied to the SQL event.
Will Johnson - (will@willj.co) -
-
2022-02-23 03:48:38
-
-

*Thread Reply:* Okay, I dug into the Databricks Synapse Connector and it does the following:

  1. SparkSQLExecutionStart with execution id of 8 happens (so it gets runid of abc123). It contains the real inputs and outputs that we want.
  2. The Synapse connector starts executing JDBC commands. These commands prepare the Synapse database to connect with data that Spark will land in a staging area in the cloud. (I don't know how it's executing arbitrary commands before the official job start begins 😞)
  3. SparkJobStart with execution id of 9 begins (so it gets runid of jkl456). This contains the inputs and an output to a temp folder (NOT the real output we want, but a staging location).
    a. There are four JobIds 0 - 3, all of which point back to execution id 9 with the same physical plan.
    b. After job1, it runs more JDBC commands.
    c. I think at Job2, it runs the actual Spark code to query and join my raw input data and land it in a cloud storage account "tempfolder/".
    d. After job3, it runs the final JDBC commands to actually move the data from "tempfolder/" to the Synapse DB.
  4. Finally, the SparkSQLListenerEnd event occurs. I can see this in the Spark UI as well.

Because the Databricks Synapse connector somehow adds these additional JobStarts WITHOUT referencing the original SparkSQLExecutionStart execution ID, we have to rely on heuristics to connect the "tempfolder/" to the real downstream table that was already provided in the ExecutionStart event 😞

I've attached the logs and a screenshot of what I'm seeing in the Spark UI. If you had a chance to take a look, it's a bit verbose, but I'd appreciate a second pair of eyes on my analysis. Hopefully I got something wrong 😅

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-02-23 07:19:01

*Thread Reply:* I think we've encountered the same stuff in Delta before 🙂

https://github.com/OpenLineage/OpenLineage/issues/388#issuecomment-964401860

Michael Collado - (collado.mike@gmail.com)
2022-02-23 14:13:18

*Thread Reply:* @Will Johnson , am I reading your report correctly that the SparkListenerJobStart event is reported with a spark.sql.execution.id that differs from the execution id of the SparkSQLExecutionStart?

Michael Collado - (collado.mike@gmail.com)
2022-02-23 14:18:04

*Thread Reply:* WILLJ: We're deep inside this thing and have an executionid |9| 😂

Will Johnson - (will@willj.co)
2022-02-23 21:56:48

*Thread Reply:* Hah @Michael Collado I see you found my method of debugging in Databricks 😅

But you're exactly right, there's a SparkSQLExecutionStart event with execution id 8 and then a set of JobStart events all with execution id 9!

I don't know enough about Spark internals to say how you can just run arbitrary Scala code while making it look like a Spark job, but that's what it looks like. As if the SqlDwWriter somehow submits a new job without an ExecutionStart... maybe it's an RDD operation instead? This has given me another idea to add some more log.info statements to my jar 😅😬

Michael Robinson - (michael.robinson@astronomer.io)
2022-02-28 14:00:23

One of our own will be talking OpenLineage, Airflow and Spark at the Subsurface Conference this week. Register to attend @Michael Collado’s session on March 3rd at 11:45. You can register and learn more here: https://www.dremio.com/subsurface/live/winter2022/

🎉 Willy Lulciuc, Maciej Obuchowski
🙌 Will Johnson, Ziyoiddin Yusupov, Julien Le Dem
👍 Ziyoiddin Yusupov

Willy Lulciuc - (willy@datakin.com)
2022-02-28 14:00:56

*Thread Reply:* You won’t want to miss this talk!

Martin Fiser - (fisa@keboola.com)
2022-02-28 15:06:43

I have a question about DataHub integration through the OpenLineage standard. Is anyone working on it, or was it rather just an icon used in previous materials? We have built an OpenLineage API endpoint in our product, and we were hoping OL will gain enough traction that it will be a native way to connect to a variety of data discovery/observability tools, such as DataHub, Amundsen, etc.

Many thanks!

John Thomas - (john@datakin.com)
2022-02-28 15:29:58

*Thread Reply:* hi Martin - when you talk about a DataHub integration, did you mean a method to collect information from DataHub? I don't see a current issue open for that, but I recommend you make one to kick off the discussion around it.

If you mean sending information to DataHub, that should already be possible if users pass a DataHub API endpoint to the OPENLINEAGE_ENDPOINT variable

Martin Fiser - (fisa@keboola.com)
2022-02-28 16:29:54

*Thread Reply:* Hi, thanks for the reply! I meant to emit the OpenLineage JSON structure to DataHub.

Could you please be more specific, possibly link an article on how to find the endpoint on the DataHub side? Many thanks!

John Thomas - (john@datakin.com)
2022-02-28 17:15:31

*Thread Reply:* ooooh, sorry I misread - I thought you meant that DataHub had built an endpoint. Your integration should emit OpenLineage events to an endpoint, but DataHub would likely have to build that support into their product. I'm not sure how to go about it

John Thomas - (john@datakin.com)
2022-02-28 17:16:27

*Thread Reply:* I'd reach out to datahub, potentially?

Martin Fiser - (fisa@keboola.com)
2022-02-28 17:21:51

*Thread Reply:* i see. ok, will do!

Julien Le Dem - (julien@apache.org)
2022-03-02 18:15:21

*Thread Reply:* It has been discussed in the past but I don’t think there is something yet. The Kafka transport PR that is in flight should facilitate this

Martin Fiser - (fisa@keboola.com)
2022-03-02 18:33:45

*Thread Reply:* Thanks for the response! Though dragging Kafka in just for the data delivery bit is too much. I think the clearest way would be to push DataHub to make an API endpoint and a parser for the OL lineage data structure.

I see this is more a political thing that would require a joint effort of the DataHub team and OpenLineage with a common goal.

Michael Robinson - (michael.robinson@astronomer.io)
2022-02-28 17:22:47

Is there a topic you think the community should discuss at the next OpenLineage TSC meeting? Reply or DM with your item, and we’ll add it to the agenda. Mark your calendars: the next TSC meeting is Wednesday, March 9 at 9 am PT on zoom.

Michael Robinson - (michael.robinson@astronomer.io)
2022-03-02 10:24:58

The next OpenLineage Technical Steering Committee meeting is Wednesday, March 9! Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT.
Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
All are welcome.
Agenda:
• New committers
• Release overview (0.6.0)
• New process for blog posts
• Retrospective: Spark integration
Notes: https://tinyurl.com/openlineagetsc

Michael Collado - (collado.mike@gmail.com)
2022-03-02 14:29:33

FYI, there's a talk on OpenLineage at Subsurface live tomorrow - https://www.dremio.com/subsurface/live/winter2022/session/cross-platform-data-lineage-with-openlineage/

🙌 Maciej Obuchowski, John Thomas, Paweł Leszczyński, Francis McGregor-Macdonald
👍 Ziyoiddin Yusupov, Michael Robinson, Jac.

Michael Robinson - (michael.robinson@astronomer.io)
2022-03-04 15:25:20

@channel The latest release (0.6.0) of OpenLineage is now available, featuring a new Dagster integration, updates to the Airflow and Java integrations, a generic facet for env properties, bug fixes, and more. For more info, visit https://github.com/OpenLineage/OpenLineage/releases/tag/0.6.0

🙌 Conor Beverland, Dalin Kim, Ziyoiddin Yusupov, Luca Soato
👍 Julien Le Dem
👀 William Angel, Francis McGregor-Macdonald

Marco Diaz - (mdiaz@roblox.com)
2022-03-07 14:06:19

Hello guys,

Where do I find an example of building a custom extractor? We have several custom Airflow operators that I need to integrate.

John Thomas - (john@datakin.com)
2022-03-07 14:56:58

*Thread Reply:* Hi Marco - we don't have documentation on that yet, but the Postgres extractor is a pretty good example of how they're implemented.

All the included extractors are here: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors

Marco Diaz - (mdiaz@roblox.com)
2022-03-07 15:07:41

*Thread Reply:* Thanks. I can follow that to build my own. Also, I am installing this environment right now in Airflow 2. It seems I need Marquez and the openlineage-airflow library. It seems from this example that I can put my extractors in any path, as long as it is referenced in the environment variable. Is that correct?
OPENLINEAGE_EXTRACTOR_<operator>=full.path.to.ExtractorClass
Also, do I need anything else other than Marquez and openlineage-airflow?

Ross Turk - (ross@datakin.com)
2022-03-07 15:30:45

*Thread Reply:* Yes, as long as the extractors are in the python path.
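
For reference, a minimal extractor skeleton (the class and operator names here are illustrative, not from this thread; the imports follow the openlineage-airflow extractor API):

```
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata


class MyOperatorExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # Operator class names this extractor should handle
        return ['MyOperator']

    def extract(self) -> Optional[TaskMetadata]:
        # At minimum, return the task name; inputs/outputs/facets are optional
        return TaskMetadata(name=f"{self.operator.dag_id}.{self.operator.task_id}")
```

It would then be registered with something like OPENLINEAGE_EXTRACTOR_MyOperator=my_extractors.MyOperatorExtractor, with the module on the PYTHONPATH.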

Ross Turk - (ross@datakin.com)
2022-03-07 15:31:59

*Thread Reply:* I built one a little while ago for a custom operator, I'd be happy to share what I did. I put it in the same file as the operator class for convenience.

Marco Diaz - (mdiaz@roblox.com)
2022-03-07 15:32:51

*Thread Reply:* That will be a great help. Thanks

Ross Turk - (ross@datakin.com)
2022-03-08 20:38:27

*Thread Reply:* This is the one I wrote:

Ross Turk - (ross@datakin.com)
2022-03-08 20:39:30

*Thread Reply:* to make it work, I set this environment variable:

OPENLINEAGE_EXTRACTOR_HttpToBigQueryOperator=http_to_bigquery.HttpToBigQueryExtractor

Ross Turk - (ross@datakin.com)
2022-03-08 20:40:57

*Thread Reply:* the extractor starts at line 183, and the really important bits start at line 218

Michael Robinson - (michael.robinson@astronomer.io)
2022-03-07 15:16:37

@channel At the next OpenLineage TSC meeting, we’ll be reminiscing about the Spark integration. If you’ve had a hand in OL support for Spark, please join and share! The meeting will start at 9 am PT on Wednesday this week. @Maciej Obuchowski @Oleksandr Dvornik @Willy Lulciuc @Michael Collado https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

👍 Ross Turk, Maciej Obuchowski

Marco Diaz - (mdiaz@roblox.com)
2022-03-07 18:44:26

Would Marquez create some lineage for operators that don't have a custom extractor built yet?

✅ Fuming Shih

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-08 12:05:25

*Thread Reply:* You would see that job was run - but we couldn't extract dataset lineage from it.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-08 12:05:49

*Thread Reply:* The good news is that we're working to solve this problem in general.

Marco Diaz - (mdiaz@roblox.com)
2022-03-08 12:15:52

*Thread Reply:* I see, so I definitely will need the custom extractor built. I just need to understand where to set the path to the extractor. I can build one by following the Postgres extractor you have built.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-08 12:50:00

*Thread Reply:* That depends how you deploy Airflow. Our tests use environment in docker-compose: https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/tests/integration/tests/docker-compose-2.yml#L34

Marco Diaz - (mdiaz@roblox.com)
2022-03-08 13:19:37

*Thread Reply:* Thanks for the example. I can show this to my infra support person for his reference.

Michael Robinson - (michael.robinson@astronomer.io)
2022-03-08 11:47:11

This month’s OpenLineage TSC community meeting is tomorrow at 9am PT! It’s not too late to add an item to the agenda. Reply here or msg me with yours. https://openlineage.slack.com/archives/C01CK9T7HKR/p1646234698326859

👍 Ross Turk

Marco Diaz - (mdiaz@roblox.com)
2022-03-09 19:31:23

I am running the last command to install Marquez in AWS:
helm upgrade --install marquez . \
  --set marquez.db.host=<AWS-RDS-HOST> \
  --set marquez.db.user=<AWS-RDS-USERNAME> \
  --set marquez.db.password=<AWS-RDS-PASSWORD> \
  --namespace marquez \
  --atomic \
  --wait
And I am receiving this error:
Error: query: failed to query with labels: secrets is forbidden: User "xxx@xxx.xx" cannot list resource "secrets" in API group "" in the namespace "default"

Julien Le Dem - (julien@apache.org)
2022-03-10 12:46:18

*Thread Reply:* Do you need to specify a namespace that is not « default »?

Marco Diaz - (mdiaz@roblox.com)
2022-03-09 19:31:48

Can anyone let me know what is happening? My DI guy said it is a chart issue

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-10 07:40:13

*Thread Reply:* @Kevin Mellott aren't you the chart wizard? Maybe you could help 🙂

👀 Kevin Mellott

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 14:09:26

*Thread Reply:* Ok so I had to update a chart dependency

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 14:10:39

*Thread Reply:* Now I installed the service in Amazon using this:
helm install marquez . --dependency-update --set marquez.db.host=myhost --set marquez.db.user=myuser --set marquez.db.password=mypassword --namespace marquez --atomic --wait

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 14:11:31

*Thread Reply:* I can see marquez-web running and marquez, as well as the database I set up manually

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 14:12:27

*Thread Reply:* However, I cannot fetch initial data when logging into the endpoint

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-10 14:52:06

*Thread Reply:* 👋 @Marco Diaz happy to hear that the Helm install is completing without error! To help troubleshoot the error above, can you please let me know if this endpoint is available and working?

http://localhost:5000/api/v1/namespaces
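
(The same check can be scripted - a quick sketch using requests, assuming the port-forward mentioned below is active:)

```
import requests

# Sanity check of the Marquez API; should print the default namespace
print(requests.get("http://localhost:5000/api/v1/namespaces").json())
```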

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 15:13:16

*Thread Reply:* i got this
{"namespaces":[{"name":"default","createdAt":"2022-03-10T18:05:55.780593Z","updatedAt":"2022-03-10T19:03:31.309713Z","ownerName":"anonymous","description":"The default global namespace for dataset, job, and run metadata not belonging to a user-specified namespace."}]}

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 15:13:34

*Thread Reply:* i have to use the namespace marquez to redirect there:
kubectl port-forward svc/marquez 5000:80 -n marquez

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 15:13:48

*Thread Reply:* is there something i need to change in a config file?

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 15:14:39

*Thread Reply:* also how would i change the "localhost" address to something that is accessible in amazon without the need to redirect?

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 15:14:59

*Thread Reply:* Sorry for all the questions. I am not an infra guy and have had to do all this by myself

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-10 15:39:23

*Thread Reply:* No problem at all, I think there are a couple of things at play here. With the local setup, it appears that the web is attempting to access the API on the wrong port number (3000 instead of 5000). I'll create an issue for that one so that we can fix it.

As to the EKS installation (or any non-local install), this is where you would need to use what's called an ingress controller to expose the services outside of the Kubernetes cluster. There are different flavors of these (NGINX is popular), and I believe that AWS EKS has some built-in capabilities that might help as well.

https://www.eksworkshop.com/beginner/130_exposing-service/ingress/

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 15:40:50

*Thread Reply:* So how do i fix this issue?

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-10 15:46:56

*Thread Reply:* If your goal is to deploy to AWS, then you would need to get the EKS ingress configured. It's not a trivial task, but they do have a bit of a walkthrough at https://www.eksworkshop.com/beginner/130_exposing-service/.

However, if you are just seeking to explore Marquez and try things out, then I would highly recommend the "Open in Gitpod" functionality at https://github.com/MarquezProject/marquez#try-it. That will perform a full deployment for you in a temporary environment very quickly.

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 16:02:05

*Thread Reply:* i need to use it in aws for a POC

Marco Diaz - (mdiaz@roblox.com)
2022-03-10 19:15:08

*Thread Reply:* Is there a better guide on how to install and set up Marquez in AWS? This guide omits many steps:
https://marquezproject.github.io/marquez/running-on-aws.html

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-10 12:35:37

We're trying to find the best way to track upstream releases of the projects we have integrations for, to support newer versions faster and with fewer bugs. If you have any opinions on this topic, please chime in here:

https://github.com/OpenLineage/OpenLineage/issues/602

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 13:34:30

@Kevin Mellott Hello Kevin, I followed the tutorial you sent me and I have exposed my services. However, I am still seeing the same errors (this comes from the api/namespaces call):
{"namespaces":[{"name":"default","createdAt":"2022-03-10T18:05:55.780593Z","updatedAt":"2022-03-10T19:03:31.309713Z","ownerName":"anonymous","description":"The default global namespace for dataset, job, and run metadata not belonging to a user-specified namespace."}]}

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 13:35:08

Is there something I need to change in the chart? I do not have access to the default namespace in Kubernetes, only the marquez namespace

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-11 13:56:27

@Marco Diaz that is actually a good response! This is the JSON returned back by the API to show some of the default Marquez data created by the install. Is there another error you are experiencing?

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 13:59:28

*Thread Reply:* I still see this:
https://files.slack.com/files-pri/T01CWUYP5AR-F036JKN77EW/image.png

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:00:09

*Thread Reply:* I created my own database and changed the values for host, user and password inside the chart.yml

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-11 14:00:23

*Thread Reply:* Does it show that within the AWS deployment? It looks to show localhost in your screenshot.

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-11 14:00:52

*Thread Reply:* Or are you working through the local deploy right now?

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:01:57

*Thread Reply:* It shows the same using the exposed service

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:02:09

*Thread Reply:* I just didn't take another screenshot

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:02:27

*Thread Reply:* Could it be communication with the DB?

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-11 14:04:37

*Thread Reply:* What do you see if you view the network traffic within your web browser (right click -> Inspect -> Network). Specifically, wondering what the response code from the Marquez API URL looks like.

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:14:48

*Thread Reply:* i see this error:
Error occured while trying to proxy to: xxxxxxxxxxxxxxxxxxxxxxxxx.us-east-1.elb.amazonaws.com/api/v1/namespaces

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:16:00

*Thread Reply:* it seems to be trying to use the same address to access the api endpoint

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:16:26

*Thread Reply:* however the api service is in a different endpoint

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:18:24

*Thread Reply:* The API resides here:
Xxxxxxxxxxxxxxxxxxxxxx-2064419849.us-east-1.elb.amazonaws.com

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:19:13

*Thread Reply:* The web service resides here:
xxxxxxxxxxxxxxxxxxxxxxxxxxx-335729662.us-east-1.elb.amazonaws.com

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:19:25

*Thread Reply:* do they both need to be under the same LB?

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:19:56

*Thread Reply:* How would I do that if they install as separate services?

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-11 14:27:15

*Thread Reply:* You are correct, both the website and API are expecting to be exposed on the same ALB. This will give you a single URL that can reach your Kubernetes cluster, and then the ALB will allow you to configure Ingress rules to route the traffic based on the request.

Here is an example from one of the AWS repos - in the ingress resource you can see the single rule setup to point traffic to a given service.

https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/docs/examples/2048/2048_full.yaml

Marco Diaz - (mdiaz@roblox.com)
2022-03-11 14:36:40

*Thread Reply:* Thanks for the help. Now I know what the issue is

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-11 14:51:34

*Thread Reply:* Great to hear!!

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-03-16 00:55:36

👋 Hi everyone! Our company is looking to adopt a data lineage tool, so I have a few queries on OpenLineage:

  1. Is this completely free?
  2. What databases does it support?

Ross Turk - (ross@datakin.com)
2022-03-16 10:29:06

*Thread Reply:* Hi! Yes, OpenLineage is free. It is an open source standard for collection, and it provides the agents that integrate with pipeline tools to capture lineage metadata. You also need a metadata server, and there is an open source one called Marquez that you can use.

Ross Turk - (ross@datakin.com)
2022-03-16 10:29:15

*Thread Reply:* It supports the databases listed here: https://openlineage.io/integration

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-03-16 08:27:20

And when I run ./docker/up.sh --seed I got the result from the Java code (sample example). But how do I get the same thing in a Python example?

Ross Turk - (ross@datakin.com)
2022-03-16 10:29:53

*Thread Reply:* Not sure I understand - are you looking for example code in Python that shows how to make OpenLineage calls?

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-03-16 12:45:14

*Thread Reply:* yup

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-03-16 13:10:04

*Thread Reply:* how to run

Ross Turk - (ross@datakin.com)
2022-03-16 23:08:31

*Thread Reply:* this is a good post for getting started with Marquez: https://openlineage.io/blog/explore-lineage-api/

Ross Turk - (ross@datakin.com)
2022-03-16 23:08:51

*Thread Reply:* once you have run ./docker/up.sh, you should be able to run through that and see how the system runs

Ross Turk - (ross@datakin.com)
2022-03-16 23:09:45

*Thread Reply:* There is a python client you can find here: https://github.com/OpenLineage/OpenLineage/tree/main/client/python
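
For anyone looking for a starting point, a minimal sketch of emitting a run event with that client (the job and dataset names here are made up; assumes Marquez listening on localhost:5000):

```
import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")

# Emit a single COMPLETE event linking one input dataset to one output dataset
client.emit(RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid.uuid4())),
    job=Job(namespace="example-namespace", name="example-job"),
    producer="example-producer",
    inputs=[Dataset(namespace="example-namespace", name="input-table")],
    outputs=[Dataset(namespace="example-namespace", name="output-table")],
))
```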

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-03-17 00:05:58

*Thread Reply:* Thank you

Ross Turk - (ross@datakin.com)
2022-03-19 00:00:32

*Thread Reply:* You are welcome 🙂

Mirko Raca - (racamirko@gmail.com)
2022-04-19 09:28:50

*Thread Reply:* Hey @Ross Turk (and potentially @Maciej Obuchowski) - what are the plans for the OL Python client? I'd like to use it, but without a pip package it's not really project-friendly.

Is there any work in that direction? Is the current client code considered mature and just in need of re-packaging, or is it a thought sketch and some serious work is needed?

I'm trying to avoid re-inventing the wheel, so if there's already something in motion, I'd rather support that than start (badly) from scratch.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-19 09:32:17

*Thread Reply:* What do you mean without a pip package?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-19 09:32:18

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-19 09:35:08

*Thread Reply:* It's still developed - for example, the next release will have pluggable backends, like Kafka:
https://github.com/OpenLineage/OpenLineage/pull/530

Mirko Raca - (racamirko@gmail.com)
2022-04-19 09:40:11

*Thread Reply:* My apologies, Maciej!
In my defense - looking for "open lineage" on PyPI doesn't show this in the first 20 results. Still, I should have checked setup.py. My bad, and thank you for the pointer!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-04-19 10:00:49

*Thread Reply:* We might need to add some keywords to setup.py - right now we have only "openlineage" there 😉

Mirko Raca - (racamirko@gmail.com)
2022-04-20 08:12:29

*Thread Reply:* My mistake was that I was expecting a separate repo for the clients. But now I'm playing around with the package and trying to figure out the OL concepts. Thank you for your contribution, it's much nicer to experiment from ipynb than curl 🙂

Michael Robinson - (michael.robinson@astronomer.io)
2022-03-16 12:00:01

@Julien Le Dem and @Willy Lulciuc will be at Data Council Austin next week talking OpenLineage and Airflow https://www.datacouncil.ai/talks/data-lineage-with-apache-airflow-using-openlineage?hsLang=en

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-03-16 12:50:20

I couldn't figure out, for the sample lineage flow (etldelivery7_days) created when we ran the seed command, which file it's fetching data from

John Thomas - (john@datakin.com)
2022-03-16 14:35:14

*Thread Reply:* the seed data is being inserted by this command here: https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/cli/SeedCommand.java


Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-03-17 00:06:53

*Thread Reply:* Got it, but if I changed the code in this Java file - let's say I added another job here satisfying the syntax - it's not appearing in the lineage flow

Marco Diaz - (mdiaz@roblox.com)
2022-03-22 18:18:22

@Kevin Mellott Hello Kevin, sorry to bother you again. I was finally able to configure Marquez in AWS using an ALB. Now I am receiving this error when calling the API

Marco Diaz - (mdiaz@roblox.com)
2022-03-22 18:18:32

Is this an issue accessing the database?

Marco Diaz - (mdiaz@roblox.com)
2022-03-22 18:19:15

I created the database and host manually and passed the parameters using helm --set

Marco Diaz - (mdiaz@roblox.com)
2022-03-22 18:19:33

Do the database services need to be exposed too through the ALB?

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-23 10:20:47

*Thread Reply:* I'm not too familiar with the 504 error in ALB, but found a guide with troubleshooting steps. If this is an issue with connectivity to the Postgres database, then you should be able to see errors within the marquez pod in EKS (kubectl logs <marquez pod name>) to confirm.

I know that EKS needs to have connectivity established to the Postgres database, even in the case of RDS, so that could be the culprit.

Marco Diaz - (mdiaz@roblox.com)
2022-03-23 16:09:09

*Thread Reply:* @Kevin Mellott This is the error I am seeing in the logs:
[HPM] Proxy created: /api/v1 -> http://localhost:5000/
App listening on port 3000!
[HPM] Error occurred while trying to proxy request /api/v1/namespaces from marquez-interface-test.di.rbx.com to http://localhost:5000/ (ECONNREFUSED) (https://nodejs.org/api/errors.html#errors_common_system_errors)

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-23 16:22:13

*Thread Reply:* It looks like the website is attempting to find the API on localhost. I believe this can be resolved by setting the following Helm chart value within your deployment.

marquez.hostname=marquez-interface-test.di.rbx.com

Kevin Mellott - (kevin.r.mellott@gmail.com)
2022-03-23 16:22:54

*Thread Reply:* assuming that is the DNS used by the website

Marco Diaz - (mdiaz@roblox.com)
2022-03-23 16:48:53

*Thread Reply:* thanks, that did it. I have a question regarding the database

Marco Diaz - (mdiaz@roblox.com)
2022-03-23 16:50:01

*Thread Reply:* I made my own database manually. Should the Marquez tables be created automatically when installing Marquez?

Marco Diaz - (mdiaz@roblox.com)
2022-03-23 16:56:10

*Thread Reply:* Also, could you put both the API and the interface on the same port (3000)?

Marco Diaz - (mdiaz@roblox.com)
2022-03-23 17:21:58

*Thread Reply:* Seems I am still having the forwarding issue:
[HPM] Proxy created: /api/v1 -> http://marquez-interface-test.di.rbx.com:5000/
App listening on port 3000!
[HPM] Error occurred while trying to proxy request /api/v1/namespaces from marquez-interface-test.di.rbx.com to http://marquez-interface-test.di.rbx.com:5000/ (ECONNRESET) (https://nodejs.org/api/errors.html#errors_common_system_errors)

Will Johnson - (will@willj.co)
2022-03-23 09:08:14

Guidance on How / When a Spark SQL Execution Event Controls JobStart Events?

@Maciej Obuchowski and @Paweł Leszczyński and @Michael Collado I'd really appreciate your thoughts on how / when JobStart events are triggered for a given execution. I've run into two situations now where a SQLExecutionStart event fires with execution id X and then JobStart events fire with execution id Y.

• Spark 2 Delta SaveIntoDataSourceCommand on Databricks - I see it has a SparkSQLExecutionStart event, but only on Spark 3 does it have JobStart events with the SaveIntoDataSourceCommand and the same execution id.
• Databricks Synapse Connector - A SparkSQLExecutionStart event occurs, but then the job starts are different execution ids.
Is there any guidance / books / videos that dive deeper into how these events are triggered?

We need the JobStart event with the same execution id so that we can get some environment properties stored in the JobStart event.

Thank you so much for any guidance!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-23 09:25:18

*Thread Reply:* It's always Delta, isn't it?

When I originally worked on Delta support I tried to find an answer on the Delta Slack, and got one:

Hi Maciej, the main reason is that Delta will run queries on metadata to figure out what files should be read for a particular version of a Delta table and that's why you might see multiple jobs. In general Delta treats metadata as data and leverages Spark to handle them to make it scalable.

🤣 Will Johnson

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-23 09:25:48

*Thread Reply:* I haven't touched how it works in Spark 2 - wanted to make it work with Spark 3's new catalogs, so can't help you there.

Will Johnson - (will@willj.co)
2022-03-23 09:46:14

*Thread Reply:* Argh!! It's always Databricks doing something 🙄

Thanks, Maciej!

Will Johnson - (will@willj.co)
2022-03-23 09:51:59

*Thread Reply:* One last question for you, @Maciej Obuchowski, any thoughts on how I could identify WHY a particular JobStart event fired? Is it just stepping through every event? Was that your approach to getting Spark3 Delta working? Thank you so much for the insights!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-23 09:58:08

*Thread Reply:* Before that, we were using just JobStart/JobEnd events and I couldn't find events that correspond to logical plan that has anything to do with what job was actually doing. I just found out that SQLExecution events have what I want, so I just started using them and stopped worrying about Projection or Aggregate, or other events that don't really matter here - and that's how filtering idea was born: https://github.com/OpenLineage/OpenLineage/issues/423

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-23 09:59:37

*Thread Reply:* Are you trying to get environment info from those events, or do you actually get a Job event with proper logical plans like SaveIntoDataSourceCommand?

Might be worth just posting here all the events + logical plans that are generated for a particular job, as I've done in that issue

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-23 09:59:40

*Thread Reply:* scala> spark.sql("CREATE TABLE tbl USING delta AS SELECT * FROM tmp")
21/11/09 19:01:46 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionStart - executionId: 3
21/11/09 19:01:46 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect
21/11/09 19:01:46 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionStart - executionId: 4
21/11/09 19:01:46 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:46 WARN SparkSQLExecutionContext: SparkListenerJobStart - executionId: 4
21/11/09 19:01:46 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:47 WARN SparkSQLExecutionContext: SparkListenerJobEnd - executionId: 4
21/11/09 19:01:47 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:47 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionEnd - executionId: 4
21/11/09 19:01:47 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.LocalRelation
21/11/09 19:01:48 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionStart - executionId: 5
21/11/09 19:01:48 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:48 WARN SparkSQLExecutionContext: SparkListenerJobStart - executionId: 5
21/11/09 19:01:48 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:49 WARN SparkSQLExecutionContext: SparkListenerJobEnd - executionId: 5
21/11/09 19:01:49 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:49 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionEnd - executionId: 5
21/11/09 19:01:49 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.Aggregate
21/11/09 19:01:49 WARN SparkSQLExecutionContext: SparkListenerSQLExecutionEnd - executionId: 3
21/11/09 19:01:49 WARN SparkSQLExecutionContext: org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect

Will Johnson - (will@willj.co)
2022-03-23 11:41:37

*Thread Reply:* The JobStart event contains a Properties field, and that contains a bunch of fields we want to extract to get more precise lineage information within Databricks.

As far as we know, the SQLExecutionStart event does not have any way to get these properties :(

https://github.com/OpenLineage/OpenLineage/blob/21b039b78bdcb5fb2e6c2489c4de840ebb[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java

As a result, I do have to care about the subsequent JobStart events coming from a given ExecutionStart 😢

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-03-23 11:42:33
-
-

*Thread Reply:* I started down this path with the Project statement, but I agree with @Michael Collado that a ProjectVisitor isn't a great idea.

https://github.com/OpenLineage/OpenLineage/issues/617

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-24 09:43:38

Hey. I'm working on replacing the current SQL parser - on which we rely for Postgres, Snowflake, and Great Expectations - and I'd appreciate your opinion.

https://github.com/OpenLineage/OpenLineage/pull/627/files

Marco Diaz - (mdiaz@roblox.com)
2022-03-25 19:30:29

Am I supposed to see this when I open Marquez for the first time on an empty database?

John Thomas - (john@datakin.com)
2022-03-25 20:33:02

*Thread Reply:* Marquez and OpenLineage are job-focused lineage tools, so once you run a job in an OL-integrated instance of Airflow (or any other supported integration), you should see the jobs and DBs appear in the marquez ui

👍 Marco Diaz

Ross Turk - (ross@datakin.com)
2022-03-25 21:44:54

*Thread Reply:* If you want to seed it with some data, just to try it out, you can run docker/up.sh -s and it will run a seeding job as it starts.

Marco Diaz - (mdiaz@roblox.com)
2022-03-25 19:31:09

Would datasets be created when I send data from airflow?

Willy Lulciuc - (willy@datakin.com)
2022-03-31 18:34:40

*Thread Reply:* Yep! Marquez will register all in/out datasets present in the OL event as well as link them to the run

Willy Lulciuc - (willy@datakin.com)
2022-03-31 18:35:47

*Thread Reply:* FYI, @Peter Hicks is working on displaying the dataset version to run relationship in the web UI, see https://github.com/MarquezProject/marquez/pull/1929

Marco Diaz - (mdiaz@roblox.com)
2022-03-28 14:31:32

How is Datakin used in conjunction with Openlineage and Marquez?

John Thomas - (john@datakin.com)
2022-03-28 15:43:46

*Thread Reply:* Hi Marco,

Datakin is a reporting tool built on the Marquez API, and therefore designed to take in lineage using the OpenLineage specification.

Did you have a more specific question?

Marco Diaz - (mdiaz@roblox.com)
2022-03-28 15:47:53

*Thread Reply:* No, that is it. Got it. So, I can install Datakin and still use OpenLineage and Marquez?

John Thomas - (john@datakin.com)
2022-03-28 15:55:07

*Thread Reply:* if you set up a datakin account, you'll have to change the environment variables used by your OpenLineage integrations, and the runEvents will be sent to Datakin rather than Marquez. You shouldn't have any loss of functionality, and you also won't have to keep manually hosting Marquez

Marco Diaz - (mdiaz@roblox.com)
2022-03-28 16:10:25

*Thread Reply:* Will I still be able to use facets for backfills?

John Thomas - (john@datakin.com)
2022-03-28 17:04:03

*Thread Reply:* yeah it works in the same way - Datakin actually submodules the Marquez API

Marco Diaz - (mdiaz@roblox.com)
2022-03-28 16:52:41

Another question. I installed the openlineage library and now I am trying to configure Airflow 2 to use it. Do I follow these steps?

Marco Diaz - (mdiaz@roblox.com)
2022-03-28 16:53:20

If I have Marquez access via an ALB ingress, what would I use: the Marquez URL variable or the OpenLineage URL?

Marco Diaz - (mdiaz@roblox.com)
2022-03-28 16:54:53

So, I don't need to modify my DAGs in Airflow 2 to use the library? Would this just allow me to start collecting data?
openlineage.lineage_backend.OpenLineageBackend

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-29 06:24:21

*Thread Reply:* Yes, you don't need to modify dags in Airflow 2.1+

Marco Diaz - (mdiaz@roblox.com)
2022-03-29 17:47:39

*Thread Reply:* ok, I added that environment variable. Now my question is how do I configure my other variables.
I have Marquez running in AWS with an ingress.
Do I use OpenLineageURL or Marquez_URL?

Marco Diaz - (mdiaz@roblox.com)
2022-03-29 17:48:09

*Thread Reply:* Also would a new namespace be created if i add the variable?

data_fool - (data.fool.me@gmail.com)
2022-03-29 02:12:30

Hello! Are there any plans for openlineage to support dbt on trino?

John Thomas - (john@datakin.com)
2022-03-30 14:59:13

*Thread Reply:* Hi Datafool - I'm not familiar with how Trino works, but the dbt-OL integration works by wrapping the dbt run command with dbt-ol run, and capturing lineage data from the run_results file

These things don't necessarily preclude you from using OpenLineage on Trino, so it may work already.
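
Roughly, the wrapper idea looks like this (a sketch only; the paths and keys follow dbt's documented artifacts, not OpenLineage internals):

```
import json
import subprocess

# Run dbt, then read the run_results.json artifact it writes under target/
subprocess.run(["dbt", "run"], check=True)

with open("target/run_results.json") as f:
    run_results = json.load(f)

# Each entry describes one executed node (model, test, etc.)
for result in run_results["results"]:
    print(result["unique_id"], result["status"])
```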

data_fool - (data.fool.me@gmail.com)
2022-03-30 18:34:38

*Thread Reply:* hey @John Thomas yep, tried to use dbt-ol run command but it seems trino is not supported, only bigquery, redshift and few others.

John Thomas - (john@datakin.com)
2022-03-30 18:36:41

*Thread Reply:* aaah, I misunderstood what Trino is - yeah, we don't currently support jobs that are running outside of those environments.

We don't currently have plans for this, but a great first step would be opening an issue in the OpenLineage repo.

If you're interested in implementing the support yourself, I'm also happy to connect you to people that can help you get started.

data_fool - (data.fool.me@gmail.com)
2022-03-30 20:23:46

*Thread Reply:* oh okay, got it, yes I can contribute, I'll see if I can get some time in the next few weeks. Thanks @John Thomas

Francis McGregor-Macdonald - (francis@mc-mac.com)
2022-03-30 16:08:39

I can see 2 articles using Spline with BMW and Capital One. Could OpenLineage be doing the same job as Spline here? What would the differences be?
Are there any similar references for OpenLineage? I can see Northwestern Mutual, but that article does not contain a lot of detail.

-
-
SpringerLink
- - - - - - - - - - - - - - - - - -
-
-
Capital One
- - - - - - - - - - - - - - - - - -
-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-03-31 12:47:59
-
-

Could anyone help me with this custom extractor? I am not sure what I am doing wrong. I added the variable to Airflow 2, but I still see this in the logs:
[2022-03-31, 16:43:39 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=QueryOperator
Here is the code:

```
import logging
from typing import Optional, List

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.facet import SqlJobFacet, ExternalQueryRunFacet
from openlineage.common.sql import SqlMeta, SqlParser

logger = logging.getLogger(__name__)


class QueryOperatorExtractor(BaseExtractor):

    def __init__(self, operator):
        super().__init__(operator)

    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ['QueryOperator']

    def extract(self) -> Optional[TaskMetadata]:
        # (1) Parse sql statement to obtain input / output tables.
        sql_meta: SqlMeta = SqlParser.parse(self.operator.hql)
        inputs = sql_meta.in_tables
        outputs = sql_meta.out_tables
        task_name = f"{self.operator.dag_id}.{self.operator.task_id}"
        run_facets = {}
        job_facets = {
            'hql': SqlJobFacet(self.operator.hql)
        }

        return TaskMetadata(
            name=task_name,
            # in_tables / out_tables are lists, so convert each element
            inputs=[t.to_openlineage_dataset() for t in inputs],
            outputs=[t.to_openlineage_dataset() for t in outputs],
            run_facets=run_facets,
            job_facets=job_facets
        )
```

Orbit
2022-03-31 13:20:55

@Orbit has joined the channel

Orbit
2022-03-31 13:21:23

@Orbit has joined the channel

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 14:07:24

@Ross Turk Could you please take a look if you have a minute☝️? I know you have built one extractor before

Ross Turk - (ross@datakin.com)
2022-03-31 14:11:35

*Thread Reply:* Hmmmm. Are you running in Docker? Is it possible for you to shell into your scheduler container and make sure the ENV is properly set?

Ross Turk - (ross@datakin.com)
2022-03-31 14:11:57

*Thread Reply:* looks to me like the value you posted is correct, and return ['QueryOperator'] seems right to me

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 14:33:00

*Thread Reply:* It is in an EKS cluster.
I checked and the variable is there:
OPENLINEAGE_EXTRACTOR_QUERYOPERATOR=shared.plugins.ol_custom_extractors.QueryOperatorExtractor

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 14:33:56

*Thread Reply:* I am wondering if it is an issue with my extractor code. Something not rendering well

Ross Turk - (ross@datakin.com)
2022-03-31 14:40:17

*Thread Reply:* I don't think it's even executing your extractor code. The error message traces back to here:
https://github.com/OpenLineage/OpenLineage/blob/249868fa9b97d218ee35c4a198bcdf231a9b874b/integration/airflow/openlineage/lineage_backend/__init__.py#L77

Ross Turk - (ross@datakin.com)
2022-03-31 14:40:45

*Thread Reply:* I am currently digging into _get_extractor to see where it might be missing yours 🤔

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 14:46:36

*Thread Reply:* Thanks

Ross Turk - (ross@datakin.com)
2022-03-31 14:47:19

*Thread Reply:* silly idea, but you could add a log message to __init__ in your extractor.

Ross Turk - (ross@datakin.com)
2022-03-31 14:47:25

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/249868fa9b97d218ee35c4a198bcdf231a[…]ntegration/airflow/openlineage/airflow/extractors/extractors.py

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-03-31 14:48:20
-
-

*Thread Reply:* the openlineage client actually tries to import the value of that env variable from pos 22. If that happens, but for some reason it fails to register the extractor, we can at least know that it's importing
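
Illustratively, the lookup amounts to something like this (a sketch, not the actual OpenLineage source; "pos 22" is the length of the OPENLINEAGE_EXTRACTOR_ prefix):

```
import importlib
import os

PREFIX = "OPENLINEAGE_EXTRACTOR_"  # len(PREFIX) == 22

for key, value in os.environ.items():
    if key.startswith(PREFIX):
        operator_name = key[len(PREFIX):]  # case-sensitive!
        module_path, _, class_name = value.rpartition(".")
        extractor = getattr(importlib.import_module(module_path), class_name)
        print(operator_name, extractor)
```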

Ross Turk - (ross@datakin.com)
2022-03-31 14:48:54

*Thread Reply:* if you add a log line, you can verify that your PYTHONPATH and env are correct

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 14:49:23

*Thread Reply:* will try that

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 14:49:29

*Thread Reply:* and let you know

Ross Turk - (ross@datakin.com)
2022-03-31 14:49:39

*Thread Reply:* ok!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-31 15:04:05

*Thread Reply:* @Marco Diaz can you try env variable OPENLINEAGE_EXTRACTOR_QueryOperator instead of full caps?

👍 Ross Turk

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 15:13:37

*Thread Reply:* Will try that too

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 15:13:44

*Thread Reply:* Thanks for helping

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 16:58:24

*Thread Reply:* @Maciej Obuchowski My setup does not allow me to submit environment variables with lowercases. Is the name of the variable used to register the extractor?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-31 17:15:57

*Thread Reply:* yes, it's case sensitive...

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 17:18:42

*Thread Reply:* i see

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 17:39:16

*Thread Reply:* So it is definitely the name of the variable. I changed the name of the operator to capitals and now it is being registered

Marco Diaz - (mdiaz@roblox.com)
2022-03-31 17:39:44

*Thread Reply:* Could there be a way not to make this case sensitive?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-03-31 18:31:27

*Thread Reply:* yes - could you create an issue on the OpenLineage repository?

Marco Diaz - (mdiaz@roblox.com)
2022-04-01 10:46:59

*Thread Reply:* sure

Marco Diaz - (mdiaz@roblox.com)
2022-04-01 10:48:28

I have another question. I have this query:
INSERT OVERWRITE TABLE schema.daily_play_sessions_v2
    PARTITION (ds = '2022-03-30')
    SELECT
        platform_id,
        universe_id,
        pii_userid,
        NULL as session_id,
        NULL as session_start_ts,
        COUNT(1) AS session_cnt,
        SUM(
            UNIX_TIMESTAMP(stopped) - UNIX_TIMESTAMP(joined)
        ) AS time_spent_sec
    FROM schema.fct_play_sessions_merged
    WHERE ds = '2022-03-30'
      AND UNIX_TIMESTAMP(stopped) - UNIX_TIMESTAMP(joined) BETWEEN 0 AND 28800
    GROUP BY
        platform_id,
        universe_id,
        pii_userid
And I am seeing the following inputs:
[DbTableName(None,'schema','fct_play_sessions_merged','schema.fct_play_sessions_merged')]
But the outputs are empty.
Shouldn't this be an output table?
schema.daily_play_sessions_v2
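
A quick way to reproduce this with the parser used by the integration (a sketch; the expected output follows the behavior diagnosed below):

```
from openlineage.common.sql import SqlParser

sql = """
INSERT OVERWRITE TABLE schema.daily_play_sessions_v2
SELECT platform_id, universe_id, pii_userid
FROM schema.fct_play_sessions_merged
WHERE ds = '2022-03-30'
"""

sql_meta = SqlParser.parse(sql)
print(sql_meta.in_tables)   # [schema.fct_play_sessions_merged]
print(sql_meta.out_tables)  # [] - OVERWRITE is not recognized as an output token
```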

Ross Turk - (ross@datakin.com)
2022-04-01 13:25:52

*Thread Reply:* Yes, it should. This line is the likely culprit:
https://github.com/OpenLineage/OpenLineage/blob/431251d25f03302991905df2dc24357823d9c9c3/integration/common/openlineage/common/sql/parser.py#L30

Ross Turk - (ross@datakin.com)
2022-04-01 13:26:25

*Thread Reply:* I bet if that said ['INTO','OVERWRITE'] it would work

- - - -
-
-
-
- - - - - -
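To make the one-line fix Ross is pointing at concrete: the pre-Rust parser scanned tokens after INSERT against a small keyword list when deciding the output table. A toy sketch of that idea (illustrative only; the real logic lives in the parser.py linked above):
```
# Toy illustration of the ['INTO', 'OVERWRITE'] keyword-list fix;
# not the actual OpenLineage parser code.
def find_output_table(tokens):
    # tokens: uppercased SQL tokens, e.g. ["INSERT", "OVERWRITE", "TABLE", ...]
    for i, tok in enumerate(tokens):
        if tok == "INSERT" and i + 1 < len(tokens):
            # The old list effectively only accepted "INTO" here; adding
            # "OVERWRITE" makes INSERT OVERWRITE TABLE resolve an output.
            if tokens[i + 1] in ("INTO", "OVERWRITE"):
                j = i + 2
                if j < len(tokens) and tokens[j] == "TABLE":
                    j += 1  # skip the optional TABLE keyword
                return tokens[j] if j < len(tokens) else None
    return None

tokens = "INSERT OVERWRITE TABLE schema.daily_play_sessions_v2".upper().split()
assert find_output_table(tokens) == "SCHEMA.DAILY_PLAY_SESSIONS_V2"
```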
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:27:23
-
-

*Thread Reply:* @Maciej Obuchowski do you agree? should OVERWRITE be a token we look for? if so, I can submit a short PR.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-01 13:30:36
-
-

*Thread Reply:* we have a better solution

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-01 13:30:37
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/644

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:31:27
-
-

*Thread Reply:* ah! I heard there was a new SQL parser, but did not know it was imminent!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-01 13:31:30
-
-

*Thread Reply:* I've added this case as a test and it works: https://github.com/OpenLineage/OpenLineage/blob/764dfdb885112cd0840ebc7384ff958bf20d4a70/integration/sql/tests/tests_insert.rs

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Ross Turk, Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:31:33
-
-

*Thread Reply:* let me review this PR

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-01 13:36:32
-
-

*Thread Reply:* Do I have to download a new version of the openlineage-airflow python library?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-01 13:36:41
-
-

*Thread Reply:* If so which version?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:37:22
-
-

*Thread Reply:* this PR isn’t merged yet 😞 so if you wanted to try this you’d have to build the python client from the sql/rust-parser-impl branch

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-01 13:38:17
-
-

*Thread Reply:* ok, np. I am not in a hurry yet. Do you have an ETA for the merge?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:39:50
-
-

*Thread Reply:* Hard to say, it’s currently in-review. Let me pull some strings, see if I can get eyes on it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-01 13:40:34
-
-

*Thread Reply:* I will check again next week, don't worry. I still need to get some things in my extractor working

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:40:36
-
-

*Thread Reply:* after it’s merged, we’ll have to do an OpenLineage release as well - perhaps next week?

- - - -
- 👍 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:40:41
-
-

*Thread Reply:* 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 12:25:48
-
-

Hi everyone, I just started using OpenLineage to connect with dbt for my company. I work as a data engineer. After setting up the connection and running a test with dbt-ol run, it gives me this error. I have looked online but couldn't find the answer anywhere. Can somebody please help me with it? The error tells me that the correct version is dbt schema.json version 2 instead of 3. I don't know where to change the schema.json version. Thank you everyone @channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:34:10
-
-

*Thread Reply:* Hm - what version of dbt are you using?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:47:50
-
-

*Thread Reply:* @Tien Nguyen The dbt schema version changes with different versions of dbt. If you have recently updated, you may have to make some changes: https://docs.getdbt.com/docs/guides/migration-guide/upgrading-to-v1.0

-
-
docs.getdbt.com
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 13:48:27
-
-

*Thread Reply:* also make sure you are on the latest version of openlineage-dbt - I believe we have made it a bit more tolerant of dbt schema changes.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 13:52:46
-
-

*Thread Reply:* @Ross Turk Thank you very much for your answer. I will update those and see if I can resolve the issues.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 14:20:00
-
-

*Thread Reply:* @Ross Turk Thank you very much for your help. The latest version of dbt didn't work, but version 0.20.0 works for this problem.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 14:22:42
-
-

*Thread Reply:* Hmm. Interesting, I remember when dbt 1.0 came out we fixed a very similar issue: https://github.com/OpenLineage/OpenLineage/pull/397

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 14:25:17
-
-

*Thread Reply:* if you run pip3 list | grep openlineage-dbt, what version does it show?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 14:26:26
-
-

*Thread Reply:* I wonder if you have somehow ended up with an older version of the integration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 14:33:43
-
-

*Thread Reply:* it is 0.1.0

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 14:34:23
-
-

*Thread Reply:* is 0.1.0 the older version of openlineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 14:43:14
-
-

*Thread Reply:* ❯ pip3 list | grep openlineage-dbt
openlineage-dbt 0.6.2

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 14:43:26
-
-

*Thread Reply:* the latest is 0.6.2 - that might be your issue

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 14:43:59
-
-

*Thread Reply:* How are you going about installing it?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 18:35:26
-
-

*Thread Reply:* @Ross Turk I followed the instructions from OpenLineage: pip3 install openlineage-dbt

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-01 18:36:00
-
-

*Thread Reply:* Hm! Interesting. I did the same thing to get 0.6.2.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 18:51:36
-
-

*Thread Reply:* @Ross Turk Yes. I have tried reinstalling and clearing the cache, but it still installs 0.1.0

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-01 18:53:07
-
-

*Thread Reply:* But thanks for the version number. I reinstalled 0.6.2 by specifying the version

- - - -
- 👍 Ross Turk -
- -
-
-
-
- - - - - -
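(A note for readers hitting the same thing: when pip keeps resolving an old release, pinning it explicitly, e.g. pip3 install "openlineage-dbt==0.6.2", or installing into a fresh virtual environment usually gets around a stale cache or an old package index; these are generic pip techniques, not project-specific instructions.)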
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-02 17:37:59
-
-

@Ross Turk @Maciej Obuchowski FYI the sql parser also seems not to return any inputs or outputs for queries that have subqueries.
Example:
INSERT OVERWRITE TABLE mytable
    PARTITION (ds = '2022-03-31')
    SELECT
        *
    FROM
        (SELECT * FROM table2) a
INSERT OVERWRITE TABLE mytable
    PARTITION (ds = '2022-03-31')
    SELECT
        *
    FROM
        (SELECT * FROM table2
        UNION
        SELECT * FROM table3
        UNION ALL
        SELECT * FROM table4) a

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-03 15:07:09
-
-

*Thread Reply:* they'll work with the new parser - added tests for those

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-03 15:07:39
-
-

*Thread Reply:* btw, thank you very much for notifying us about multiple bugs @Marco Diaz!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-03 15:20:55
-
-

*Thread Reply:* @Maciej Obuchowski thank you for making sure these cases are taken into account. I am getting more familiar with the OpenLineage code as I build my extractors. If I see anything else I will let you know. Any ETA on the new parser release date?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-03 15:55:28
-
-

*Thread Reply:* it should be a week or two, unless anything comes up

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-03 17:10:02
-
-

*Thread Reply:* I see. Keeping my fingers crossed this is the only thing delaying me right now.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-02 20:27:37
-
-

Also, what would happen if someone uses a CTE in the SQL? Does the parser take those cases into consideration?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-03 15:02:13
-
-

*Thread Reply:* the current one handles cases where you have one CTE (like this test) but not multiple - the next one will handle an arbitrary number of CTEs (like this test)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-04-04 10:54:47
-
-

Agenda items are requested for the next OpenLineage Technical Steering Committee meeting on Wednesday, April 13. Please reply here or ping me with your items!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-04 11:11:53
-
-

*Thread Reply:* I've mentioned it before, but I want to talk a bit about the new SQL parser

- - - -
- 🙌 Will Johnson, Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-04 13:25:17
-
-

*Thread Reply:* Will the parser be released after the 13th?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-08 11:47:05
-
-

*Thread Reply:* @Michael Robinson added an additional item to the Agenda - the client transports feature that we'll have in the next release

- - - -
- 🙌 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-04-08 12:56:44
-
-

*Thread Reply:* Thanks, Maciej

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sukanya Patra - (Sukanya_Patra@mckinsey.com) -
-
2022-04-05 02:39:59
-
-

Hi Everyone,

- -

I have come across OpenLineage at Data Council Austin, 2022 and am curious to try it out. I have reviewed the Getting Started section (https://openlineage.io/getting-started/) of the OpenLineage docs but couldn't find clear reference documentation for using the API.
• Are there any swagger API docs or equivalent dedicated to the OpenLineage API? There are some reference docs for the Marquez API: https://marquezproject.github.io/marquez/openapi.html#tag/Lineage
Secondly, are there any means to use OpenLineage independent of Marquez? Any pointers would be appreciated.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Patrick Mol - (patrick.mol@prolin.com) -
-
2022-04-05 10:28:08
-
-

*Thread Reply:* I had kind of the same question.
I found https://marquezproject.github.io/marquez/openapi.html#tag/Lineage
With some of the entries marked Deprecated, I am not sure how to proceed.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-05 11:55:35
-
-

*Thread Reply:* Hey folks, are you looking for the OpenAPI specification found here?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-05 15:33:23
-
-

*Thread Reply:* @Patrick Mol, Marquez's deprecated endpoints were the old methods for creating lineage (making jobs, datasets, and runs independently); they were deprecated because we moved over to using the OpenLineage spec for all lineage collection purposes.

- -

The GET methods for jobs/datasets/etc are still functional

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sarat Chandra - (saratchandra9494@gmail.com) -
-
2022-04-05 21:10:39
-
-

*Thread Reply:* Hey John,

- -

Thanks for sharing the OpenAPI docs. I was wondering if there are any means to set up an OpenLineage API that will receive events without a consumer like Marquez, or is it essential to always pair with a consumer to receive the events?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-05 21:47:13
-
-

*Thread Reply:* the OpenLineage integrations don't have any way to receive events, since they're designed to send events to other apps - what were you expecting OpenLineage to do?

- -

Marquez is our reference implementation of an OpenLineage consumer, but egeria also has a functional endpoint

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Patrick Mol - (patrick.mol@prolin.com) -
-
2022-04-06 09:53:31
-
-

*Thread Reply:* Hi @John Thomas,
Would creation of Sources and Datasets have an equivalent in the OpenLineage specification?
So far I only see the Inputs and Outputs in the Run Event spec.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-06 11:31:10
-
-

*Thread Reply:* Inputs and outputs in the OL spec are Datasets in the old MZ spec, so they're equivalent

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-05 14:24:50
-
-

Hey Guys,

- -

The BaseExtractor is working fine with operators that are derived from the Airflow BaseOperator. However, for operators derived from LivyOperator the BaseExtractor does not seem to work. Is there a fix for this? We use LivyOperator to run Spark jobs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-05 15:16:34
-
-

*Thread Reply:* Hi Marco - it looks like LivyOperator itself does derive from BaseOperator, have you seen any other errors around this problem?

- -

@Maciej Obuchowski might be more help here

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-05 15:21:03
-
-

*Thread Reply:* It is the operators that inherit from LivyOperator. It doesn't find parameters like sql, connection, etc.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-05 15:25:42
-
-

*Thread Reply:* My guess is that operators that inherit from other operators (not baseoperator) will have the same problem

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-05 15:32:13
-
-

*Thread Reply:* interesting! I'm not sure about that. I can look into it if I have time, but Maciej is definitely the person who would know the most.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-06 15:49:48
-
-

*Thread Reply:* @Marco Diaz I wonder - perhaps it would be better to instrument spark with OpenLineage. It doesn’t seem that Airflow will know much about what’s happening underneath here. Have you looked into openlineage-spark?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 15:51:57
-
-

*Thread Reply:* I have not tried that library yet. I need to see how to implement it because we have several custom Spark operators that use Livy

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 15:52:59
-
-

*Thread Reply:* Do you have any examples?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-06 15:54:01
-
-

*Thread Reply:* there is a good blog post from @Michael Collado: https://openlineage.io/blog/openlineage-spark/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-06 15:54:37
-
-

*Thread Reply:* and the doc page here has a good overview: -https://openlineage.io/integration/apache-spark/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 16:38:15
-
-

*Thread Reply:* is this all we need to pass?
spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
    --packages "io.openlineage:openlineage-spark:0.2.+" \
    --conf "spark.openlineage.host=http://<your_ol_endpoint>" \
    --conf "spark.openlineage.namespace=my_job_namespace" \
    --class com.mycompany.MySparkApp my_application.jar

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 16:38:49
-
-

*Thread Reply:* If so, yes our operators have a way to pass configurations to spark and we may be able to implement it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-04-06 16:41:27
-
-

*Thread Reply:* Looks right to me

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 16:42:03
-
-

*Thread Reply:* Will give it a try

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 16:42:50
-
-

*Thread Reply:* Do we have to install the library on the spark side or the airflow side?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 16:42:58
-
-

*Thread Reply:* I assume it is the spark side

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2022-04-06 16:44:25
-
-

*Thread Reply:* The --packages argument tells spark where to get the jar (you'll want to upgrade to 0.6.1)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-06 16:44:54
-
-

*Thread Reply:* sounds good

- - - -
-
-
-
- - - - - -
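Since the thread mentions operators that pass configuration to Spark programmatically, the same four settings can be set on the session instead of the spark-submit line. A sketch with placeholder host/namespace values (and, per Michael's note above, a current artifact version rather than 0.2.+):
```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("my_app")
    # Same settings as the spark-submit flags discussed above
    .config("spark.extraListeners",
            "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.6.1")
    .config("spark.openlineage.host", "http://localhost:5000")  # placeholder
    .config("spark.openlineage.namespace", "my_job_namespace")
    .getOrCreate()
)
```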
-
- - - - -
- -
Varun Singh - (varuntestaz@outlook.com) -
-
2022-04-06 00:04:14
-
-

Hi, I saw there was some work done for integrating OpenLineage with Azure Purview - - - -

-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-06 04:54:27
-
-

*Thread Reply:* @Will Johnson

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-07 12:43:27
-
-

*Thread Reply:* Hey @Varun Singh! We are building a github repository that deploys a few resources that will support a limited number of Azure data sources being pushed into Azure Purview. You can expect a public release near the end of the month! Feel free to direct message me if you'd like more details!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-04-06 15:05:39
-
-

The next OpenLineage Technical Steering Committee meeting is Wednesday, April 13! Meetings are on the second Wednesday of each month from 9:00 to 10:00am PT.
Join us on Zoom: https://astronomer.zoom.us/j/87156607114?pwd=a3B0K210dnRaQmdkaFdGMytBREZEQT09
All are welcome.
Agenda:
• OpenLineage 0.6.2 release overview
• Airflow integration update
• Dagster integration retrospective
• Open discussion
Notes: https://tinyurl.com/openlineagetsc

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
slackbot - -
-
2022-04-06 21:40:16
-
-

This message was deleted.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-07 01:00:43
-
-

*Thread Reply:* Are both airflow2 and Marquez installed locally on your computer?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jorge Reyes (Zenta Group) - (jorge.reyes@zentagroup.com) -
-
2022-04-07 09:04:19
-
-

*Thread Reply:* yes Marco

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-07 15:00:18
-
-

*Thread Reply:* can you open marquez on <http://localhost:3000>

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-07 15:00:40
-
-

*Thread Reply:* and get a response from <http://localhost:5000/api/v1/namespaces>

- - - -
-
-
-
- - - - - -
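The same two checks, for anyone scripting them instead of using a browser (ports are the Marquez defaults from the getting-started guide: UI on 3000, API on 5000):
```
import requests

# Should return the list of namespaces if the Marquez API is reachable
print(requests.get("http://localhost:5000/api/v1/namespaces").json())
```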
-
- - - - -
- -
Jorge Reyes (Zenta Group) - (jorge.reyes@zentagroup.com) -
-
2022-04-07 15:26:41
-
-

*Thread Reply:* yes, I used this guide https://openlineage.io/getting-started and executed a POST to marquez correctly

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Marco Diaz - (mdiaz@roblox.com) -
-
2022-04-07 22:17:34
-
-

*Thread Reply:* In theory you should receive events in jobs under airflow namespace

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tien Nguyen - (tiennguyenhotel97@gmail.com) -
-
2022-04-07 14:18:05
-
-

Hi Everyone, can someone please help me debug this error? Thank you very much, all

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-07 14:59:06
-
-

*Thread Reply:* It looks like you need to add a payment method to your DBT account

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-04-11 12:46:41
-
-

Hello. Does Airflow's TaskFlow API work with OpenLineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-11 12:50:48
-
-

*Thread Reply:* It does, but admittedly not very well. It can't recognize what you're doing inside your tasks. The good news is that we're working on it and long term everything should work well.

- - - -
- 👍 Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-04-11 12:58:28
-
-

*Thread Reply:* Thanks for the quick reply Maciej.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 09:56:44
-
-

Hi all, I watched a few of your demos with Airflow (Astronomer) recently and really liked them.
Thanks for doing those

- -

Questions:

- -
  1. Are there plans to have a Hive listener similar to the OpenLineage Spark integration?
  2. If not, will the SQL parser work with HiveQL?
  3. Maybe one for Presto too?
  4. Will the run version and dataset version come out of the box or do we need to define some facets?
  5. I read the blog on facets; is there a tutorial on how to create a sample facet?
Background: we have Hive, Spark jobs and BigQuery tasks running from Airflow in GCP Dataproc
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 13:56:53
-
-

*Thread Reply:* Hi Sandeep,

- -

1&3: We don't currently have Hive or Presto on the roadmap! The best way to start the conversation around them would be to create a proposal in the OpenLineage repo, outlining your thoughts on implementation and benefits.

- -

2: I'm not familiar enough with HiveQL, but you can read about the new SQL parser we're implementing here

- -
4: you can see the Standard Facets here - Dataset Version is included out of the box, but Run Version would have to be defined.

5: the best place to start looking into making facets is the Spec doc here. We don't have a dedicated tutorial, but if you have more specific questions please feel free to reach out again on slack
- - - -
- 👍 sandeep -
- -
-
-
-
- - - - - -
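To sketch what John's facet pointer can look like in practice: a custom facet is just an extra JSON object attached under facets, carrying _producer and _schemaURL per the spec. Everything below besides those two field names is invented for illustration:
```
# Hypothetical custom run facet; only _producer/_schemaURL come from the spec.
run = {
    "runId": "d46e465b-d358-4d32-83d4-df660ff614dd",  # example UUID
    "facets": {
        "myTeam_clusterInfo": {  # invented facet name
            "_producer": "https://example.com/my-emitter/v1",
            "_schemaURL": "https://example.com/schemas/clusterInfo.json",
            "clusterName": "dataproc-prod-1",  # invented field
        }
    },
}
```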
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 15:39:23
-
-

*Thread Reply:* Thank you John. The standard facets link points to the GitHub issues currently

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 15:40:33
- -
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 15:41:01
-
-

*Thread Reply:* Will check it out thank you

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-04-12 10:37:58
-
-

Reminder: this month’s OpenLineage TSC meeting is tomorrow, 4/13, at 9 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1649271939878419

-
- - -
- - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 15:43:29
-
-

I set up the openlineage-spark integration for Spark (Dataproc) tasks from Airflow. It's able to post data to the Marquez endpoint, and I see the job information in the Marquez UI.

I don't see any dataset information in it, I see just the jobs? Is there some setup I need to do or something else I need to configure?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 16:08:30
-
-

*Thread Reply:* is there anything in your marquez-api logs that might indicate issues?

- -

What guide did you follow to setup the spark integration?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 16:10:07
-
-

*Thread Reply:* Followed this guide https://openlineage.io/integration/apache-spark/ and used the spark-defaults.conf approach

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 16:11:04
-
-

*Thread Reply:* The logs from the Dataproc side show no errors; let me check from the Marquez API side.
To confirm, we should be able to see the datasets in the Marquez UI with the Spark integration, right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 16:11:50
-
-

*Thread Reply:* I'm not super familiar with the spark integration, since I work more with airflow - I'd start with looking through the readme for the spark integration here

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 16:14:44
-
-

*Thread Reply:* Hmm, the readme says it aims to generate the input and output datasets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-12 16:40:38
-
-

*Thread Reply:* Are you looking at the same namespace?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 16:40:51
-
-

*Thread Reply:* Yes, the same one where I can see the job

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 16:54:49
-
-

*Thread Reply:* Tailing the API logs and rerunning the spark job now to hopefully catch errors if any, will ping back here

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 17:01:10
-
-

*Thread Reply:* Don’t see any failures in the logs, any suggestions on how to debug this ?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 17:08:24
-
-

*Thread Reply:* I'd next set up a basic spark notebook and see if you can't get it to send dataset information on something simple in order to check if it's a setup issue or a problem with your spark job specifically

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 17:14:43
-
-

*Thread Reply:* ok, that sounds good, will try that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 17:16:06
-
-

*Thread Reply:* before that, I see that the spark-lineage integration posts lineage to the API
https://marquezproject.github.io/marquez/openapi.html#tag/Lineage/paths/~1lineage/post
We don't seem to add a Dataset in this; does marquez internally create this "dataset" based on Output and fields?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 17:16:34
-
-

*Thread Reply:* yeah, you should be seeing "input" and "output" in the runEvents - that's where datasets come from

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 17:17:00
-
-

*Thread Reply:* I'm not sure if it's a problem with your specific spark job or with the integration itself, however

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 17:19:16
-
-

*Thread Reply:* By runEvents, do you mean a job object or a lineage object?
The integration seems to be only POSTing lineage objects

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-12 17:20:34
-
-

*Thread Reply:* yep, a runEvent is the body that gets POSTed to the /lineage endpoint:

- -

https://openlineage.io/docs/openapi/

- - - -
- 👍 sandeep -
- -
-
-
-
- - - - - -
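For reference, a hedged minimal sketch of such a runEvent being POSTed (values are placeholders; the payload shape and the /api/v1/lineage endpoint follow the getting-started guide and OpenAPI docs linked in this thread):
```
import datetime
import uuid

import requests

event = {
    "eventType": "START",
    "eventTime": datetime.datetime.utcnow().isoformat() + "Z",
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "my-namespace", "name": "my-job"},
    "inputs": [{"namespace": "my-namespace", "name": "my-input-dataset"}],
    "outputs": [],
    "producer": "https://example.com/my-emitter/v1",  # placeholder producer
}

# Marquez exposes the OpenLineage collection endpoint at /api/v1/lineage
requests.post("http://localhost:5000/api/v1/lineage", json=event)
```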
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-12 17:41:01
-
-

*Thread Reply:* > Yes, the same one where I can see the job
I think you should look at another namespace, whose name depends on what systems you're actually using

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 17:48:24
-
-

*Thread Reply:* Shouldn't the dataset be created in the same namespace we define in the spark properties?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-15 10:19:06
-
-

*Thread Reply:* I found a few datasets in the table location. I ran it in a similar setup (Hive metastore, GCS, Spark SQL and Scala Spark jobs) to the one mentioned in this post https://openlineage.slack.com/archives/C01CK9T7HKR/p1649967405659519

-
- - -
- - - Will Johnson - (https://openlineage.slack.com/team/U02H4FF5M36) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-12 15:49:46
-
-

Is this the correct place for this question or should I reach out to the Marquez slack?
I followed this post https://openlineage.io/integration/apache-spark/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-14 16:16:45
-
-

Before I create an issue around it, maybe I'm just not seeing it in Databricks. In the Spark Integration, does OpenLineage report Hive Metastore tables or it ONLY reports the file path?

- -

For example, if I have a Hive table called default.myTable stored at LOCATION /usr/hive/warehouse/default/mytable.

- -

For a query that reads a CSV file and inserts into default.myTable, would I see an output of default.myTable or /usr/hive/warehouse/default/mytable?

- -

We want to include a link between the physical path and the hive metastore table but it seems that OpenLineage (at least on Databricks) only reports the physical path with the table name showing up in the catalog but not as a facet.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sandeep - (sandeepgame07@gmail.com) -
-
2022-04-15 10:17:55
-
-

*Thread Reply:* This was my experience as well; I was under the impression we would see the table as a dataset.
Looking forward to understanding the expected behavior

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-15 10:39:34
-
-

*Thread Reply:* relevant: https://github.com/OpenLineage/OpenLineage/issues/435

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-15 12:36:08
-
-

*Thread Reply:* Ah! Thank you both for confirming this! And it's great to see the proposal, Maciej!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-06-10 12:37:41
-
-

*Thread Reply:* Is there a timeline around when we can expect this fix ?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-06-10 12:46:47
-
-

*Thread Reply:* Not a simple fix, but I guess we'll start working on this relatively soon.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sharanya Santhanam - (santhanamsharanya@gmail.com) -
-
2022-06-10 13:10:31
-
-

*Thread Reply:* I see, thanks for the update ! We are very much interested in this feature.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-04-15 15:42:22
-
-

@channel A significant number of us have a conflict with the current TSC meeting day/time, so, unfortunately, we need to reschedule the meeting. When you have a moment, please share your availability here: https://doodle.com/meeting/participate/id/ejRnMlPe. Thanks in advance for your input!

-
-
doodle.com
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Arturo - (ggrmos@gmail.com) -
-
2022-04-19 13:35:23
-
-

Hello everyone, I'm learning OpenLineage. I finally achieved the connection between Airflow 2+ and OpenLineage+Marquez. The issue is that I don't see anything on Marquez. Do I need to modify the current airflow operators?

- -
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-19 13:40:54
-
-

*Thread Reply:* You probably need to change dataset from default

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Arturo - (ggrmos@gmail.com) -
-
2022-04-19 13:47:04
-
-

*Thread Reply:* I clicked on everything 😕 I manually created a namespace (joining the pod and sending curl to the local marquez endpoint) to check if there was a network issue, and that was ok. I created a namespace called data-dev. Airflow is mounted on k8s using the helm chart.
```
config:
  AIRFLOW__WEBSERVER__BASE_URL: "http://airflow.dev.test.io"
  PYTHONPATH: "/opt/airflow/dags/repo/config"
  AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.basic_auth"
  AIRFLOW__CORE__PLUGINS_FOLDER: "/opt/airflow/dags/repo/plugins"
  AIRFLOW__LINEAGE__BACKEND: "openlineage.lineage_backend.OpenLineageBackend"

  ...

extraEnv:
  - name: OPENLINEAGE_URL
    value: http://marquez-dev.data-dev.svc.cluster.local
  - name: OPENLINEAGE_NAMESPACE
    value: data-dev
```

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-19 15:16:47
-
-

*Thread Reply:* I think the answer is somewhere in the airflow logs 🙂
For some reason, OpenLineage events aren't being sent to Marquez.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Arturo - (ggrmos@gmail.com) -
-
2022-04-20 11:08:09
-
-

*Thread Reply:* Thanks, in the end it was my error... I created a dummy DAG to see if maybe it was an issue with the DAG, and now I can see something in Marquez

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-04-20 08:15:32
-
-

One really novice question - there doesn't seem to be a way of deleting lineage elements (any of them)? While I can imagine that in production system we want to keep history, it's not practical while testing/developing. I'm using throw-away namespaces to step around the issue. Is there a better way, or alternatively - did I miss an API somewhere?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-20 08:20:35
-
-

*Thread Reply:* That's more of a Marquez question 🙂
We have a long-standing issue to add that API: https://github.com/MarquezProject/marquez/issues/1736

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-04-20 09:32:19
-
-

*Thread Reply:* I see it already got skipped for 2 releases, and my only conclusion is that people using Marquez don't make mistakes - ergo, API not needed 🙂 Let's see if I can stick around the project long enough to offer a bit of help; for now I just need to showcase it and get interest in my org.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Dan Mahoney - (dan.mahoney@sphericalanalytics.io) -
-
2022-04-20 10:08:33
-
-

Good day all. I'm trying out the openlineage-dagster plugin
• I've got dagit, dagster-daemon and marquez running locally
• The openlineage_sensor is recognized in dagit and the daemon.
But, when I run a job, I see the following message in the daemon's shell:
Sensor openlineage_sensor skipped: Last cursor: {"last_storage_id": 9, "running_pipelines": {"97e2efdf-9499-4ffd-8528-d7fea5b9362c": {"running_steps": {}, "repository_name": "hello_cereal_repository"}}}
I've attached my repos.py and serial_job.py.
Any thoughts?

- -
- - - - - - - -
-
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
David - (drobin1437@gmail.com) -
-
2022-04-20 10:40:03
-
-

Hi All,
I am walking through the curl examples on this page and have a question on the first curl example:
https://openlineage.io/getting-started/
The curl command completes, and I can see the input file and job in the namespace, but the lineage graph does not show the input file connected as an input to the job. This only seems to happen after the job is marked complete.

- -

Is there a way to have a running job show connections to its input files in the lineage?
Thanks!

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
raghanag - (raghanag@gmail.com) -
-
2022-04-20 18:06:29
-
-

Hi Team, we are using Spark as a service, and we are planning to integrate the OpenLineage Spark listener. Looking at the params below that we need to pass, we don't know the name of the Spark cluster. Is the spark.openlineage.namespace conf param mandatory?
spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
    --packages "io.openlineage:openlineage-spark:0.2.+" \
    --conf "spark.openlineage.host=http://<your_ol_endpoint>" \
    --conf "spark.openlineage.namespace=my_job_namespace" \
    --class com.mycompany.MySparkApp my_application.jar

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-20 18:11:19
-
-

*Thread Reply:* Namespace is defined by you; it does not have to be the name of the spark cluster.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-20 18:11:42
-
-

*Thread Reply:* And I definitely recommend using a newer version than 0.2.+ 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
raghanag - (raghanag@gmail.com) -
-
2022-04-20 18:13:32
-
-

*Thread Reply:* oh, I see that someone mentioned that it has to be replaced with the name of the spark cluster

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
raghanag - (raghanag@gmail.com) -
-
2022-04-20 18:13:57
-
-

*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1634089656188400?thread_ts=1634085740.187700&cid=C01CK9T7HKR

-
- - -
- - - Julien Le Dem - (https://openlineage.slack.com/team/U01DCLP0GU9) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
raghanag - (raghanag@gmail.com) -
-
2022-04-20 18:19:19
-
-

*Thread Reply:* @Maciej Obuchowski may I know if I can add the --packages "io.openlineage:openlineage-spark:0.2.+" as part of the spark jar file, meaning as part of the pom.xml?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-21 03:54:25
-
-

*Thread Reply:* I think it needs to run on the driver

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-04-21 05:53:34
-
-

Hello,
when looking through the Marquez API it seems that most individual-element creation APIs are marked as deprecated and are going to be removed by 0.25, with the intention of switching to OpenLineage. That makes POST to /api/v1/lineage the only creation point for elements, but the OpenLineage API is very limited in the attributes that can be passed.

- -

Is that intended to stay that way? One practical question/example: how do we create a job of type STREAMING, when the OL API only allows passing name, namespace and facets? Do we now move all properties into facets?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-21 07:16:44
-
-

*Thread Reply:* > OpenLineage API is very limited in attributes that can be passed.
Can you specify where you think it's limited? The way to solve those problems would be to evolve OpenLineage.

- -

> One practical question/example: how do we create a job of type STREAMING,
So, here I think the question is more about how streaming jobs differ from batch jobs. One obvious difference is that the output of the job is continuous (in practice, probably "microbatched" or committed on checkpoint). However, the deprecated Marquez API didn't give us tools to properly indicate that. On the contrary, OpenLineage with different event types allows us to properly do that.
> Do we now move all properties into facets?
Basically, yes. Marquez should handle specific facets. For example, https://github.com/MarquezProject/marquez/pull/1847

- - - -
-
-
-
- - - - - -
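To make "move properties into facets" concrete, a purely illustrative example: until a standard facet covers it, a job-level custom facet could carry the streaming flag (the facet name and fields below are invented; only _producer/_schemaURL come from the spec):
```
# Invented custom job facet; illustrative only.
job = {
    "namespace": "my-namespace",
    "name": "my-streaming-job",
    "facets": {
        "myOrg_jobType": {  # hypothetical facet name
            "_producer": "https://example.com/my-emitter/v1",
            "_schemaURL": "https://example.com/schemas/jobType.json",
            "jobType": "STREAMING",  # hypothetical field
        }
    },
}
```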
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-04-21 07:23:11
-
-

*Thread Reply:* Hey Maciej

- -

first off - thanks for being active on the channel!

- -

> So, here I think the question is more how streaming jobs differ from batch jobs
Not really. I just gave an example of how you would express a specific job type creation, which can be done with https://marquezproject.github.io/marquez/openapi.html#tag/Jobs/paths/~1namespaces~1{namespace}~1jobs~1{job}/put (/api/v1/namespaces/.../jobs/...), by passing the type field, which is required. In the call to /api/v1/lineage the job field offers just a (namespace, name) pair, but no other attributes.

- -

> However, deprecated Marquez API didn't give us tools to properly indicate that. On the contrary, OpenLineage with different event types allows us to properly do that. -I have the feeling I'm still missing some key concepts on how OpenLineage is designed. I think I went over the API and documentation, but trying to use just OpenLineage failed to reproduce mildly complex chain-of-job scenarios, and when I took a look how Marquez seed demo is doing it - it was heavily based on deprecated API. So, I'm kinda lost on how to use OpenLineage.

- -

I'm looking forward to some open public meeting, as I don't think asking these long questions on chat really works. 😞
Any pointers are welcome!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-21 07:53:59
-
-

*Thread Reply:* > I just gave an example of how would you express a specific job type creation
Yes, but you're trying to achieve something by passing this parameter or creating a job in a certain way. We're trying to cover everything in the OpenLineage API. Even if we don't have everything, the spec from the beginning has been focused on allowing emission of custom data through the custom facet mechanism.

- -

> I have the feeling I'm still missing some key concepts on how OpenLineage is designed.
This talk by @Julien Le Dem is a great place to start: https://www.youtube.com/watch?v=HEJFCQLwdtk

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-21 11:29:20
-
-

*Thread Reply:* > Any pointers are welcome!
BTW: OpenLineage is an open standard. Everyone is welcome to contribute and discuss. Every bit of feedback ultimately helps us build better systems.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-04-22 03:32:48
-
-

*Thread Reply:* I agree, but for now I'm more likely to be in the I didn't get it category, and not in the brilliant new idea category 🙂

- -

My temporary goal is to go over the documentation and write up the gaps that confused me (and the solutions), and maybe publish that as an article for a wider audience. So far I have realized that:
• I didn't get the naming convention - it became clearer that it's important with the Naming examples, but more info is needed
• I misinterpreted the namespaces. I was placing datasources and jobs in the same namespace, which caused a lot of issues until I started using different ones. Not sure why... So now I'm interpreting namespaces=source as suggested by the naming convention
• The JSON schema actually clarified things a lot, but that's not the most reader-friendly of resources, so surely there should be a better one
• I was questioning whether to move away from Marquez completely and go with DataHub, but for my scenario Marquez (with limitations outstanding) is still most suitable
• Marquez for some reason does not tolerate datetimes that are missing the 'T' delimiter in the ISO format, which caused a lot of trial-and-error because the message is just "JSON parsing failed"
• Marquez doesn't give you (at least by default) meaningful OpenLineage parsing errors, so running examples against it is a very slow learning process

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Karatuğ Ozan BİRCAN - (karatugo@gmail.com) -
-
2022-04-21 10:20:55
-
-

Hi everyone,

- -

I'm running the Spark listener on Databricks. It works fine for the event-emit part for a basic Databricks SQL CREATE TABLE query. Nevertheless, it throws a NullPointerException after sending the lineage successfully.

- -

I tried to debug a bit. Looks like it's thrown at the line: -QueryExecution queryExecution = SQLExecution.getQueryExecution(executionId); -So, does this mean that the listener can't get the query exec from Spark SQL execution?

- -

Please see the logs in the thread. Thanks.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Karatuğ Ozan BİRCAN - (karatugo@gmail.com) -
-
2022-04-21 10:21:33
-
-

*Thread Reply:* Driver logs from Databricks:

- -

```22/04/21 14:05:07 INFO EventEmitter: Lineage completed successfully: ResponseMessage(responseCode=200, body={}, error=null) {"eventType":"COMPLETE",[...], "schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunEvent"}

- -

22/04/21 14:05:07 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
  at io.openlineage.spark.agent.lifecycle.ContextFactory.createSparkSQLExecutionContext(ContextFactory.java:43)
  at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$getSparkSQLExecutionContext$8(OpenLineageSparkListener.java:221)
  at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
  at java.util.Collections$SynchronizedMap.computeIfAbsent(Collections.java:2674)
  at io.openlineage.spark.agent.OpenLineageSparkListener.getSparkSQLExecutionContext(OpenLineageSparkListener.java:220)
  at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:143)
  at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:135)
  at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:102)
  at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119)
  at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103)
  at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
  at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
  at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
  at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1588)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-21 11:32:37
-
-

*Thread Reply:* @Karatuğ Ozan BİRCAN are you running on Spark 3.2? If yes, then new release should have fixed your problem: https://github.com/OpenLineage/OpenLineage/issues/609

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Karatuğ Ozan BİRCAN - (karatugo@gmail.com) -
-
2022-04-21 11:33:15
-
-

*Thread Reply:* Spark 3.1.2 with Scala 2.12

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Karatuğ Ozan BİRCAN - (karatugo@gmail.com) -
-
2022-04-21 11:33:50
-
-

*Thread Reply:* In fact, I couldn't make it work in Spark 3.2. But I'll test it again. Thanks for the info.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Vinith Krishnan US - (vinithk@nvidia.com) -
-
2022-05-20 16:15:47
-
-

*Thread Reply:* Has this been resolved?
I am facing the same issue with spark 3.2.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ben - (ben@meridian.sh) -
-
2022-04-21 11:51:33
-
-

Does anyone have thoughts on the difference between the sourceCode and sql job facets - and whether we’d expect to ever see both on a particular job?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
John Thomas - (john@datakin.com) -
-
2022-04-21 15:34:24
-
-

*Thread Reply:* I don't think that the facets are particularly strongly defined, but I would expect that it could be possible to see both on a pythonOperator that's executing SQL queries, depending on how the extractor was written

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ben - (ben@meridian.sh) -
-
2022-04-21 15:34:45
-
-

*Thread Reply:* ah sure, that makes sense

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Xiaoyong Zhu - (xiaoyzhu@outlook.com) -
-
2022-04-21 15:14:03
-
-

Just got to know OpenLineage, and it's really a great project! One question on the granularity of Spark + OpenLineage: is it possible to track column-level lineage (rather than the table lineage that's currently there)? Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-21 16:17:59
-
-

*Thread Reply:* We're actively working on it - expect it in next OpenLineage release. https://github.com/OpenLineage/OpenLineage/pull/645

-
- - - - - - - -
-
Labels: enhancement, integration/spark
Milestone: 0.8.0 (https://github.com/OpenLineage/OpenLineage/milestone/4)
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Xiaoyong Zhu - (xiaoyzhu@outlook.com) -
-
2022-04-21 16:24:16
-
-

*Thread Reply:* nice, thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Xiaoyong Zhu - (xiaoyzhu@outlook.com) -
-
2022-04-21 16:25:19
-
-

*Thread Reply:* Assuming we don't need to do anything except using the next update? Or do you expect that we need to change quite a lot of configs?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-04-21 17:44:46
-
-

*Thread Reply:* No, it should be automatic.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-24 14:37:33
-
-

Hey, Team - We are starting to get requests for other, non Microsoft data sources (e.g. Teradata) for the Spark Integration. We (I) don't have a lot of bandwidth to fill every request but I DO want to help these people new to OpenLineage get started.

- -

Has anyone on the team written up a blog post about extending open lineage or is this an area that we could collaborate on for the OpenLineage blog? Alternatively, is it a bad idea to write this down since the internals have changed a few times over the past six months?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mirko Raca - (racamirko@gmail.com) -
-
2022-04-25 03:52:20
-
-

*Thread Reply:* Hey Will,

- -

while I would not consider myself in the team, I'm dabbling in OL, hitting walls and learning as I go. If I don't have enough experience to contribute, I'd be happy to at least proof-read and point out things which are not clear from a novice perspective. Let me know!

- - - -
- 👍 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-25 13:49:48
-
-

*Thread Reply:* I'll hold you to that @Mirko Raca 😉

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-25 17:18:02
-
-

*Thread Reply:* I will support! I’ve done a few recent presentations on the internals of OpenLineage that might also be useful - maybe some diagrams can be reused.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-25 17:56:44
-
-

*Thread Reply:* Any chance you have links to those old presentations? Would be great to build off of an existing one and then update for some of the new naming conventions.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-25 18:00:26
-
-

*Thread Reply:* the most recent one was an astronomer webinar

- -

happy to share the slides with you if you want 👍 here’s a PDF:

- -
- - - - - - - -
- - -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-25 18:00:44
-
-

*Thread Reply:* the other ones have not been public, unfortunately 😕

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-25 18:02:24
-
-

*Thread Reply:* architecture, object model, run lifecycle, naming conventions == the basics IMO

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-26 09:14:42
-
-

*Thread Reply:* Thank you so much, Ross! This is a great base to work from.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-04-26 14:49:04
-
-

Your periodic reminder that GitHub stars are one of those trivial things that make a significant difference for an OSS project like ours. Have you starred us yet?

- -
- - - - - - - -
- - -
- 👍 Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
raghanag - (raghanag@gmail.com) -
-
2022-04-26 15:02:10
-
-

Hi All, I have a simple Spark job converting CSV to Parquet, and I am using https://openlineage.io/integration/apache-spark/ to generate lineage events and post them to Marquez, but I see that both events (START & COMPLETE) have the same payload except for eventType. I thought we should see the outputs array in the COMPLETE event, right?

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-27 00:36:05
-
-

*Thread Reply:* For a spark job like that, you'd have at least four events:

- -
  1. START event - This represents the SparkSQLExecutionStart
  2. START event #2 - This represents a JobStart event
  3. COMPLETE event - This represents a JobEnd event
  4. COMPLETE event #2 - This represents a SparkSQLExecutionEnd event
For CSV to Parquet, you should be seeing inputs and outputs that match across each event. OpenLineage scans the logical plan and reports back the inputs / outputs / metadata across the different facets for each event BECAUSE each event might give you some different information.
- -

For example, the JobStart event might give you access to properties that weren't there before. The JobEnd event might give you information about how many rows were written.

- -

Marquez / OpenLineage expects that you collect all of the resulting events and then aggregate the results.

- - - -
-
-
-
- - - - - -
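A rough sketch of the aggregation Will describes, i.e. grouping emitted events by runId and merging whatever each one adds (the field names follow the OpenLineage run event shape; the merge policy here is simplified):
```
from collections import defaultdict

def aggregate(events):
    # Group OpenLineage run events by runId, merging inputs/outputs;
    # datasets are keyed by (namespace, name) so later events can add
    # facets (e.g. row counts) to a dataset seen earlier.
    runs = defaultdict(lambda: {"inputs": {}, "outputs": {}, "state": None})
    for ev in events:
        run = runs[ev["run"]["runId"]]
        run["state"] = ev["eventType"]  # last event wins (COMPLETE/FAIL)
        for key in ("inputs", "outputs"):
            for ds in ev.get(key, []):
                run[key][(ds["namespace"], ds["name"])] = ds
    return runs
```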
-
- - - - -
- -
raghanag - (raghanag@gmail.com) -
-
2022-04-27 21:51:07
-
-

*Thread Reply:* Hi @Will Johnson, good evening. We are seeing an issue while using the Spark integration: when we give the spark.openlineage.host property a value like <http://lineage.com/common/marquez> where my Marquez API is running, the line below modifies the host to become <http://lineage.com/api/v1/lineage> instead of <http://lineage.com/common/marquez/api/v1/lineage>, which is causing the problem
https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/EventEmitter.java#L49
I see that it was added 5 months ago and released as part of 0.4.0. Is there any way we can fix the line to be like below?
this.lineageURI =
    new URI(
        hostURI.getScheme(),
        hostURI.getAuthority(),
        hostURI.getPath() + uriPath,
        queryParams,
        null);

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-04-28 14:31:42
-
-

*Thread Reply:* Can you open up a Github issue for this? I had this same issue and so our implementation always has to feature the /api/v1/lineage. The host config is literally the host. You're specifying a host and path. I'd be happy to see greater flexibility with the api endpoint but the /v1/ is important to know which version of OpenLineage's specification you're communicating with.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Arturo - (ggrmos@gmail.com) -
-
2022-04-27 14:12:38
-
-

Hi all ... does anyone have an example of a custom extractor with a different source and destination? I'm trying to build an extractor for a custom operator like mysql_to_s3

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-27 15:10:24
-
-

*Thread Reply:* @Michael Collado made one for a recent webinar:

- -

https://gist.github.com/collado-mike/d1854958b7b1672f5a494933f80b8b58

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-04-27 15:11:38
-
-

*Thread Reply:* it's not exactly for an operator that has source-destination, but it shows how to format lineage events for a few different kinds of datasets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Arturo - (ggrmos@gmail.com) -
-
2022-04-27 15:51:32
-
-

*Thread Reply:* Thanks! I'm going to take a look

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-27 23:04:18

A release has been requested by @Howard Yoo and @Ross Turk pending the merging of PR 644. Are there any +1s?

👍 Julien Le Dem, Maciej Obuchowski, Ross Turk, Conor Beverland

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-28 17:44:00

*Thread Reply:* Thanks for your input. The release is authorized. Look for it tomorrow!

raghanag - (raghanag@gmail.com)
2022-04-28 14:29:13

Hi All, We are seeing the below exception when we integrate the openlineage-spark into our spark job, can anyone share pointers?
```
Exception uncaught: java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.SerializationConfig.hasExplicitTimeZone()Z
    at openlineage.jackson.datatype.jsr310.ser.InstantSerializerBase.formatValue(InstantSerializerBase.java:144)
    at openlineage.jackson.datatype.jsr310.ser.InstantSerializerBase.serialize(InstantSerializerBase.java:103)
    at openlineage.jackson.datatype.jsr310.ser.ZonedDateTimeSerializer.serialize(ZonedDateTimeSerializer.java:79)
    at openlineage.jackson.datatype.jsr310.ser.ZonedDateTimeSerializer.serialize(ZonedDateTimeSerializer.java:13)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:719)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
    at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3906)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3220)
    at io.openlineage.spark.agent.client.OpenLineageClient.executeAsync(OpenLineageClient.java:123)
    at io.openlineage.spark.agent.client.OpenLineageClient.executeSync(OpenLineageClient.java:85)
    at io.openlineage.spark.agent.client.OpenLineageClient.post(OpenLineageClient.java:80)
    at io.openlineage.spark.agent.client.OpenLineageClient.post(OpenLineageClient.java:75)
    at io.openlineage.spark.agent.client.OpenLineageClient.post(OpenLineageClient.java:70)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:67)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:69)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:90)
    at java.util.Optional.ifPresent(Optional.java:159)
    at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:90)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:81)
    at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:80)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
```

John Thomas - (john@datakin.com)
2022-04-28 14:41:10

*Thread Reply:* What's the spark job that's running? This looks similar to an error that can happen when jobs have a very short lifecycle

raghanag - (raghanag@gmail.com)
2022-04-28 14:47:27

*Thread Reply:* nothing in the spark job, it's just a simple csv to parquet conversion file

John Thomas - (john@datakin.com)
2022-04-28 14:48:50

*Thread Reply:* ah yeah that's probably it - when the job finishes before the Openlineage integration can poll it for information, this error is thrown. Since the job is very quick it creates a race condition

:gratitude_thank_you: raghanag

raghanag - (raghanag@gmail.com)
2022-05-03 17:16:39

*Thread Reply:* @John Thomas may I know how to solve this kind of issue?

John Thomas - (john@datakin.com)
2022-05-03 17:20:11

*Thread Reply:* This is probably an issue with the integration - for now you can either open an issue, or see if you're still getting a subset of events and take it as is. I'm not sure what you could do on your end aside from adding a sleep call or similar
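(Editor's illustration: the "sleep call" John mentions would just be a crude delay at the end of the job so the async listener can emit before the context shuts down - a workaround sketch, not a recommended fix; `spark` is assumed to be the job's SparkSession.)
```python
import time

# ... csv -> parquet conversion runs here ...

time.sleep(5)   # give the async OpenLineage listener a few seconds to emit
spark.stop()
```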

raghanag - (raghanag@gmail.com)
2022-05-03 17:21:17

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/src/main/common/java/io/openlineage/spark/agent/OpenLineageSparkListener.java#L151 - you meant if we add a sleep in this method, this will solve it?

John Thomas - (john@datakin.com)
2022-05-03 18:44:43

*Thread Reply:* oh no I meant making sure your jobs don't close too quickly

raghanag - (raghanag@gmail.com)
2022-05-06 00:14:15

*Thread Reply:* Hi @John Thomas, we figured out the error - it was indeed caused by conflicting versions; with shadowJar and shading, we are not seeing it anymore.

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-29 18:40:41

@channel The latest release (0.8.1) of OpenLineage is now available, featuring a new TaskInstance listener API for Airflow 2.3+, an HTTP client in the openlineage-java library for emitting run events, support for HiveTableRelation as an input source in the Spark integration, a new SQL parser used by multiple integrations, and bug fixes. For more info, visit https://github.com/OpenLineage/OpenLineage/releases/tag/0.8.1

🚀 Willy Lulciuc, John Thomas, Minkyu Park, Ross Turk, Marco Diaz, Conor Beverland, Kevin Mellott, Howard Yoo, Peter Hicks, Maciej Obuchowski, Mario Measic
🙌 Francis McGregor-Macdonald, Ross Turk, Marco Diaz, Peter Hicks

Willy Lulciuc - (willy@datakin.com)
2022-04-29 18:41:37

*Thread Reply:* Amazing work on the new sql parser @Maciej Obuchowski 💯 :firstplacemedal:

👍 Ross Turk, Howard Yoo, Peter Hicks
🙌 Ross Turk, Howard Yoo, Peter Hicks, Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2022-04-30 07:54:48

The May meeting of the TSC will be postponed because most of the TSC will be attending the Astronomer Spring Summit the week of May 9th. Details to follow along with a new meeting day/time for the meeting going forward (thanks to all who responded to the poll!).

Hubert Dulay - (hubert.dulay@gmail.com)
2022-05-01 09:25:23

Are there examples of using openlineage with streaming data pipelines? Thanks

Mirko Raca - (racamirko@gmail.com)
2022-05-03 04:12:09

*Thread Reply:* Hi @Hubert Dulay,

while I'm not an expert, I can offer the following:
• Marquez has had the [...] but what I got here - that API is not encouraged
• I personally don't find the run->job metaphor to work nicely with streaming transformation, but I'm using that in my current setup (until someone points me in a better direction 😉 )
• I register each change of the stream processing as a new "run", which ends immediately - so duration information is lost, but the current set of parameters is recorded. It's not pretty, I know.
Maybe stream processing is a scenario to be re-evaluated in OL meetings, or at least clarified?

Hubert Dulay - (hubert.dulay@gmail.com)
2022-05-03 21:19:06

*Thread Reply:* Thanks for the details

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 09:32:23

Hey OL! My company is in the process of migrating off of Palantir and into Databricks/Azure. There are a couple of business units not wanting to budge due to the built-in data lineage and code reference features Palantir has. I am tasked with researching an alternative data lineage solution and I quickly came across OL. I love what I have read and seen demos of so far and want to do a POC for my org of its capabilities. I was able to set up the Marquez server on a VM and get it talking to Databricks. I also have the init script installed on the cluster and I can see from the log4j logs it's communicating fine (I think). However, I am embarrassed to admit I can't figure out how the instrumentation works for the databricks notebooks. I ran a simple notebook that loads data, runs a simple transform, and saves the output somewhere, but I don't see any entries in the namespace I configured. I am sure I missed something very obvious somewhere, but are there examples of how to get a simple example into Marquez from databricks? Thanks so much for any guidance you can give!

John Thomas - (john@datakin.com)
2022-05-02 13:26:52

*Thread Reply:* Hi Kostikey - this blog has an example with Spark and jupyter, which might be a good place to start!

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 14:58:29

*Thread Reply:* Hi @John Thomas, thanks for the reply. I think I am close but my cluster is unable to talk to the marquez server. After looking at log4j I see the following rows:

```
22/05/02 18:43:39 INFO SparkContext: Registered listener io.openlineage.spark.agent.OpenLineageSparkListener
22/05/02 18:43:40 INFO EventEmitter: Init OpenLineageContext: Args: ArgumentParser(host=http://135.170.226.91:8400, version=v1, namespace=gus-namespace, jobName=default, parentRunId=null, apiKey=Optional.empty, urlParams=Optional[{}]) URI: http://135.170.226.91:8400/api/v1/lineage?
22/05/02 18:46:21 ERROR EventEmitter: Could not emit lineage [responseCode=0]: {"eventType":"START","eventTime":"2022-05-02T18:44:08.36Z","run":{"runId":"91fd4e13-52ac-4175-8956-c06d7dee97fc","facets":{"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.2.1","openlineage_spark_version":"0.8.1"},"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.ShowNamespaces","num-children":1,"namespace":0,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"databaseName","dataType":"string","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":4,"jvmId":"eaa0543b_5e04_4f5b_844b_0e4598f019a7"},"qualifier":[]}]]},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedNamespace","num_children":0,"catalog":null,"namespace":[]}]},"spark_unknown":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","output":{"description":"Unable to serialize logical plan due to: Infinite recursion (StackOverflowError) ...
OpenLineageHttpException(code=0, message=java.lang.RuntimeException: java.util.concurrent.ExecutionException: openlineage.hc.client5.http.ConnectTimeoutException: Connect to http://135.170.226.91:8400 [/135.170.226.91] failed: Connection timed out, details=java.util.concurrent.CompletionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: openlineage.hc.client5.http.ConnectTimeoutException: Connect to http://135.170.226.91:8400 [/135.170.226.91] failed: Connection timed out)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:68)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:69)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:90)
    at java.util.Optional.ifPresent(Optional.java:159)
    at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:90)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:81)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:102)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:119)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:103)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1612)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
```
the connection timeout is surprising because I can connect just fine using the example curl code from the same cluster:

```
%sh
curl -X POST http://135.170.226.91:8400/api/v1/lineage \
  -H 'Content-Type: application/json' \
  -d '{
        "eventType": "START",
        "eventTime": "2020-12-28T19:52:00.001+10:00",
        "run": {
          "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
        },
        "job": {
          "namespace": "gus2~-namespace",
          "name": "my-job"
        },
        "inputs": [{
          "namespace": "gus2-namespace",
          "name": "gus-input"
        }],
        "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client"
      }'
```
Spark config:
```
spark.openlineage.host http://135.170.226.91:8400
spark.openlineage.version v1
spark.openlineage.namespace gus-namespace
```
Not sure what is going on, the EventEmitter init log looks like it's right but clearly something is off. Thanks so much for the help

John Thomas - (john@datakin.com)
2022-05-02 15:03:40

*Thread Reply:* hmmm, interesting - if it's easy could you spin both up locally and check that it's just a communication issue? It helps with diagnosis


It might also be a firewall issue, but your cURL should preclude that

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 15:05:38

*Thread Reply:* Since it's Databricks I was having a hard time figuring out how to try locally. Other than just using plain 'ol spark on my laptop and a localhost Marquez...

John Thomas - (john@datakin.com)
2022-05-02 15:07:13

*Thread Reply:* hmm, that could be an interesting test to see if it's a databricks issue - the databricks integration is pretty much the same as the spark integration, just with a little bit of a wrapper and the init script

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 15:08:44

*Thread Reply:* yeah, I was going to try that but it just didn't seem like helpful troubleshooting for exactly that reason... but I may just do that anyway so I can see something working 🙂 (morale booster)

John Thomas - (john@datakin.com)
2022-05-02 15:09:22

*Thread Reply:* oh totally! Network issues are a huge pain in the ass, and if you're still seeing issues locally with spark/mz then we'll know a lot more than we do now 🙂

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 15:11:19

*Thread Reply:* sounds good, I will give it a go!

Will Johnson - (will@willj.co)
2022-05-02 15:16:16

*Thread Reply:* @Kostikey Mustakas - I think spark.openlineage.version should be equal to 1 not v1.


In addition, is http://135.170.226.91:8400 accessible to Databricks? Could you try doing a %sh command inside of a databricks notebook and see if you can ping that IP address (https://linux.die.net/man/8/ping)?


For your Databricks cluster, did you VNET inject it into an existing VNET? If it's in an existing VNET, you should confirm that the VM running marquez can access it. If it's a non-VNET-injected workspace, you probably need to redeploy to a VNET that has that VM or has connectivity to that VM.
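(Editor's note: per Will's suggestion, the cluster config posted earlier in this thread would then become something like the sketch below - same values, only the version flag changed.)
```
spark.openlineage.host http://135.170.226.91:8400
spark.openlineage.version 1
spark.openlineage.namespace gus-namespace
```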

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 15:19:22

*Thread Reply:* Yeah, I know, I meant to ask about that. Docs say 1 like you mention: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/databricks. I second-guessed from this thread https://openlineage.slack.com/archives/C01CK9T7HKR/p1638848249159700.

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 15:23:42

*Thread Reply:* @Will Johnson, ping fails... this is surprising as the curl command mentioned above works fine.

Julius Rentergent - (julius.rentergent@thetradedesk.com)
2022-05-02 15:37:00

*Thread Reply:* I’m also trying to set up Databricks according to Running Marquez on AWS. Right now I’m stuck on the database part rather than the Marquez part — I can’t connect my EKS cluster to the RDS database which I described in more detail on the Marquez slack.


@Kostikey Mustakas Sorry for the distraction, but I'm curious how you have set up your networking to make the API requests work with Databricks.
Good luck with your issue!

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 15:47:17

*Thread Reply:* @Julius Rentergent We are using Azure and leverage Private Endpoints to connect resources in separate subscriptions. There is a Bastion proxy in place that we can map http traffic through, and I have a Load Balancer Inbound NAT rule I set up that maps one of our whitelisted port ranges (8400) to 5000.

:gratitude_thank_you: Julius Rentergent

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 20:15:01

*Thread Reply:* @Will Johnson a little progress maybe... I created a private endpoint and updated dns to point to it. Now I get a 404 Not Found error instead of a timeout

🙌 Will Johnson

Kostikey Mustakas - (kostikey.mustakas@gmail.com)
2022-05-02 20:16:41

*Thread Reply:*
```
22/05/03 00:09:24 ERROR EventEmitter: Could not emit lineage [responseCode=404]: {"eventType":"START","eventTime":"2022-05-03T00:09:22.498Z","run":{"runId":"f41575a0-e59d-4cbc-a401-9b52d2b020e0","facets":{"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.2.1","openlineage_spark_version":"0.8.1"},"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.ShowNamespaces","num-children":1,"namespace":0,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"databaseName","dataType":"string","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":4,"jvmId":"aad3656d_8903_4db3_84f0_fe6d773d71c3"},"qualifier":[]}]]},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedNamespace","num_children":0,"catalog":null,"namespace":[]}]},"spark_unknown":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/0.8.1/integration/spark","_schemaURL":"https://openlineage.io/spec/1-0-2/OpenLineage.json#/$defs/RunFacet","output":{"description":"Unable to serialize logical plan due to: Infinite recursion (StackOverflowError) (through reference chain: org.apache.spark.sql.catalyst.expressions.AttributeReference["preCanonicalized"] ....
OpenLineageHttpException(code=null, message={"code":404,"message":"HTTP 404 Not Found"}, details=null)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:68)
```

Julius Rentergent - (julius.rentergent@thetradedesk.com)
2022-05-27 00:03:30

*Thread Reply:* Following up on this as I encounter the same issue with the Openlineage Databricks integration. This issue seems quite malicious as it crashes the Spark Context and requires a restart.

I have marquez running on AWS EKS; I'm using Openlineage 0.8.2 on Databricks 10.4 (Spark 3.2.1) and my Spark config looks like this:
```
spark.openlineage.host https://internal-xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxx.us-east-1.elb.amazonaws.com
spark.openlineage.namespace default
spark.openlineage.version v1   <- also tried "1"
```
I can run some simple read and write commands and successfully find the log4j events highlighted in the docs:
INFO SparkContext;
INFO OpenLineageContext;
INFO AsyncEventQueue for each time I run the cell
After doing this a few times I get "The spark context has stopped and the driver is restarting. Your notebook will be automatically reattached."
stderr shows a bunch of things. log4j shows the same as for Kostikey: ERROR EventEmitter: [...] Unable to serialize logical plan due to: Infinite recursion (StackOverflowError)

I have one more piece of information which I can't make much sense of, but hopefully someone else can; if I include the port in the host, I can very reliably crash the Spark Context on the first attempt. So:
```
https://internal-xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxx.us-east-1.elb.amazonaws.com     <- crashes after a couple of attempts; sometimes it takes me a while to reproduce it while repeatedly reading/writing the same datasets
https://internal-xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxx.us-east-1.elb.amazonaws.com:80  <- crashes on first try
```
Any insights would be greatly appreciated! 🙂

Julius Rentergent - (julius.rentergent@thetradedesk.com)
2022-05-27 00:22:27

*Thread Reply:* I tried two more things:
• curl works, ping fails, just like in the previous report
• Databricks allows providing spark configs without quotes, whereas quotes are generally required for Spark. So I added the quotes to the host name, but now I'm getting: ERROR OpenLineageSparkListener: Unable to parse open lineage endpoint. Lineage events will not be collected

Martin Fiser - (fisa@keboola.com)
2022-05-27 14:00:38

*Thread Reply:* @Kostikey Mustakas May I ask what the reason is for migrating from Palantir? Sorry for this off-topic question!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-30 05:46:27

*Thread Reply:* @Julius Rentergent created issue on project github: https://github.com/OpenLineage/OpenLineage/issues/795

Labels: bug, integration/spark

Julius Rentergent - (julius.rentergent@thetradedesk.com)
2022-06-01 11:15:26

*Thread Reply:* Thank you @Maciej Obuchowski.
Just to clarify, the Spark Context crashes with and without the port; it's just that adding the port causes it to crash more quickly (on the 1st attempt).

I will run some more experiments when I have time, and add the results to the ticket.

Edit - added to issue:

I ran some more experiments, this time with a fake host and on OpenLineage 0.9.0, and was not able to reproduce the issue with regards to the port; instead, the new experiments show that Spark 3.2 looks to be involved.

On Spark 3.2.1 / Databricks 10.4 LTS: using (fake) host http://ac7aca38330144df9.amazonaws.com:5000 crashes when the first notebook cell is evaluated, with "The spark context has stopped and the driver is restarting." The same occurs when the port is removed.

On Spark 3.1.2 / Databricks 9.1 LTS: using (fake) host http://ac7aca38330144df9.amazonaws.com:5000 does not impede the cluster but, reasonably, produces for each lineage event "ERROR EventEmitter: Could not emit lineage" w/ exception io.openlineage.client.OpenLineageClientException: java.net.UnknownHostException. The same occurs when the port is removed.

Michael Robinson - (michael.robinson@astronomer.io)
2022-05-02 14:52:09

@channel The poll results are in, and the new day/time for the monthly TSC meeting is each second Thursday at 10 am PT. The next meeting will take place on Thursday, 5/19, at 10 am PT, due to a conflict with the Astronomer Spring Summit. Future meetings will take place on the second Thursday of each month. Calendar updates will be forthcoming. Thanks!

🙌 Willy Lulciuc, Mynor Choc

Will Johnson - (will@willj.co)
2022-05-02 15:09:42

*Thread Reply:* @Michael Robinson - just to be sure, is the 5/19 meeting at 10 AM PT as well?

Michael Robinson - (michael.robinson@astronomer.io)
2022-05-02 15:14:11

*Thread Reply:* Yes, and I’ll update the msg for others. Thank you

Will Johnson - (will@willj.co)
2022-05-02 15:16:25

*Thread Reply:* Thank you!

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-05-02 21:45:39

Hii Team, as I saw, marquez builds lineage via java code from the seed command - what should I do to connect to mysql (our database) with credentials and build a lineage for our data?

Marco Diaz - (mdiaz@roblox.com)
2022-05-03 12:40:55

@here How do we clear old jobs, datasets and namespaces from Marquez?

Mirko Raca - (racamirko@gmail.com)
2022-05-04 07:04:48

*Thread Reply:* It seems we can't for now. This was the same question I had last week:


https://github.com/MarquezProject/marquez/issues/1736

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-04 10:56:35

*Thread Reply:* Seems that it's a really popular request 🙂

Tyler Farris - (tyler@kickstand.work)
2022-05-03 13:43:56

Hello,
I'm sending lineage events to astrocloud.datakin DB with the Marquez API. The event is sent - but the metadata for inputs and outputs isn't coming through. Below is an example of the event I'm sending. Not sure if this is the place for this question. Cross-posting to Marquez Slack.
```
{
  "eventTime": "2022-05-03T17:20:04.151087+00:00",
  "run": {
    "runId": "2dfc6dcd4011d2a1c3dc1e5861127e5b"
  },
  "job": {
    "namespace": "from-airflow",
    "name": "Postgres_1_to_Snowflake_2.extract"
  },
  "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
  "inputs": [
    {
      "name": "Postgres_1_to_Snowflake_2.extract",
      "namespace": "from-airflow"
    }
  ]
}
```
Thanks.

Tyler Farris - (tyler@kickstand.work)
2022-05-04 11:28:48

*Thread Reply:* @Mirko Raca pointed out that I was missing eventType.

Mirko Raca:
"From a quick glance - you're missing "eventType": "START", attribute. It's also worth noting that metadata typically shows up after the second event (type COMPLETE)"

thanks again.
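(Editor's note: for anyone hitting the same thing, the fixed payload would simply be the event from the message above with the missing field added - all other values unchanged:)
```
{
  "eventType": "START",
  "eventTime": "2022-05-03T17:20:04.151087+00:00",
  "run": { "runId": "2dfc6dcd4011d2a1c3dc1e5861127e5b" },
  "job": { "namespace": "from-airflow", "name": "Postgres_1_to_Snowflake_2.extract" },
  "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
  "inputs": [{ "namespace": "from-airflow", "name": "Postgres_1_to_Snowflake_2.extract" }]
}
```
A matching COMPLETE event (same runId) then closes the run, which is when Marquez typically surfaces the metadata.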

👍 Mirko Raca

Sandeep Bhat - (bhatsandeep424@gmail.com)
2022-05-06 05:01:34

Hii Team, could anyone tell me - to view lineage in marquez, do we have to write metadata as code, or does marquez have a feature to scan the sql code and build a lineage automatically? Please clarify my doubt regarding this.

Juan Carlos Fernández Rodríguez - (jcfernandez@keedio.com)
2022-05-06 05:26:16

*Thread Reply:* As far as I understand, OpenLineage has tools to extract metadata from sources. Depending on your source, you could find an integration; if it doesn't exist, you should write your own integration (and collaborate with the project)

Ross Turk - (ross@datakin.com)
2022-05-06 12:59:06

*Thread Reply:* @Sandeep Bhat take a look at https://openlineage.io/integration - there is some info there on the different integrations that can be used to automatically pull metadata.

Ross Turk - (ross@datakin.com)
2022-05-06 13:00:39

*Thread Reply:* The Airflow integration, in particular, uses a SQL parser to determine input/output tables (in cases where the data store can't be queried for that info)

Jorik - (jorik@scivis.net)
2022-05-12 05:13:01

Hi all. We are looking at using OpenLineage for capturing some lineage in our custom processing system. I think we got the lineage events understood, but we have often datasets that get appended, or get overwritten by an operation. Is there anything in openlineage that would facilitate making this distinction? (ie. if a set gets overwritten we would be interested in the lineage events from the last overwrite, if it gets appended we would like to have all of these in the display)

Mirko Raca - (racamirko@gmail.com)
2022-05-12 05:48:43

*Thread Reply:* To my understanding - datasets model the structure, not the content. So, as long as your table doesn't change number of columns, it's the same thing.


The catch-all would be to create a Dataset facet which would record the distinction between append/overwrite per run. But, while this is supported by the standard, Marquez does not handle custom facets at the moment (I'll happily be corrected).

Jorik - (jorik@scivis.net)
2022-05-12 06:05:36

*Thread Reply:* Thanks, that makes sense. We're looking for a way to get the lineage of table contents. We may have to opt for new names on overwrite, or indeed extend a facet to flag these.

Jorik - (jorik@scivis.net)
2022-05-12 06:06:44

*Thread Reply:* Use case is compliance, where we need to show how a certain delivered data product (at a given point in time) was constructed. We have all our transforms/transfers as code, but there are a few parts where datasets get recreated in the process after fixes have been made, and I wouldn't want to bother the auditors with those stray paths

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-12 06:12:09

*Thread Reply:* We have LifecycleStateChangeDataset facet that captures this information. It's currently emitted when using Spark integration
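(Editor's sketch of what that facet looks like on an output dataset - field names follow the LifecycleStateChangeDatasetFacet spec; the dataset name, producer and schema-URL version here are illustrative placeholders:)
```
"outputs": [{
  "namespace": "warehouse",
  "name": "daily_totals",
  "facets": {
    "lifecycleStateChange": {
      "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.8.2/integration/spark",
      "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json",
      "lifecycleStateChange": "OVERWRITE"
    }
  }
}]
```
The value distinguishes operations such as CREATE vs OVERWRITE, which is exactly the append/overwrite distinction Jorik asked about.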

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-12 06:13:25

*Thread Reply:* > But, while this is supported by the standard, Marquez does not handle custom facets at the moment (I'll happily be corrected). -It displays this information when it exists

🙌 Mirko Raca

Jorik - (jorik@scivis.net)
2022-05-12 06:13:29

*Thread Reply:* Oh that looks perfect! I completely missed that, thanks!

Marco Diaz - (mdiaz@roblox.com)
2022-05-12 15:46:04

Are there any examples on how to use this facet ColumnLineageDatasetFacet.json?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-13 05:19:47

*Thread Reply:* Work with Spark is not yet fully merged

raghanag - (raghanag@gmail.com)
2022-05-12 17:49:23

Hi All, I am trying to see where we can provide owner details when using the openlineage-spark configuration - I see only namespace and other config parameters, but not the owner. Can we add an owner configuration to openlineage-spark, like spark.openlineage.owner? The owner could then be used to filter namespaces when showing jobs or namespaces in the Marquez UI.

Michael Robinson - (michael.robinson@astronomer.io)
2022-05-13 19:07:04

@channel The next OpenLineage Technical Steering Committee meeting is next Thursday, 5/19, at 10 am PT! Going forward, meetings will take place on the second Thursday of each month at 10 am PT.
Join us on Zoom: https://astronomer.zoom.us/j/87156607114?pwd=a3B0K210dnRaQmdkaFdGMytBREZEQT09
All are welcome!
Agenda:
• releases 0.7.1 & 0.8.1
• column-level lineage
• open lineage
For notes and the agenda visit the wiki: https://tinyurl.com/openlineagetsc

🙌 Maciej Obuchowski, Ross Turk

Yannick Libert - (yan@ioly.fr)
2022-05-16 11:02:23

Hi all, we are considering using OL to send lineage events from various jobs and places in our company. Since there will be multiple producers, we would like to use Kafka as our main hub for communication. One of our sources will be Airflow (more particularly MWAA, i.e. airflow in its 2.2.2 version). Is there a way to configure the Airflow lineage backend to send events to Kafka instead of Marquez directly? So far, from what I've seen in the docs and in here, the only way would be to create a simple proxy to stream the http events to Kafka. Is that still the case?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-16 11:31:17

*Thread Reply:* I think you can either use proxy backend: https://github.com/OpenLineage/OpenLineage/tree/main/proxy

or configure the OL client to send data to kafka:
https://github.com/OpenLineage/OpenLineage/tree/main/client/python#kafka

👍 Yannick Libert

Yannick Libert - (yan@ioly.fr)
2022-05-16 12:15:59

*Thread Reply:* Thank you very much for the useful pointers. The proxy solution could indeed work in our case, but it implies creating another service in front of Kafka, and thus another layer of complexity in the architecture. If there is a more "native" way of streaming events directly from the Airflow backend, that'll be great to know

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-16 12:37:10

*Thread Reply:* The second link 😉

Yannick Libert - (yan@ioly.fr)
2022-05-17 03:46:03

*Thread Reply:* Sure, we already implemented the python client for jobs outside airflow and it works great 🙂
You are saying that there is a way to use this python client in conjunction with the MWAA lineage backend to relay the job events that come with the airflow integration (without including it in the DAGs)?
Our strategy is to use both the airflow backend to collect automatic lineage events without modifying any existing DAGs, and the in-code implementation to allow our data engineers to send their own events if they want to.
The second option works perfectly but the first one is where we struggle a bit, especially with MWAA.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-17 05:24:30

*Thread Reply:* If you can mount file to MWAA, then yes - it should work with config file option: https://github.com/OpenLineage/OpenLineage/tree/main/client/python#config-file
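(Editor's sketch of what such a mounted config file could look like - transport options per the python client README linked above; the broker and topic names are invented, and the exact keys should be checked against the client version in use:)
```
# openlineage.yml
transport:
  type: kafka
  topic: openlineage_events
  config:
    bootstrap.servers: broker1:9092
  flush: true
```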

Yannick Libert - (yan@ioly.fr)
2022-05-17 05:40:45

*Thread Reply:* Brilliant! I'm going to test that. Thank you Maciej!

Michael Robinson - (michael.robinson@astronomer.io)
2022-05-17 15:20:58

A release has been requested. Are there any +1s? Three from committers will authorize. Thanks.

➕ Maciej Obuchowski, Ross Turk, Willy Lulciuc, Michael Collado

Michael Robinson - (michael.robinson@astronomer.io)
2022-05-18 10:33:03

The OpenLineage TSC meeting is tomorrow at 10am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1652483224119229

🙌 Willy Lulciuc

Tyler Farris - (tyler@kickstand.work)
2022-05-18 16:23:56

Hey all,
Do custom extractors work with the taskflow api?

John Thomas - (john@datakin.com)
2022-05-18 16:34:25

*Thread Reply:* Hey Tyler - A custom extractor just needs to be able to assemble the runEvents and send the information out to the lineage backends.

If the things you're sending/receiving with TaskFlow are accessible in terms of metadata in the environment the DAG is running in, then you should be able to make one that would work!

This Webinar goes over creating custom extractors for reference.

Does that answer your question?
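(Editor's sketch of such an extractor, following the pattern from the webinar. Hedged: the import paths and class shape follow the openlineage-airflow extractor interface of this era, but the operator name and datasets are invented, so treat this as an outline rather than copy-paste code.)
```python
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.run import Dataset

class MySqlToS3Extractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # operator classes this extractor should be applied to
        return ["MySqlToS3Operator"]

    def extract(self) -> Optional[TaskMetadata]:
        # assemble the runEvent metadata from attributes the operator exposes
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[Dataset(namespace="mysql://db:3306", name="mydb.source_table")],
            outputs=[Dataset(namespace="s3://my-bucket", name="exports/source_table")],
        )
```
You would then point OPENLINEAGE_EXTRACTORS at the full import path of the class - see Maciej's note further down about that variable being a recent addition.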

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-18 16:41:16

*Thread Reply:* Taskflow internally is just PythonOperator. If you write an extractor that assumes something more than it just being a PythonOperator, then you can probably make it work 🙂

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:15:52

*Thread Reply:* Thanks @John Thomas @Maciej Obuchowski, your answers both make sense. I just keep running into this error in my logs:
```
[2022-05-18, 20:52:34 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=_PythonDecoratedOperator airflow_dag_id=Postgres_1_to_Snowflake_1_v3 task_id=Postgres_1 airflow_run_id=scheduled__2022-05-18T20:51:34.334045+00:00
```
The picture is my custom extractor, it's not doing anything currently as this is just a test.

[image attachment]

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:16:05

*Thread Reply:* thanks again for the help yall

John Thomas - (john@datakin.com)
2022-05-18 17:16:34

*Thread Reply:* did you set the environment variable with the path to your extractor?

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:16:46

*Thread Reply:* [image attachment]

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:17:13

*Thread Reply:* I believe that's correct @John Thomas

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:18:35

*Thread Reply:* and the versions I'm using:
Astronomer Runtime 5.0.0 based on Airflow 2.3.0+astro.1

John Thomas - (john@datakin.com)
2022-05-18 17:25:58

*Thread Reply:* this might not be the problem, but you should have only one of extract and extract_on_complete - which one are you meaning to use?

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:32:26

*Thread Reply:* ahh thanks John, as of right now extract_on_complete.


This is a similar setup as Michael had in the video.

John Thomas - (john@datakin.com)
2022-05-18 17:33:31

*Thread Reply:* if it's still not working I'm not really sure at this point - that's about what I had when I spun up my own custom extractor

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-18 17:39:44

*Thread Reply:* is there anything in logs regarding extractors?

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:40:36

*Thread Reply:* just this:
```
[2022-05-18, 21:36:59 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=_PythonDecoratedOperator airflow_dag_id=competitive_oss_projects_git_to_snowflake task_id=Transform_git_logs_to_S3 airflow_run_id=scheduled__2022-05-18T21:35:57.694690+00:00
```

Tyler Farris - (tyler@kickstand.work)
2022-05-18 17:41:11

*Thread Reply:* @John Thomas Thanks, I appreciate your help.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-19 06:01:52

*Thread Reply:* No "Failed to import" messages?

Tyler Farris - (tyler@kickstand.work)
2022-05-19 11:26:34

*Thread Reply:* @Maciej Obuchowski None that I can see. Here is the full log:
```
Failed to verify remote log exists s3:///dag_id=Postgres_1_to_Snowflake_1_v3/run_id=scheduled__2022-05-19T15:23:49.248097+00:00/task_id=Postgres_1/attempt=1.log.
Please provide a bucket_name instead of "s3:///dag_id=Postgres_1_to_Snowflake_1_v3/run_id=scheduled__2022-05-19T15:23:49.248097+00:00/task_id=Postgres_1/attempt=1.log"
Falling back to local log
*** Reading local file: /usr/local/airflow/logs/dag_id=Postgres_1_to_Snowflake_1_v3/run_id=scheduled__2022-05-19T15:23:49.248097+00:00/task_id=Postgres_1/attempt=1.log
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1158} INFO - Dependencies all met for <TaskInstance: Postgres_1_to_Snowflake_1_v3.Postgres_1 scheduled__2022-05-19T15:23:49.248097+00:00 [queued]>
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1158} INFO - Dependencies all met for <TaskInstance: Postgres_1_to_Snowflake_1_v3.Postgres_1 scheduled__2022-05-19T15:23:49.248097+00:00 [queued]>
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1355} INFO -
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1356} INFO - Starting attempt 1 of 1
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1357} INFO -
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1376} INFO - Executing <Task(_PythonDecoratedOperator): Postgres_1> on 2022-05-19 15:23:49.248097+00:00
[2022-05-19, 15:24:50 UTC] {standard_task_runner.py:52} INFO - Started process 3957 to run task
[2022-05-19, 15:24:50 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'Postgres_1_to_Snowflake_1_v3', 'Postgres_1', 'scheduled__2022-05-19T15:23:49.248097+00:00', '--job-id', '96473', '--raw', '--subdir', 'DAGS_FOLDER/pg_to_snow.py', '--cfg-path', '/tmp/tmp9n7u3i4t', '--error-file', '/tmp/tmp9a55v9b']
[2022-05-19, 15:24:50 UTC] {standard_task_runner.py:80} INFO - Job 96473: Subtask Postgres_1
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/configuration.py:470 DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
[2022-05-19, 15:24:50 UTC] {task_command.py:369} INFO - Running <TaskInstance: Postgres_1_to_Snowflake_1_v3.Postgres_1 scheduled__2022-05-19T15:23:49.248097+00:00 [running]> on host 056ca0b6c7f5
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1568} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=Postgres_1_to_Snowflake_1_v3
AIRFLOW_CTX_TASK_ID=Postgres_1
AIRFLOW_CTX_EXECUTION_DATE=2022-05-19T15:23:49.248097+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-05-19T15:23:49.248097+00:00
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'execution_date' from the template is deprecated and will be removed in a future version. Please use 'data_interval_start' or 'logical_date' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'next_ds' from the template is deprecated and will be removed in a future version. Please use '{{ data_interval_end | ds }}' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'next_ds_nodash' from the template is deprecated and will be removed in a future version. Please use '{{ data_interval_end | ds_nodash }}' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'next_execution_date' from the template is deprecated and will be removed in a future version. Please use 'data_interval_end' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_ds' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_ds_nodash' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_execution_date' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'prev_execution_date_success' from the template is deprecated and will be removed in a future version. Please use 'prev_data_interval_start_success' instead.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'tomorrow_ds' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'tomorrow_ds_nodash' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'yesterday_ds' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/utils/context.py:202 AirflowContextDeprecationWarning: Accessing 'yesterday_ds_nodash' from the template is deprecated and will be removed in a future version.
[2022-05-19, 15:24:50 UTC] {python.py:173} INFO - Done. Returned value was: extract
[2022-05-19, 15:24:50 UTC] {logging_mixin.py:115} WARNING - /usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py:1369 DeprecationWarning: Passing 'execution_date' to 'TaskInstance.xcom_push()' is deprecated.
[2022-05-19, 15:24:50 UTC] {__init__.py:97} WARNING - Unable to find an extractor. task_type=_PythonDecoratedOperator airflow_dag_id=Postgres_1_to_Snowflake_1_v3 task_id=Postgres_1 airflow_run_id=scheduled__2022-05-19T15:23:49.248097+00:00
[2022-05-19, 15:24:50 UTC] {client.py:74} INFO - Constructing openlineage client to send events to https://api.astro-livemaps.datakin.com/
[2022-05-19, 15:24:50 UTC] {taskinstance.py:1394} INFO - Marking task as SUCCESS. dag_id=Postgres_1_to_Snowflake_1_v3, task_id=Postgres_1, execution_date=20220519T152349, start_date=20220519T152450, end_date=20220519T152450
[2022-05-19, 15:24:50 UTC] {local_task_job.py:156} INFO - Task exited with return code 0
[2022-05-19, 15:24:50 UTC] {local_task_job.py:273} INFO - 1 downstream tasks scheduled from follow-on schedule check
```

Josh Owens - (Josh@kickstand.work)
2022-05-19 16:57:38

*Thread Reply:* @Maciej Obuchowski is our ENV var wrong maybe? Do we need to mention the file to import somewhere else that we may have missed?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-20 10:26:01

*Thread Reply:* @Josh Owens one thing I can think of is that you might have older openlineage integration version, as OPENLINEAGE_EXTRACTORS variable was added very recently: https://github.com/OpenLineage/OpenLineage/pull/694

Tyler Farris - (tyler@kickstand.work)
2022-05-20 11:58:28

*Thread Reply:* @Maciej Obuchowski, that was it! For some reason, my requirements.txt wasn't pulling the latest version of openlineage-airflow. Working now with 0.8.2

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-20 11:59:01

*Thread Reply:* 🙌

Michael Raymond - (michael.raymond@cervest.earth)
2022-05-19 05:32:06

Hi 👋, I'm looking at OpenLineage as a solution for fine-grained data lineage tracking. Could I clarify a couple of points?


Where does one specify the version of an input dataset in the RunEvent? In the Marquez seed data I can see that it's recorded, but I'm not sure where it goes from looking at the OpenLineage schema. Or does it just assume the last version?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-19 05:59:59

*Thread Reply:* Currently, it assumes the latest version.
There's an effort with DatasetVersionDatasetFacet to be able to specify it manually - or extract this information from cases like Iceberg or Delta Lake tables.

Michael Raymond - (michael.raymond@cervest.earth)
2022-05-19 06:14:59

*Thread Reply:* Ah ok. Is it Marquez assuming the latest version when it records the OpenLineage event?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-19 06:18:20

*Thread Reply:* yes

✅ Michael Raymond

Michael Raymond - (michael.raymond@cervest.earth)
2022-05-19 06:54:40

*Thread Reply:* Thanks, that's very helpful 👍

Howard Yoo - (howardyoo@gmail.com)
2022-05-19 15:23:33

Hi all,
I was testing https://github.com/MarquezProject/marquez/tree/main/examples/airflow#step-21-create-dag-counter, and the following error was observed in my airflow env:

[screenshot attachment]

Howard Yoo - (howardyoo@gmail.com)
2022-05-19 15:23:52

Anybody know why this is happening? Any comments would be welcomed.

Tyler Farris - (tyler@kickstand.work)
2022-05-19 15:27:35

*Thread Reply:* @Howard Yoo What version of airflow?

Howard Yoo - (howardyoo@gmail.com)
2022-05-19 15:27:51

*Thread Reply:* it's 2.3

Howard Yoo - (howardyoo@gmail.com)
2022-05-19 15:28:42

*Thread Reply:* (sorry, it's 2.4)

Tyler Farris - (tyler@kickstand.work)
2022-05-19 15:29:28

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow Id refer to the docs again.

"Airflow 2.3+
Integration automatically registers itself for Airflow 2.3 if it's installed on Airflow worker's python. This means you don't have to do anything besides configuring it, which is described in Configuration section."

Howard Yoo - (howardyoo@gmail.com)
2022-05-19 15:29:53

*Thread Reply:* Right, I don't see any issues with configuring it

Tyler Farris - (tyler@kickstand.work)
2022-05-19 15:30:56

*Thread Reply:* so you don't need:

from openlineage.airflow import DAG

in your dag files

Howard Yoo - (howardyoo@gmail.com)
2022-05-19 15:31:41

*Thread Reply:* Okay... that makes sense then

Tyler Farris - (tyler@kickstand.work)
2022-05-19 15:32:47

*Thread Reply:* so if you need to import DAG it would just be:
from airflow import DAG

👍 Howard Yoo

Howard Yoo - (howardyoo@gmail.com)
2022-05-19 15:56:19

*Thread Reply:* Thanks!

👍 Tyler Farris

Michael Robinson - (michael.robinson@astronomer.io)
2022-05-19 17:13:02

@channel OpenLineage 0.8.2 is now available! The project now supports credentialing from the Airflow Secrets Backend and for the Azure Databricks Credential Passthrough, detection of datasets wrapped by ExternalRDDs, bug fixes, and more. For the details, see: https://github.com/OpenLineage/OpenLineage/releases/tag/0.8.2

🎉 Marco Diaz, Howard Yoo, Willy Lulciuc, Michael Collado, Ross Turk, Francis McGregor-Macdonald, Maciej Obuchowski

xiang chen - (cdmikechen@hotmail.com)
2022-05-19 22:18:42

Hi~ everyone. Is it possible for openlineage to support Camel pipelines?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-05-20 10:23:55

*Thread Reply:* What changes do you mean by letting openlineage support it?
Or do you mean writing an Apache Camel integration?

xiang chen - (cdmikechen@hotmail.com)
2022-05-22 19:54:17

*Thread Reply:* @Maciej Obuchowski Yes, let openlineage work with Camel the same as it does with airflow

xiang chen - (cdmikechen@hotmail.com)
2022-05-22 19:56:47

*Thread Reply:* I think this is a very valuable thing. I wish openlineage could support some commonly used pipeline tools, and try to abstract out some general interfaces so that users can extend it themselves

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-23 05:20:30
-
-

*Thread Reply:* For Python, we have OL client, common libraries (well, at least beginning of them) and SQL parser

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-23 05:20:44
-
-

*Thread Reply:* As we support more systems, the general libraries will grow as well.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-05-20 13:50:53
-
-

I see a change in the metadata collected from Airflow jobs which I think was introduced with the combination of Airflow 2.3/OpenLineage 0.8.1. There's an airflow_version facet that contains an operator attribute.

- -

Previously that attribute had values such as: airflow.providers.postgres.operators.postgres.PostgresOperator but I now see that for the very same task the operator is now tracked as: airflow.models.taskinstance.TaskInstance

- -

( fwiw there's also a taskInfo attribute in there containing a json string which itself has a operator that is still set to PostgresOperator )

- -

Is this an already known issue?

- - - -
- 👀 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-05-20 20:23:15
-
-

*Thread Reply:* This looks like a bug. we are probably not looking at the right instance in the TaskInstanceListener

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-05-21 14:17:19
-
-

*Thread Reply:* @Howard Yoo I filed: https://github.com/OpenLineage/OpenLineage/issues/767 for this

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-20 21:42:46
-
-

Would anyone happen to have a link to the Technical Steering Committee meeting recordings?

- -

I have quite a few people interested in seeing the overview of column lineage that Pawel provided during the Technical Steering Committee meeting on Thursday May 19th.

- -

The wiki does not include a link to the recordings: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

- -

Are the recordings made public? Thank you for any links and guidance!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-05-20 21:55:09
-
-

That would be @Michael Robinson Yes the recordings are made public.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-05-20 22:05:27
-
-

@Will Johnson I’ll put this on the https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting|wiki soon, but here is the link to the recording: https://astronomer.zoom.us/rec/share/xUBW-n6G4u1WS89tCSXStx8BMl99rCfCC6jGdXLnkN6gMGn5G-_BC7pxHKKeELhG.0JFl88isqb64xX-3 -PW: 1VJ=K5&X

- - - -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-05-21 09:42:21
-
-

*Thread Reply:* Thank you so much, Michael!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tyler Farris - (tyler@kickstand.work) -
-
2022-05-23 15:00:10
-
-

Is there documentation/examples around creating custom facets?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-24 06:41:11
-
-

*Thread Reply:* In Python or Java?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-05-24 06:44:32
-
-

*Thread Reply:* In python just inherit BaseFacet and add _get_schema static method that would point to some place where you have your json schema of a facet. For example our DbtVersionRunFacet

- -

In Java you can take a look at Spark's custom facets.

Tyler Farris (tyler@kickstand.work)
2022-05-24 16:40:00
*Thread Reply:* Thanks, @Maciej Obuchowski, I was asking in regards to Python, sorry I should have clarified.

I'm not sure what the disconnect is, but the facets aren't showing up in the inputs and outputs. The lineage event is sent successfully to my astrocloud.

Below is the facet and extractor, any help is appreciated. Thanks!

```python
import logging
from typing import List, Optional

import attr
from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.client.facet import BaseFacet
from openlineage.client.run import InputDataset, OutputDataset

log = logging.getLogger(__name__)


@attr.s
class ManualLineageFacet(BaseFacet):
    database: Optional[str] = attr.ib(default=None)
    cluster: Optional[str] = attr.ib(default=None)
    connectionUrl: Optional[str] = attr.ib(default=None)
    target: Optional[str] = attr.ib(default=None)
    source: Optional[str] = attr.ib(default=None)
    _producer: str = attr.ib(init=False)
    _schemaURL: str = attr.ib(init=False)

    @staticmethod
    def _get_schema() -> str:
        return {
            "$schema": "http://json-schema.org/schema#",
            "$defs": {
                "ManualLineageFacet": {
                    "allOf": [
                        {
                            "type": "object",
                            "properties": {
                                "database": {"type": "string", "example": "Snowflake"},
                                "cluster": {"type": "string", "example": "us-west-2"},
                                "connectionUrl": {"type": "string", "example": "http://snowflake"},
                                "target": {"type": "string", "example": "Postgres"},
                                "source": {"type": "string", "example": "Stripe"},
                                "description": {"type": "string", "example": "Description of inlet/outlet"},
                                "_producer": {"type": "string"},
                                "_schemaURL": {"type": "string"},
                            },
                        },
                    ],
                    "type": "object",
                }
            },
        }


class ManualLineageExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ["PythonOperator", "_PythonDecoratedOperator"]

    def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:
        return TaskMetadata(
            f"{task_instance.dag_run.dag_id}.{task_instance.task_id}",
            inputs=[
                InputDataset(
                    namespace="default",
                    name=self.operator.get_inlet_defs()[0]["name"],
                    inputFacets=ManualLineageFacet(
                        database=self.operator.get_inlet_defs()[0]["database"],
                        cluster=self.operator.get_inlet_defs()[0]["cluster"],
                        connectionUrl=self.operator.get_inlet_defs()[0]["connectionUrl"],
                        target=self.operator.get_inlet_defs()[0]["target"],
                        source=self.operator.get_inlet_defs()[0]["source"],
                    ),
                )
                if self.operator.get_inlet_defs()
                else {},
            ],
            outputs=[
                OutputDataset(
                    namespace="default",
                    name=self.operator.get_outlet_defs()[0]["name"],
                    outputFacets=ManualLineageFacet(
                        database=self.operator.get_outlet_defs()[0]["database"],
                        cluster=self.operator.get_outlet_defs()[0]["cluster"],
                        connectionUrl=self.operator.get_outlet_defs()[0]["connectionUrl"],
                        target=self.operator.get_outlet_defs()[0]["target"],
                        source=self.operator.get_outlet_defs()[0]["source"],
                    ),
                )
                if self.operator.get_outlet_defs()
                else {},
            ],
            job_facets={},
            run_facets={},
        )

    def extract(self) -> Optional[TaskMetadata]:
        pass
```
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-25 09:21:02
*Thread Reply:* _get_schema should return the address of the schema hosted somewhere else - afaik sending an object where the server expects a string field might cause some problems

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-25 09:21:59
*Thread Reply:* can you register ManualLineageFacet as facets, not as inputFacets or outputFacets?
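Putting those two corrections together, a rough sketch of the adjusted facet (the hosted schema URL and the "manualLineage" facet key are hypothetical placeholders, not a published schema):

```python
import attr
from openlineage.client.facet import BaseFacet
from openlineage.client.run import InputDataset


@attr.s
class ManualLineageFacet(BaseFacet):
    database: str = attr.ib(default=None)
    cluster: str = attr.ib(default=None)

    @staticmethod
    def _get_schema() -> str:
        # Return the address of a schema hosted somewhere else,
        # not the JSON schema object itself.
        return "https://example.com/schemas/ManualLineageFacet.json"


# Register under `facets`, not `inputFacets`/`outputFacets`:
dataset = InputDataset(
    namespace="default",
    name="my_table",
    facets={
        "manualLineage": ManualLineageFacet(
            database="Snowflake",
            cluster="us-west-2",
        )
    },
)
```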

Tyler Farris (tyler@kickstand.work)
2022-05-25 13:15:30
*Thread Reply:* Thanks for the advice @Maciej Obuchowski, I was able to get it working!
Also, great talk today at the Airflow Summit.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-25 13:25:17
*Thread Reply:* Thanks 🙇

Bruno González (brugms2@gmail.com)
2022-05-24 06:26:25
Hey guys! I'm pretty new with OL but would like to start using it for a combination of data lineage in Airflow + data quality metrics collection. I was wondering if that was possible, but Ross clarified that in the deeper dive webinar from some weeks ago (great one by the way!).

I'm referencing this comment from Julien to see if you have any updates or more examples apart from the one for Great Expectations. We have some custom operators and would like to push lineage and data quality metrics to Marquez using custom extractors. Any reference will be highly appreciated. Thanks in advance!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-24 06:35:05
*Thread Reply:* We're also getting data quality from dbt if you're running dbt test or dbt build:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/dbt.py#L399

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-24 06:37:15
*Thread Reply:* Generally, you'd need to construct DataQualityAssertionsDatasetFacet and/or DataQualityMetricsInputDatasetFacet and attach it to the tested dataset
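For reference, a rough sketch of what constructing and attaching one of those facets might look like with the Python client (facet and field names as in openlineage.client.facet at the time; the dataset names and metric values are placeholders):

```python
from openlineage.client.facet import (
    ColumnMetric,
    DataQualityMetricsInputDatasetFacet,
)
from openlineage.client.run import InputDataset

# Metrics gathered by your own quality checks (values are illustrative).
metrics = DataQualityMetricsInputDatasetFacet(
    rowCount=1500,
    columnMetrics={
        "amount": ColumnMetric(nullCount=0, distinctCount=42, min=0.0, max=99.9),
    },
)

# Attach the facet to the tested (input) dataset of the run event.
tested_dataset = InputDataset(
    namespace="postgres://localhost:5432",
    name="public.orders",
    inputFacets={"dataQualityMetrics": metrics},
)
```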

Bruno González (brugms2@gmail.com)
2022-05-24 13:23:34
*Thread Reply:* Thanks @Maciej Obuchowski!!!

Howard Yoo (howardyoo@gmail.com)
2022-05-24 16:55:08
Hi all, https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#development <-- does this still work? I did follow the instructions, but running pytest failed with error messages like:

```
________________________________________________ ERROR collecting tests/extractors/test_bigquery_extractor.py ________________________________________________
ImportError while importing test module '/Users/howardyoo/git/OpenLineage/integration/airflow/tests/extractors/test_bigquery_extractor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
openlineage/airflow/utils.py:251: in import_from_string
    module = importlib.import_module(module_path)
/opt/homebrew/Caskroom/miniconda/base/envs/airflow/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1030: in _gcd_import
    ???
<frozen importlib._bootstrap>:1007: in _find_and_load
    ???
<frozen importlib._bootstrap>:986: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:680: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:850: in exec_module
    ???
<frozen importlib._bootstrap>:228: in _call_with_frames_removed
    ???
../../../airflow.master/airflow/providers/google/cloud/operators/bigquery.py:39: in <module>
    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook, BigQueryJob
../../../airflow.master/airflow/providers/google/cloud/hooks/bigquery.py:46: in <module>
    from googleapiclient.discovery import Resource, build
E   ModuleNotFoundError: No module named 'googleapiclient'
```

Howard Yoo (howardyoo@gmail.com)
2022-05-24 16:55:09
...

Howard Yoo (howardyoo@gmail.com)
2022-05-24 16:55:54
looks like just running pytest wouldn't be able to run all the tests - some of these dag tests seem to require connectivity to Google's BigQuery, databases, etc.

Mardaunt (miostat@yandex.ru)
2022-05-25 16:32:08
👋 Hi everyone!
I didn't find this in the documentation.
Can OpenLineage show me which source columns the final DataFrame column came from? (Spark)

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-25 16:59:47
*Thread Reply:* We're working on this feature - it should be in the next release from the OpenLineage side

🙌 Mardaunt

Mardaunt (miostat@yandex.ru)
2022-05-25 17:06:12
*Thread Reply:* Thanks! I will keep an eye on updates.

Martin Fiser (fisa@keboola.com)
2022-05-25 21:08:39
Hi all, showcase time:

We have implemented a native OpenLineage endpoint and metadata writer in our Keboola all-in-one data platform.
The reason was that for more complex data pipeline scenarios it is beneficial to display the lineage in more detail. Additionally, we hope that OpenLineage as a standard will catch on and open up the ability to push lineage data into other data governance tools than Marquez.
The implementation started as an internal POC of tweaking our metadata into the OpenLineage /lineage format and resulted in a native API endpoint and, later on, an app within the Keboola platform ecosystem - feeding platform job metadata in a regular cadence.
We furthermore use a namespace for each Keboola project so users can observe the data through their whole data mesh setup (multi-project architecture).
Please reach out to me if you have any questions!

🙌 Maciej Obuchowski, Michael Robinson

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-26 06:05:33
*Thread Reply:* Looks great! Thanks for sharing!

Gopi Krishnan Rajbahadur (gopikrishnanrajbahadur@gmail.com)
2022-05-26 10:13:26
Hi OpenLineage team,

I am Gopi Krishnan Rajbahadur, one of the core members of the OpenDatalogy project (a project that we are currently trying to sandbox as a part of LF-AI). Our OpenDatalogy project focuses on providing a process that allows users of publicly available datasets (e.g., CIFAR-10) to ensure license compliance. In addition, we also aim to provide a public repo that documents the final rights and obligations associated with common publicly available datasets, so that users of these datasets can use them compliantly in their AI models and software.

One of the key aspects of conducting dataset license compliance analysis involves tracking the lineage and provenance of the dataset (as we highlight in this paper: https://arxiv.org/abs/2111.02374). We think that in this regard, our projects (i.e., OpenLineage and OpenDatalogy) could work together to use the existing OpenLineage standard, and also collaborate to adopt/modify/enhance and use OpenLineage to track and document the lineage of a publicly available dataset. On that note, we are also working with the SPDX community to make the lineage and provenance of a dataset be tracked as a part of the SPDX BOM that is in the works for representing AI software (AI SBOM).

We think our projects could mutually benefit from collaborating with each other. Our project's GitHub can be found here: https://github.com/OpenDataology/OpenDataology. Any feedback that you have about our project would be greatly appreciated. Also, as we are trying to sandbox our project, if you could also show us your support we would greatly appreciate it!

Look forward to hearing back from you

Sincerely,
Gopi

👀 Howard Yoo, Maciej Obuchowski

Ilqar Memmedov (ccmilgar@gmail.com)
2022-05-30 04:25:10
Hi guys, sorry for the basics.
I did a PoC of OpenLineage for gathering metrics on Spark jobs, especially for table creation, alter and drop.
I noticed that Drop/Alter table statements do not trigger the listener to post lineage data - is that normal behaviour?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-30 05:38:41
*Thread Reply:* Might be the case if you're using Spark 3.2

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-30 05:38:54
*Thread Reply:* There were some changes to those operators

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-30 05:39:09
*Thread Reply:* If you're not using 3.2, please share more details 🙂

Ilqar Memmedov (ccmilgar@gmail.com)
2022-05-30 07:58:58
*Thread Reply:* Yeap, I'm using Spark version 3.2.1

Ilqar Memmedov (ccmilgar@gmail.com)
2022-05-30 07:59:35
*Thread Reply:* Is it an open issue, or do I have some option to force them to be sent? :)

Ilqar Memmedov (ccmilgar@gmail.com)
2022-05-30 07:59:58
*Thread Reply:* btw thank you for the quick response @Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-05-30 08:00:34
*Thread Reply:* Yes, we have an issue for AlterTable at least

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2022-06-01 02:52:14
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/616 -> that's the issue for altering tables in Spark 3.2.
@Ilqar Memmedov Did you mean drop table or drop columns? I am not aware of any drop table issue.

Ilqar Memmedov (ccmilgar@gmail.com)
2022-06-01 06:03:38
*Thread Reply:* @Paweł Leszczyński drop table statement.

Ilqar Memmedov (ccmilgar@gmail.com)
2022-06-01 06:05:58
*Thread Reply:* To reproduce it, I just created a simple Spark job:
create a table as select from another,
select data from the table, and then drop the entire table.

Lineage data was posted only for the "create table as select" part

xiang chen (cdmikechen@hotmail.com)
2022-06-01 05:16:01
Hi all, I have a question about lineage. I am now running airflow 2.3.1 and have started the latest Marquez service by docker-compose. I found that using the example DAGs of airflow I can only see the job information, but not the lineage of the job. How can I configure it to see the lineage?

Ross Turk (ross@datakin.com)
2022-06-03 14:20:16
*Thread Reply:* hi xiang 👋 lineage in airflow depends on the operator. some operators have extractors as part of the integration, but when they are missing you only see job information in Marquez.

xiang chen (cdmikechen@hotmail.com)
2022-06-01 05:23:54
Another problem is that if I declare a skipped task (e.g. DummyOperator) in the DAG, it will never appear in the job list. I think this is a problem, because even if it cannot run, it should still be visible as a metadata object.

Michael Robinson (michael.robinson@astronomer.io)
2022-06-01 10:19:33
@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, June 9 at 10 am PT. Join us on Zoom: https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09
All are welcome!
Agenda:
  1. a recent blog post about Snowflake
  2. the Great Expectations integration
  3. the dbt integration
  4. open discussion
Notes: https://tinyurl.com/openlineagetsc
Is there a topic you think the community should discuss at this or a future meeting? DM me to add items to the agenda.

👀 Howard Yoo, Francis McGregor-Macdonald
Michael Robinson (michael.robinson@astronomer.io)
2022-06-04 09:45:41
@channel OpenLineage 0.9.0 is now available, featuring column-level lineage in the Spark integration, bug fixes and more! For the details, see: https://github.com/OpenLineage/OpenLineage/releases/tag/0.9.0 and https://github.com/OpenLineage/OpenLineage/compare/0.8.2...0.9.0. Thanks to all the contributors who made this release possible, including @Paweł Leszczyński for authoring the column-level lineage PRs and new contributor @JDarDagran!

👍 Howard Yoo, Jarek Potiuk, Maciej Obuchowski, Ross Turk, Minkyu Park, pankaj koti, Jorik, Li Ding, Faouzi, Mardaunt
🎉 pankaj koti, Faouzi, Howard Yoo, Sheeri Cabral (Collibra), Mardaunt
❤️ Faouzi, Howard Yoo, Mardaunt

Tyler Farris (tyler@kickstand.work)
2022-06-06 16:14:52
Hey, all. Working on a PR to OpenLineage. I'm curious about file naming conventions for facets. I'm noticing that there are two conventions being used:

• In OpenLineage.spec.facets; ex. ExampleFacet.json
• In OpenLineage.integration.common.openlineage.common.schema; ex. example-facet.json.
Thanks

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-06-08 08:02:58
*Thread Reply:* I think internal naming is more important 🙂

I guess, for now, try to match what the local directory has.

Tyler Farris (tyler@kickstand.work)
2022-06-08 10:59:39
*Thread Reply:* Thanks @Maciej Obuchowski

raghanag (raghanag@gmail.com)
2022-06-07 03:24:03
Hi Team, we are seeing the dataset name set to the custom query when we run a Spark job that queries Oracle DB over JDBC with a custom query, and the custom query contains newlines, which causes the NodeId ID_PATTERN match to fail. How do we give a custom dataset name when we use custom queries?

Marquez API regex ref: https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/service/models/NodeId.java#L44

```
ERROR [2022-06-07 06:11:49,592] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 3648e87216d7815b
! java.lang.IllegalArgumentException: node ID (dataset:oracle:thin:_//<host-name>:1521:(
! SELECT
!     RULE.RULE_ID,
!     ASSG.ASSIGNED_OBJECT_ID, ASSG.ORG_ID, ASSG.SPLIT_PCT,
!     PRTCP.PARTICIPANT_NAME, PRTCP.START_DATE, PRTCP.END_DATE
! FROM RULE RULE,
!     ASSG ASSG,
!     PRTCP PRTCP
! WHERE
!     RULE.RULE_ID = ASSG.RULE_ID(+)
!     --AND RULE.RULE_ID = 300100207891651
!     AND PRTCP.PARTICIPANT_ID = ASSG.ASSIGNED_OBJECT_ID
!     -- and RULE.created_by = ' 1=1 '
!     and 1=1
! )) must start with 'dataset', 'job', or 'run'
```

George Zachariah V (manish.zack@gmail.com)
2022-06-08 07:48:16
Hi Team,
We have a spark job xyz that uses OpenLineageListener, which posts lineage events to the Marquez server. But we are seeing some unknown jobs in the Marquez UI:
• xyz.collect_limit
• xyz.execute_insert_into_hadoop_fs_relation_command
What are these jobs (collect_limit, execute_insert_into_hadoop_fs_relation_command)?
How do we get the lineage listener to post only our job (xyz)?

👍 Pradeep S

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-06-08 11:00:41
*Thread Reply:* Those jobs are actually what Spark does underneath 🙂

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-06-08 11:00:57
*Thread Reply:* Are you using Delta Lake btw?

Moiz (moiz.groups@gmail.com)
2022-06-08 12:02:39
*Thread Reply:* No, this is not Delta Lake. It is a normal Spark app.

raghanag (raghanag@gmail.com)
2022-06-08 13:58:05
*Thread Reply:* @Maciej Obuchowski I think David posted about this before: https://openlineage.slack.com/archives/C01CK9T7HKR/p1636011698055200

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-06-08 14:27:46
*Thread Reply:* I agree that it looks bad in the UI, but I also think the integration is doing a good job here. The eventual "aggregation" should be done by the event consumer.

If anything, we should filter some 'useless' nodes like collect_limit, since they add nothing.

We have an issue for doing this specifically for Delta Lake operations, as they are the biggest offenders: https://github.com/OpenLineage/OpenLineage/issues/628

👍 George Zachariah V

raghanag (raghanag@gmail.com)
2022-06-08 14:33:09
*Thread Reply:* @Maciej Obuchowski but we only see these 2 jobs in the namespace, no other jobs were part of the lineage metadata - are we doing something wrong?

raghanag (raghanag@gmail.com)
2022-06-08 16:09:15
*Thread Reply:* @Michael Robinson On this note, may we know how to form a lineage if we have a different set of APIs before calling the Spark job (already integrated with OpenLineageSparkListener)? We want to see how the different sets of params pass through these components before landing in the Spark job. If we use the openlineage client to post the lineage events into Marquez, do we need to use the same run UUID across the lineage events for the run, or is there another way to do this? Can you please advise?

Ross Turk (ross@datakin.com)
2022-06-08 22:51:38
*Thread Reply:* I think I understand what you are asking -

The runId is used to correlate different state updates (i.e., start, fail, complete, abort) across the lifespan of a run. So if you are trying to add additional metadata to the same job run, you'd use the same runId.

So you'd generate a runId and send a START event, then in the various components you could send OTHER events containing the same runId + the params you want to study in facets, then at the end you would send a COMPLETE.

(I think there should be an UPDATE event type in the spec for this sort of thing.)

👍 George Zachariah V, raghanag

raghanag (raghanag@gmail.com)
2022-06-08 22:59:39
*Thread Reply:* thanks @Ross Turk, but what I am looking for is, let's say for example, if we have 4 components in the system, then we want to show the 4 components as job icons in the graph, and the datasets between them would show the input/output parameters that these components use.
A(job) --> DS1(dataset) --> B(job) --> DS2(dataset) --> C(job) --> DS3(dataset) --> D(job)

Ross Turk (ross@datakin.com)
2022-06-08 23:04:37
*Thread Reply:* then you would need to have separate Jobs for each, with inputs and outputs defined

Ross Turk (ross@datakin.com)
2022-06-08 23:06:03
*Thread Reply:* so there would be a Run of job B that shows DS1 as an input and DS2 as an output

raghanag (raghanag@gmail.com)
2022-06-08 23:06:18
*Thread Reply:* got it

Ross Turk (ross@datakin.com)
2022-06-08 23:06:34
*Thread Reply:* (fyi: I know openlineage but my understanding stops at spark 😄)

👍 raghanag

Sharanya Santhanam (santhanamsharanya@gmail.com)
2022-06-10 12:27:58
*Thread Reply:* > The eventual "aggregation" should be done by the event consumer.
@Maciej Obuchowski Are there any known client-side libraries that support this aggregation already? In the case of Spark applications running as part of ETL pipelines, most of the time our end user is interested in seeing only the aggregated view, where all jobs spawned as part of a single application are rolled up into 1 job.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-06-10 12:32:14
*Thread Reply:* I believe Microsoft @Will Johnson has something similar to that, but it's probably proprietary.

We'd love to have something like it, but AFAIK it affects only some percentage of Spark jobs and we can only do so much.

With the exception of Delta Lake/Databricks, where it affects every job, and we know some nodes that could be safely filtered client side.

Will Johnson (will@willj.co)
2022-06-11 23:38:27
*Thread Reply:* @Maciej Obuchowski Microsoft ❤️ OSS!

Apache Atlas doesn't have the same model as Marquez. It only knows of effectively one entity that represents the complete asset.

@Mark Taylor designed this solution, available now on GitHub, to consolidate OpenLineage messages:

https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/blob/d6514f2[…]/Function.Domain/Helpers/OlProcessing/OlMessageConsolodation.cs

In addition, we do some filtering based only on inputs and outputs to limit the messages AFTER they have been emitted.

🙌 Maciej Obuchowski

Sharanya Santhanam (santhanamsharanya@gmail.com)
2022-06-19 09:37:06
*Thread Reply:* thank you!

Michael Robinson (michael.robinson@astronomer.io)
2022-06-08 10:54:32
@channel The next OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1654093173961669

👍 Maciej Obuchowski, Sheeri Cabral (Collibra), Willy Lulciuc, raghanag, Mardaunt

Jakub Moravec (jkb.moravec@gmail.com)
2022-06-09 13:04:00
*Thread Reply:* Hi, is the link correct? The meeting room is empty

Michael Robinson (michael.robinson@astronomer.io)
2022-06-09 16:04:23
*Thread Reply:* sorry about that, thanks for letting us know

Mark Beebe (mark_j_beebe@progressive.com)
2022-06-13 15:13:59
Hello all, after sending dbt openlineage events to Marquez, I am now looking to use the Marquez API to extract the lineage information. I am able to use python requests to call the Marquez API to get other information such as namespaces, datasets, etc., but I am a little bit confused about what I need to enter to get the lineage. I included screenshots of what the API reference shows regarding retrieving the lineage, where it shows that a nodeId is required. However, this is where I seem to be having problems. It is not exactly clear where the nodeId needs to be set or what the nodeId needs to include. I would really appreciate any insights. Thank you!

Ross Turk (ross@datakin.com)
2022-06-13 18:49:37
*Thread Reply:* Hey @Mark Beebe!

In this case, nodeId is going to be either a dataset or a job. You need to tell Marquez where to start, since there is likely to be more than one graph. So you need to get your hands on an identifier for that starting node.
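For example, with the Marquez HTTP API the nodeId takes the form dataset:&lt;namespace&gt;:&lt;name&gt; or job:&lt;namespace&gt;:&lt;name&gt;. A quick sketch with requests (the URL and names are placeholders):

```python
import requests

MARQUEZ = "http://localhost:5000"

# List namespaces, then datasets, to find a starting node.
namespaces = requests.get(f"{MARQUEZ}/api/v1/namespaces").json()
datasets = requests.get(f"{MARQUEZ}/api/v1/namespaces/food_delivery/datasets").json()

# nodeId is "dataset:<namespace>:<name>" (or "job:<namespace>:<name>").
node_id = "dataset:food_delivery:public.categories"
lineage = requests.get(f"{MARQUEZ}/api/v1/lineage", params={"nodeId": node_id}).json()
print(lineage["graph"])
```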

Ross Turk (ross@datakin.com)
2022-06-13 18:50:07
*Thread Reply:* You can do this in a few ways (that I can think of). First, by looking for a namespace, then querying for the datasets in that namespace:

Ross Turk (ross@datakin.com)
2022-06-13 18:53:43
*Thread Reply:* Or you can search, if you know the name of the dataset:

Ross Turk (ross@datakin.com)
2022-06-13 18:53:54
*Thread Reply:* aaaaannnnd that's actually all the ways I can think of.

Mark Beebe (mark_j_beebe@progressive.com)
2022-06-14 08:11:30
*Thread Reply:* That worked, thank you so much!

👍 Ross Turk

Varun Singh (varuntestaz@outlook.com)
2022-06-14 05:52:39
Hi all, I need to send the lineage information from the Spark integration directly to a Kafka topic. The Java client seems to have a KafkaTransport - is it planned to have this support from inside the Spark integration as well?

👀 Francis McGregor-Macdonald

Michael Robinson (michael.robinson@astronomer.io)
2022-06-14 10:35:48
Hi all, I'm working on a blog post about the Spark integration and would like to credit @tnazarew and @Sbargaoui for their contributions. Anyone know these contributors' names? Are you on here? Thanks for any leads.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-06-14 10:37:01
*Thread Reply:* tnazarew - Tomasz Nazarewicz

Michael Robinson (michael.robinson@astronomer.io)
2022-06-14 10:37:14
*Thread Reply:* 🙌

Ross Turk (ross@datakin.com)
2022-06-15 12:46:45
*Thread Reply:* 👍

Conor Beverland (conorbev@gmail.com)
2022-06-14 13:58:07
Has anyone tried getting the OpenLineage Spark integration working with GCP Dataproc?

Peter Hanssens (peter@cloudshuttle.com.au)
2022-06-15 15:49:17
Hi Folks,
DataEngBytes is a community data engineering conference here in Australia and will be hosted on the 27th and 29th of September. Our CFP is open for just under a month and tickets are on sale now:
Call for papers: https://sessionize.com/dataengbytes-2022/
Tickets: https://www.tickettailor.com/events/dataengbytes/713307
Promo video: https://youtu.be/1HE_XNLvHss

👀 Ross Turk, Michael Collado

Michael Robinson (michael.robinson@astronomer.io)
2022-06-17 16:23:32
A release of OpenLineage has been requested pending the merging of #856. Three +1s will authorize a release today.
@Willy Lulciuc @Michael Collado @Ross Turk @Maciej Obuchowski @Paweł Leszczyński @Mandy Chessell @Daniel Henneberger @Drew Banin @Julien Le Dem @Ryan Blue @Will Johnson @Zhamak Dehghani

➕ Willy Lulciuc, Maciej Obuchowski, Michael Collado
✅ Michael Collado

Chase Christensen (christensenc3526@gmail.com)
2022-06-22 17:09:18
👋 Hi everyone!

👋 Conor Beverland, Ross Turk, Maciej Obuchowski, Michael Robinson, George Zachariah V, Willy Lulciuc, Dinakar Sundar

Lee (chenzuoli709@gmail.com)
2022-06-23 21:54:05
hi

👋 Maciej Obuchowski, Sheeri Cabral (Collibra), Willy Lulciuc, Michael Robinson, Dinakar Sundar

Michael Robinson (michael.robinson@astronomer.io)
2022-06-25 07:34:32
@channel OpenLineage 0.10.0 is now available! We added SnowflakeOperatorAsync extractor support to the Airflow integration, an InMemoryRelationInputDatasetBuilder for InMemory datasets to the Spark integration, a static code analysis tool to run in CircleCI on Python modules, a copyright to all source files, and a debugger called PMD to the build process.
Changes we made include skipping FunctionRegistry.class serialization in the Spark integration, installing the new rust-based SQL parser by default in the Airflow integration, improving the integration tests for the Airflow integration, reducing event payload size by excluding local data and including an output node in start events, and splitting the Spark integration into submodules.
Thanks to all the contributors who made this release possible!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.10.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.9.0...0.10.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Maciej Obuchowski, Filipe Comparini Vieira, Manuel, Dinakar Sundar, Ross Turk, Paweł Leszczyński, Willy Lulciuc, Adisesha Reddy G, Conor Beverland, Francis McGregor-Macdonald, Jam Car

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:29:29
Why has put dataset been deprecated? How do I add an initial dataset via the API?

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:39:16
*Thread Reply:* I think you're referencing the deprecation of the DatasetAPI in Marquez? A milestone for Marquez is to only collect metadata via OpenLineage events. This includes metadata for datasets, jobs, and runs. The DatasetAPI won't be removed until support for collecting dataset metadata via OpenLineage has been added, see https://github.com/OpenLineage/OpenLineage/issues/323

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:40:28
*Thread Reply:* Once the spec supports dataset metadata, we'll outline steps in the Marquez project to switch to using the new dataset event type

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:43:20
*Thread Reply:* The DatasetAPI was also deprecated to avoid confusion around which API to use

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:41:38
🥺

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:42:21
So how would you propose I create the initial node if I am trying to do a POC?

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:44:49
*Thread Reply:* Do you want to register just datasets? Or are you extracting metadata for a job that would include input / output datasets? (outside of Airflow of course)

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:45:09
*Thread Reply:* Sorry, didn't notice you over here! lol

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:45:53
*Thread Reply:* So ideally I would like to map out our current data flow from on-prem to AWS

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:47:39
*Thread Reply:* What do you mean by mapping to AWS? Like send OL events to a service on AWS that would process the lineage metadata?

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:48:14
*Thread Reply:* no, just visualize the current migration flow.

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:48:53
*Thread Reply:* Ah I see, you're doing an infra migration from on-prem to AWS 👌

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:49:08
*Thread Reply:* really AWS is irrelevant. Source sink -> migration scripts -> s3 -> additional processing -> final sink

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:49:19
*Thread Reply:* correct

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:49:45
*Thread Reply:* right right. so you want to map out that flow and visualize it in Marquez? (or some other meta service)

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:50:05
*Thread Reply:* yes

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:50:26
*Thread Reply:* which I think I can do once the first nodes exist

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:51:18
*Thread Reply:* But I don't know how to get that initial node. I tried using the input facet at job start - that didn't do it. I also can't get the sql context that is in these examples.

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:51:54
*Thread Reply:* really just want to re-create food_delivery using my own biz context

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:52:14
*Thread Reply:* Have you looked over our workshops and this example? (assuming you're using python?)

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:53:49
*Thread Reply:* that goes over the py client with some OL examples, but really calling openlineage.emit(...) with RunEvents and specifying Marquez as the backend will get you up and running!

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:54:32
*Thread Reply:* Don't forget to configure the transport for the client
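For a local Marquez, that can be as simple as the following sketch (the URL is a placeholder; integrations also pick up the OPENLINEAGE_URL environment variable):

```python
from openlineage.client import OpenLineageClient

# Point the client's HTTP transport at the Marquez API.
# Equivalently, set OPENLINEAGE_URL=http://localhost:5000 in the environment.
client = OpenLineageClient(url="http://localhost:5000")
```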

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:54:45
*Thread Reply:* sweet. Thank you! I'll take a look. Also.. just came across Datakin for the first time. very nice 🙂

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:55:25
*Thread Reply:* thanks! …. but we're now part of astronomer.io 😉

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:55:48
*Thread Reply:* making airflow oh-so-easy-to-use one DAG at a time

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:55:52
*Thread Reply:* saw that too!

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:56:03
*Thread Reply:* you're on top of it!

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:56:28
*Thread Reply:* ha. Thanks again!

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:42:40
This would be outside of Airflow

Fenil Doshi (fdoshi@salesforce.com)
2022-06-28 18:43:22
Hello,
Is OpenLineage planning to add support for inlets and outlets for the Airflow integration? I am working on a project that relies on it and was hoping to contribute to this feature if it's something that is in the talks.
I saw an open issue here.

I am willing to work on it. My plan was to just support File and Table entities (for inlets and outlets): pass the inlets and outlets info into the extract_metadata function and then convert the Airflow entities into TaskMetadata entities.

Does this sound reasonable?

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:59:38
*Thread Reply:* Honestly, I've been a huge fan of using / falling back on inlets and outlets since day 1. AND if you're willing to contribute this support, you get a +1 from me (I'll add some minor comments to the issue) /cc @Julien Le Dem

🙌 Fenil Doshi

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:59:59
*Thread Reply:* would be great to get @Maciej Obuchowski's thoughts on this as well

👍 Fenil Doshi

Fenil Doshi (fdoshi@salesforce.com)
2022-07-08 12:40:39
*Thread Reply:* I have created a draft PR for this here.
Please let me know if the changes make sense.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-08 12:42:30
*Thread Reply:* I think this effort: https://github.com/OpenLineage/OpenLineage/pull/904 ultimately makes more sense, since it will allow getting lineage on Airflow 2.3+ too

✅ Fenil Doshi
👀 Fenil Doshi

Fenil Doshi (fdoshi@salesforce.com)
2022-07-08 18:12:47
*Thread Reply:* I have made the changes in line with the mentioned comments here.
Does this look good?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-12 09:35:22
*Thread Reply:* I think it looks good! Would be great to have tests for this feature though.

👍 Fenil Doshi, Julien Le Dem

Fenil Doshi (fdoshi@salesforce.com)
2022-07-15 21:56:50
*Thread Reply:* I have added the tests! Would really appreciate it if someone can take a look and let me know if anything else needs to be done.
Thank you for the support! 😄

👀 Willy Lulciuc, Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-18 06:48:03
*Thread Reply:* One change and I think it will be good for now.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-18 06:48:07
*Thread Reply:* Have you tested it manually?

Fenil Doshi (fdoshi@salesforce.com)
2022-07-20 13:22:04
*Thread Reply:* Thanks a lot for the review! Appreciate it 🙌
Yes, I tested it manually (for Airflow versions 2.1.4 and 2.3.3) and it works 🎉

Conor Beverland (conorbev@gmail.com)
2022-07-20 13:24:55
*Thread Reply:* I think this is such a useful feature to have, thank you! Would you mind adding a little example to the PR of how to use it? Like a little example DAG or something? (either in a comment or edit the PR description)

👍 Fenil Doshi

Fenil Doshi (fdoshi@salesforce.com)
2022-07-20 15:20:32
*Thread Reply:* Yes, sure! I will add it in the PR description

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-21 05:30:56
*Thread Reply:* I think it would be easy to convert to an integration test then, if you provided an example dag

👍 Fenil Doshi

Conor Beverland (conorbev@gmail.com)
2022-07-27 12:20:43
*Thread Reply:* ping @Fenil Doshi - if possible I would really love to see the example DAG on there 🙂 🙏

Fenil Doshi (fdoshi@salesforce.com)
2022-07-27 12:26:22
*Thread Reply:* Yes, I was going to, but the PR got merged so I did not update the description. Should I just update the description of the merged PR? Or should I add it somewhere in the docs?

Conor Beverland (conorbev@gmail.com)
2022-07-27 12:42:29
*Thread Reply:* ^ @Ross Turk is it easy for @Fenil Doshi to contribute a doc for manual inlet definition on the new doc site?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-27 12:48:32
*Thread Reply:* It is easy 🙂 it's just markdown: https://github.com/openlineage/docs/

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-27 12:49:23
*Thread Reply:* @Fenil Doshi feel free to create a new page here and don't sweat where to put it - we're still figuring out the structure and will move it then

👍 Ross Turk, Fenil Doshi

Ross Turk (ross@datakin.com)
2022-07-27 13:12:31
*Thread Reply:* exactly, yes - don't be worried about the doc quality right now, the doc site is still in a pre-release state. so whatever you write will likely be edited or moved before it becomes official 👍

👍 Fenil Doshi

Fenil Doshi (fdoshi@salesforce.com)
2022-07-27 20:37:34
*Thread Reply:* I added documentation here - https://github.com/OpenLineage/docs/pull/16

Also, I have added an example for it. 🙂
Let me know if something is unclear and needs to be updated.

✅ Conor Beverland

Conor Beverland (conorbev@gmail.com)
2022-07-28 12:50:54
*Thread Reply:* Thanks! very cool.

Conor Beverland (conorbev@gmail.com)
2022-07-28 12:52:22
*Thread Reply:* Does Airflow check the types of the inlets/outlets btw?

Like I wonder if a user could directly define an OpenLineage Dataset (which might even have various other facets included on it) and specify it in the inlets/outlets?

Ross Turk (ross@datakin.com)
2022-07-28 12:54:56
*Thread Reply:* Yeah, I was also curious about using the models from airflow.lineage.entities as opposed to openlineage.client.run.

Ross Turk (ross@datakin.com)
2022-07-28 12:55:42
*Thread Reply:* I am accustomed to creating OpenLineage entities like this:

taxes = Dataset(namespace="postgres://foobar", name="schema.table")

Ross Turk (ross@datakin.com)
2022-07-28 12:56:45
*Thread Reply:* I don't dislike the airflow.lineage.entities models especially, but if we only support one of them…

Conor Beverland (conorbev@gmail.com)
2022-07-28 12:58:18
*Thread Reply:* yeah, if Airflow allows that class within inlets/outlets it'd be nice to support both imo.

Like we would suggest users use openlineage.client.run.Dataset, but if a user already has DAGs that use Table then they'd still work in a best-effort way.
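For illustration, a sketch of what a task with manually declared lineage might look like once both forms are supported; this is an assumption based on this thread and the PRs it mentions, not official docs of the time, and all names are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.lineage.entities import Table  # Airflow's own lineage model
from airflow.operators.bash import BashOperator
from openlineage.client.run import Dataset  # native OpenLineage dataset

with DAG(
    dag_id="manual_lineage",
    start_date=datetime(2022, 7, 1),
    schedule_interval=None,
) as dag:
    transform = BashOperator(
        task_id="transform",
        bash_command="echo transforming",
        # Airflow Table entities get converted to OL datasets best-effort...
        inlets=[Table(database="food_delivery", cluster="public", name="menus")],
        # ...while OL Datasets can be passed through directly.
        outlets=[Dataset(namespace="postgres://foobar", name="schema.table")],
    )
```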

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-07-28 13:03:07
*Thread Reply:* either Airflow depends on OpenLineage, or we can probably change those entities as part of the AIP-48 overhaul to more openlineage-like ones

Ross Turk (ross@datakin.com)
2022-07-28 17:18:35
*Thread Reply:* hm, not sure I understand the dependency issue. isn't this extractor living in openlineage-airflow?

Conor Beverland (conorbev@gmail.com)
2022-08-15 09:49:02
*Thread Reply:* I gave manual lineage a try with native OL Datasets specified in the Airflow inlets/outlets and it seems to work! Had to make some small tweaks which I have attempted here: https://github.com/OpenLineage/OpenLineage/pull/1015

(I left the support for converting the Airflow Table to Dataset because I think that's nice to have also)

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:44:24
food_delivery example example.etl_categories node

Mike brenes (brenesmi@gmail.com)
2022-06-28 18:44:40
how do I recreate that using OpenLineage?

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:45:52
*Thread Reply:* Ahh great question! I actually just updated the seeding cmd for Marquez to do just this (but in java of course)

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:46:15
*Thread Reply:* Give me a sec to send you over the diff…

❤️ Mike brenes

Willy Lulciuc (willy@datakin.com)
2022-06-28 18:56:35
*Thread Reply:* … continued here: https://openlineage.slack.com/archives/C01CK9T7HKR/p1656456734272809?thread_ts=1656456141.097229&cid=C01CK9T7HKR

Conor Beverland (conorbev@gmail.com)
2022-06-28 20:05:33
I'm very new to dbt but wanted to give it a try with OL. I had a couple of questions when going through the dbt tutorial here: https://docs.getdbt.com/guides/getting-started/learning-more/getting-started-dbt-core

1. An earlier part of the tutorial has you build a model in a single sql file: https://docs.getdbt.com/guides/getting-started/learning-more/getting-started-dbt-core#build-your-first-model When I did this and ran dbt-ol I got a lineage graph like this:

👀 Maciej Obuchowski

Conor Beverland (conorbev@gmail.com)
2022-06-28 20:07:11
then a later part of the tutorial has you split that same example into multiple models, and when I run it again I get a graph like:

Conor Beverland (conorbev@gmail.com)
2022-06-28 20:08:54
^ I'm just kind of curious if it's working as expected? And/or could it be possible to parse the dbt .sql so that the lineage in the first case would still show those staging tables?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-06-29 10:04:14
*Thread Reply:* I think you should declare those as sources? Or do you need something different?

Conor Beverland (conorbev@gmail.com)
2022-06-29 21:15:33
*Thread Reply:* I'll try to experiment with this.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-06-28 20:09:19
-
-
  1. I see that DBT has a concept of adding tests to your models. Could those add data quality facets in OL ?
  2. -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-06-29 10:02:17
-
-

*Thread Reply:* this should already be working if you run dbt-ol test or dbt-ol build

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Conor Beverland - (conorbev@gmail.com) -
-
2022-06-29 21:15:25
-
-

*Thread Reply:* oh, nice!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
shweta p - (shweta.pbs@gmail.com)
2022-07-04 02:48:35
Hi everyone, I am trying openlineage-dbt. It works perfectly locally when I publish the events to Marquez, but when I run the same commands from MWAA I don't see those events triggered, and I'm not able to view any logs to tell if there was an error. How do I debug the issue?

Julien Le Dem - (julien@apache.org)
2022-07-06 14:26:59
*Thread Reply:* Maybe @Maciej Obuchowski knows? You need to check that it's using the dbt-ol command and that the configuration is available (environment variables or conf file).

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-06 15:31:20
*Thread Reply:* Maybe some AWS networking stuff? I'm not really sure how MWAA works internally (or, at all - never used it)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-06 15:35:06
*Thread Reply:* anyway, any logs/errors should be in the same space where your task logs are

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-06 05:32:28
Agenda items are requested for the next OpenLineage Technical Steering Committee meeting on July 14. Reply in thread or ping me with your item(s)!

Will Johnson - (will@willj.co)
2022-07-06 10:21:50
*Thread Reply:* What is the status on the Flink / Streaming decisions being made for OpenLineage / Marquez?

A few months ago, Flink was being introduced and it was said that more thought was needed around supporting streaming services in OpenLineage.

It would be very helpful to know where the community stands on how streaming data sources should work in OpenLineage.
👍 Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-06 11:08:01
*Thread Reply:* @Will Johnson added your item
👍 Will Johnson

Will Johnson - (will@willj.co)
2022-07-06 10:19:44
Request for Creating a New OpenLineage Release

Hello #general, as per the Governance guide (https://github.com/OpenLineage/OpenLineage/blob/main/GOVERNANCE.md#openlineage-project-releases), I am asking that we generate a new release based on the latest commit by @Maciej Obuchowski (c92a93cdf3df636a02984188563d019474904b2b) which fixes a critical issue running OpenLineage on Azure Databricks.

Having this release made available to the general public on Maven would allow us to enable the hundred+ users of the solution to run OpenLineage on the latest LTS versions of Databricks. In addition, it would enable the Microsoft team to integrate the amazing column level lineage feature contributed by @Paweł Leszczyński with our solution for Microsoft Purview.
👍 Maciej Obuchowski, Jakub Dardziński, Ross Turk, Willy Lulciuc, Will Johnson, Julien Le Dem

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-07 10:33:41
@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, July 14 at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom
All are welcome!
Agenda:
1. Announcements/recent talks
2. Release 0.10.0 overview
3. Flink integration retrospective
4. Discuss: streaming services in Flink integration
5. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.

David Cecchi - (david_cecchi@cargill.com)
2022-07-11 10:30:34
*Thread Reply:* would appreciate a TSC discussion on OL philosophy for Streaming in general and where/if it fits in the vision and strategy for OL. fully appreciate current maturity, moreso just validating how OL is being positioned from a vision perspective. as we consider aligning enterprise lineage solution around OL want to make sure we're not making bad assumptions. neat discussion might be "imagine that Confluent decided to make Stream Lineage OL compliant/capable - are we cool with that and what are the implications?".
👍 Michael Robinson

Ross Turk - (ross@datakin.com)
2022-07-12 12:36:17
*Thread Reply:* @Michael Robinson could I also have a quick 5m to talk about plans for a documentation site?
👍 Michael Robinson, Sheeri Cabral (Collibra)

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-12 12:46:29
*Thread Reply:* @David Cecchi @Ross Turk Added your items to the agenda. Thanks and looking forward to the discussion!

David Cecchi - (david_cecchi@cargill.com)
2022-07-12 15:08:48
*Thread Reply:* this is great - will keep an eye out for recording. if it got tabled due to lack of attendance will pick it up next TSC.

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-12 16:12:43
*Thread Reply:* I think OpenLineage should have some representation at https://impactdatasummit.com/2022

I’m happy to help craft the abstract, look over slides, etc. (I could help present, but all I’ve done with OpenLineage is one tutorial, so I’m hardly an expert.)

CfP closes 31 Aug so there’s plenty of time, but if you want a 2nd set of eyes on things, we can’t just wait until the last minute to submit 😄

Will Johnson - (will@willj.co)
2022-07-07 12:04:09
How to create custom facets without recompiling OpenLineage?

I have a customer who is interested in using OpenLineage but wants to extend the facets WITHOUT recompiling OL / maintaining a clone of OL with their changes.

Do we have any examples of how someone might create their own jar but using the OpenLineage CustomFacetBuilder and then have that jar's classes be injected into OpenLineage?

Will Johnson - (will@willj.co)
2022-07-07 12:04:55
*Thread Reply:* @Michael Collado would you have any thoughts on how to extend the Facets without having to alter OpenLineage itself?

Michael Collado - (collado.mike@gmail.com)
2022-07-07 15:16:45
*Thread Reply:* This is described here. Notably:
> Custom implementations are registered by following Java's ServiceLoader conventions. A file called io.openlineage.spark.api.OpenLineageEventHandlerFactory must exist in the application or jar's META-INF/service directory. Each line of that file must be the fully qualified class name of a concrete implementation of OpenLineageEventHandlerFactory. More than one implementation can be present in a single file. This might be useful to separate extensions that are targeted toward different environments - e.g., one factory may contain Azure-specific extensions, while another factory may contain GCP extensions.

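To make the quoted convention concrete, here is a minimal sketch of such an extension jar. The factory interface and the META-INF/services file name come from the docs Michael quotes; the package, class name, and jar layout are hypothetical, and the exact factory methods to override depend on the openlineage-spark version you compile against:

```java
// Hypothetical extension, built as its own jar - no OpenLineage fork or recompile.
// The jar must also contain a one-line text file at
//   META-INF/services/io.openlineage.spark.api.OpenLineageEventHandlerFactory
// whose content is the fully qualified name of this class.
package com.example.lineage;

import io.openlineage.spark.api.OpenLineageEventHandlerFactory;

public class MyEventHandlerFactory implements OpenLineageEventHandlerFactory {
    // Override the default builder-producing methods here to contribute
    // CustomFacetBuilder instances for your own facets. Declare openlineage-spark
    // as compileOnly (Gradle) / provided (Maven) so it is not bundled twice.
}
```
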
Michael Collado - (collado.mike@gmail.com)
2022-07-07 15:17:55
*Thread Reply:* This example is present in the test package - https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]ervices/io.openlineage.spark.api.OpenLineageEventHandlerFactory

Will Johnson - (will@willj.co)
2022-07-07 20:19:01
*Thread Reply:* @Michael Collado you are amazing! Thank you so much for pointing me to the docs and example!

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-07 19:27:47
@channel @Will Johnson
OpenLineage 0.11.0 is now available!
We added:
• an HTTP option to override timeout and properly close connections in openlineage-java lib,
• dynamic mapped tasks support to the Airflow integration,
• a SqlExtractor to the Airflow integration,
• PMD to Java and Spark builds in CI.
We changed:
• when testing extractors in the Airflow integration, the extractor list length assertion is now dynamic,
• templates are rendered at the start of integration tests for the TaskListener in the Airflow integration.
Thanks to all the contributors who made this release possible!
For the bug fixes and more details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.11.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.10.0...0.11.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍 Chandru TMBA, John Thomas, Maciej Obuchowski, Fenil Doshi
👏 John Thomas, Willy Lulciuc, Ricardo Gaspar
🙌 Will Johnson, Maciej Obuchowski, Sergio Sicre

Varun Singh - (varuntestaz@outlook.com)
2022-07-11 07:06:36
Hi all, I am using openlineage-spark in my project where I lock the dependency versions in gradle.lockfile. After release 0.10.0, this is not working. Is this a known limitation of switching to splitting the integration into submodules?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 06:18:29
*Thread Reply:* Can you expand on what's not working exactly?

This is not something we're aware of.

Varun Singh - (varuntestaz@outlook.com)
2022-07-19 04:09:39
*Thread Reply:* @Maciej Obuchowski Sure, I have my own library where I am creating a shadowJar. This includes the OpenLineage library into the new uber jar. This worked fine till 0.9.0, but now building the shadowJar gives this error:
```
Could not determine the dependencies of task ':shadowJar'.
> Could not resolve all dependencies for configuration ':runtimeClasspath'.
   > Could not find spark:app:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/app/0.10.0/app-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage_spark:0.10.0
   > Could not find spark:shared:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/shared/0.10.0/shared-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage_spark:0.10.0
   > Could not find spark:spark2:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/spark2/0.10.0/spark2-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage_spark:0.10.0
   > Could not find spark:spark3:0.10.0.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/spark/spark3/0.10.0/spark3-0.10.0.pom
     If the artifact you are trying to retrieve can be found in the repository but without metadata in 'Maven POM' format, you need to adjust the 'metadataSources { ... }' of the repository declaration.
     Required by:
         project : > io.openlineage:openlineage_spark:0.10.0
```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 05:00:02
*Thread Reply:* Can you try 0.11? I think we might have already fixed that.

Varun Singh - (varuntestaz@outlook.com)
2022-07-19 05:50:03
*Thread Reply:* Tried with that as well. Doesn't work

Varun Singh - (varuntestaz@outlook.com)
2022-07-19 05:56:50
*Thread Reply:* Same error with 0.11.0 as well

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 08:11:13
*Thread Reply:* I think I see - we removed internal dependencies from maven's pom.xml, but we also publish gradle metadata: https://repo1.maven.org/maven2/io/openlineage/openlineage-spark/0.11.0/openlineage-spark-0.11.0.module

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 08:11:34
*Thread Reply:* we should remove the dependencies or disable the gradle metadata altogether, it's not required

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-19 08:16:18
*Thread Reply:* @Varun Singh For now I think you can try ignoring gradle metadata: https://docs.gradle.org/current/userguide/declaring_repositories.html#sec:supported_metadata_sources

Hanbing Wang - (doris.wang200902@gmail.com)
2022-07-19 14:18:45
*Thread Reply:* @Varun Singh did you find out how to build shadowJar successfully with release 0.10.0? I can build shadowJar with 0.9.0, but not higher versions. If your problem is already resolved, could you share some suggestions? thanks ^^

Varun Singh - (varuntestaz@outlook.com)
2022-07-20 03:44:40
*Thread Reply:* @Hanbing Wang I followed @Maciej Obuchowski's instructions (Thank you!) and added this to my build.gradle file:
```
repositories {
    mavenCentral() {
        metadataSources {
            mavenPom()
            ignoreGradleMetadataRedirection()
        }
    }
}
```
I am able to build the jar now. I am not proficient in gradle so don't know if this is the right way to do this. Please correct me if I am wrong.

Varun Singh - (varuntestaz@outlook.com)
2022-07-20 05:26:04
*Thread Reply:* Also, I am not able to see the 3rd party dependencies in the dependency lock file, but they are present in some folder inside the jar (relocated in the subproject's build file). But this is a different problem, I guess.

Hanbing Wang - (doris.wang200902@gmail.com)
2022-07-20 18:45:50
*Thread Reply:* Thanks @Varun Singh for the very helpful info. I will also try updating build.gradle and rebuilding shadowJar again.

Will Johnson - (will@willj.co)
2022-07-13 01:10:01
Java Question: Why Can't I Find a Class on the Class Path? / How the heck does the ClassLoader know where to find a class?

Are there any Java pros who would be willing to share alternatives for checking whether a given class exists, or help explain what should change in the Kusto package to make it work like the behaviors seen in the Kafka and SQL DW relation visitors?

--- Details ---
@Hanna Moazam and I are trying to introduce two new Azure data sources into OpenLineage's Spark integration. The https://github.com/Azure/azure-kusto-spark package is nearly done, but we're getting tripped up on some Java concepts. In order to know if we should add the KustoRelationVisitor to the input dataset visitors, we need to see if the Kusto jar is installed on the Spark / Databricks cluster. In this case, com.microsoft.kusto.spark.datasource.DefaultSource is a public class, but it cannot be found using the KustoRelationVisitor.class.getClassLoader().loadClass("class name") methods as seen in:
• https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]nlineage/spark/agent/lifecycle/plan/SqlDWDatabricksVisitor.java
• https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]penlineage/spark/agent/lifecycle/plan/KafkaRelationVisitor.java
At first I thought it was the Azure packages, but then I tried the same approach with a simple Java library.

I instantiate a spark-shell like this:
```
spark-shell --master local[4] \
  --conf spark.driver.extraClassPath=/mnt/repos/SparkListener-Basic/lib/build/libs/custom-listener.jar \
  --conf spark.extraListeners=listener.MyListener \
  --jars /mnt/repos/wjtestlib/lib/build/libs/lib.jar
```
With lib.jar containing a class that looks like this:
```java
package wjtestlib;

public class WillLibrary {
    public boolean someLibraryMethod() {
        return true;
    }
}
```
And the custom listener is very simple:
```java
public class MyListener extends org.apache.spark.scheduler.SparkListener {

    private static final Logger log = LoggerFactory.getLogger("MyLogger");

    public MyListener() {
        log.info("INITIALIZING");
    }

    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        log.info("MYLISTENER: ON JOB START");
        try {
            log.info("Trying wjtestlib.WillLibrary");
            MyListener.class.getClassLoader().loadClass("wjtestlib.WillLibrary");
            log.info("Got wjtestlib.WillLibrary");
        } catch (ClassNotFoundException e) {
            log.info("Could not get wjtestlib.WillLibrary");
        }

        try {
            log.info("Trying wjtestlib.WillLibrary using Class.forName");
            Class.forName("wjtestlib.WillLibrary", false, this.getClass().getClassLoader());
            log.info("Got wjtestlib.WillLibrary using Class.forName");
        } catch (ClassNotFoundException e) {
            log.info("Could not get wjtestlib.WillLibrary using Class.forName");
        }
    }
}
```
And I still get a result indicating it cannot find the class:
```
2022-07-12 23:58:22,048 INFO MyLogger: MYLISTENER: ON JOB START
2022-07-12 23:58:22,048 INFO MyLogger: Trying wjtestlib.WillLibrary
2022-07-12 23:58:22,057 INFO MyLogger: Could not get wjtestlib.WillLibrary
2022-07-12 23:58:22,058 INFO MyLogger: Trying wjtestlib.WillLibrary using Class.forName
2022-07-12 23:58:22,065 INFO MyLogger: Could not get wjtestlib.WillLibrary using Class.forName
```
Thank you for any guidance!

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-13 08:50:15
*Thread Reply:* Could you unzip the created jar and verify that the classes you’re trying to use are present? Perhaps there’s some relocate in the shadowJar plugin which renames the classes. Making sure the classes are present in the jar is a good place to start.

Then you can try doing classForName just from the spark-shell without any listeners added. The classes should be available there.

Will Johnson - (will@willj.co)
2022-07-13 11:42:25
*Thread Reply:* Thank you for the reply Pawel! Hanna and I just wrapped up some testing.

It looks like Databricks AND open source Spark do some magic when you install a library OR use --jars on the spark-shell. In both Databricks and Apache Spark, the thread running the SparkListener cannot see the additional libraries installed unless they're on the original / main class path.

• Confirmed the uploaded jars are NOT shaded / renamed.
• The Databricks class path ($CLASSPATH) is focused on /databricks/jars
• The added libraries are in /local_disk0/tmp and are not found in $CLASSPATH.
• The SparkListener only recognizes $CLASSPATH.
• Using a classloader with an object like spark does not find our installed class: spark.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")
• When we use a classloader on a class we installed and imported, it DOES find the class: myImportedClass.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")
@Michael Collado and @Maciej Obuchowski have you seen any challenges with using --jars on the spark-shell and detecting if the class is installed?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-13 12:02:05
*Thread Reply:* We run tests using --packages for external stuff like Delta - which is the same as --jars, but getting them from Maven Central, not local disk - and it works, like in KafkaRelationVisitor.

What if you did it like that? By that I mean adding it to your code with compileOnly in Gradle or provided in Maven, compiling with it, then using a static method to check if it loads?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-13 12:02:36
*Thread Reply:* > When we use a classloader on a class we installed and imported, it DOES find the class: myImportedClass.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")
Isn't that this actual scenario?

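A minimal sketch of the guard pattern Maciej is describing, modeled on how KafkaRelationVisitor protects itself; the Kusto class name is the one under discussion, and the surrounding visitor class is the one Will and Hanna are developing:

```java
// Declare the Kusto connector as compileOnly (Gradle) / provided (Maven) so it
// is available at compile time but not bundled, then gate the visitor on
// whether the connector is actually present on the classpath at runtime.
public final class KustoRelationVisitor {

    public static boolean hasKustoClasses() {
        try {
            KustoRelationVisitor.class.getClassLoader()
                .loadClass("com.microsoft.kusto.spark.datasource.DefaultSource");
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return false;
        }
    }

    // ... the actual visitor logic, only registered when hasKustoClasses() is true
}
```
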
Will Johnson - (will@willj.co)
2022-07-13 12:36:47
*Thread Reply:* Thank you for the reply, Maciej!

I will try the compileOnly route tonight!

Re: myImportedClass.getClass().getClassLoader().getResource("com/microsoft/kusto/spark/datasource/KustoSourceOptions.class")

I failed to mention that this was only achieved in the interactive shell / Databricks notebook. It never worked inside the SparkListener UNLESS we installed the Kusto jar on the Databricks class path.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-14 06:43:47
*Thread Reply:* The difference between --jars and --packages is that for packages all transitive dependencies will be handled. But this does not seem to be the case here.

More doc can be found here: https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

When starting a SparkContext, all the jars available on the classpath should be listed and put into Spark logs. So that’s the place one can check if the jar is loaded or not.

If --conf spark.driver.extraClassPath is working, you can add multiple jar files there (they must be separated by commas).

Other examples of adding multiple jars to the Spark classpath can be found here -> https://sparkbyexamples.com/spark/add-multiple-jars-to-spark-submit-classpath/

Will Johnson - (will@willj.co)
2022-07-14 11:20:02
*Thread Reply:* @Paweł Leszczyński thank you for the reply! Hanna and I experimented with --jars vs extraClassPath.

When using --jars, the spark listener does NOT find the class using a classloader.

When using extraClassPath, the spark listener DOES find the class using a classloader.

When using --jars, we can see in the Spark logs that after Spark starts (and after the spark listener is already established?) there are Spark.AddJar commands being executed.

@Maciej Obuchowski we also experimented with doing a compileOnly on OpenLineage's spark listener; it did not change the behavior. OpenLineage still failed to identify that I had the kusto-spark-connector.

I'm going to reach out to Databricks to see if there is any guidance on letting the SparkListener be aware of classes added via their libraries / --jars method on the spark-shell.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:22:01
*Thread Reply:* So, this is only relevant to Databricks now? Because I don't understand what you do differently than we do with Kafka/Iceberg/Delta

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:22:48
*Thread Reply:* I'm not the spark/classpath expert though - maybe @Michael Collado has something to add?

Will Johnson - (will@willj.co)
2022-07-14 11:24:12
*Thread Reply:* @Maciej Obuchowski that's a super good question on Iceberg. How do you instantiate a spark job with Iceberg installed?

Will Johnson - (will@willj.co)
2022-07-14 11:26:04
*Thread Reply:* It is still relevant to Apache Spark because I can't get OpenLineage to find the installed package UNLESS I use extraClassPath.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:29:13
*Thread Reply:* Basically, by adding --packages org.apache.iceberg:iceberg_spark_runtime_3.1_2.12:0.13.0

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]a/io/openlineage/spark/agent/SparkContainerIntegrationTest.java

Will Johnson - (will@willj.co)
2022-07-14 11:29:51
*Thread Reply:* Trying with --packages right now.

Will Johnson - (will@willj.co)
2022-07-14 11:54:37
*Thread Reply:* Using --packages wouldn't let me find the Spark relation's default source:

Spark Shell command:
```
spark-shell --master local[4] \
  --conf spark.driver.extraClassPath=/customListener-1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=listener.MyListener \
  --jars /WillLibrary.jar \
  --packages com.microsoft.azure.kusto:kusto_spark_3.0_2.12:3.0.0
```
Code inside customListener:
```java
try {
    log.info("Trying Kusto DefaultSource");
    MyListener.class.getClassLoader().loadClass("com.microsoft.kusto.spark.datasource.DefaultSource");
    log.info("Got Kusto DefaultSource!!!!");
} catch (ClassNotFoundException e) {
    log.info("Could not get Kusto DefaultSource");
}
```
Logs indicating it still can't find the class when using --packages:
```
2022-07-14 10:47:35,997 INFO MyLogger: MYLISTENER: ON JOB START
2022-07-14 10:47:35,997 INFO MyLogger: Trying wjtestlib.WillLibrary
2022-07-14 10:47:36,052 INFO MyLogger: Trying LogicalRelation
2022-07-14 10:47:36,053 INFO MyLogger: Got logical relation
2022-07-14 10:47:36,053 INFO MyLogger: Trying Kusto DefaultSource
2022-07-14 10:47:36,064 INFO MyLogger: Could not get Kusto DefaultSource
```
😢

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 11:59:07
*Thread Reply:* what if you load your listener using also packages?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-14 12:00:38
*Thread Reply:* That's how I'm doing it locally using spark.conf:
```
spark.jars.packages com.google.cloud.bigdataoss:gcs_connector:hadoop3-2.2.2,io.delta:delta_core_2.12:1.0.0,org.apache.iceberg:iceberg_spark3_runtime:0.12.1,io.openlineage:openlineage_spark:0.9.0
```
👀 Will Johnson

Will Johnson - (will@willj.co)
2022-07-14 12:20:47
*Thread Reply:* @Maciej Obuchowski - You beautiful bearded man! 🙏
```
2022-07-14 11:14:21,266 INFO MyLogger: Trying LogicalRelation
2022-07-14 11:14:21,266 INFO MyLogger: Got logical relation
2022-07-14 11:14:21,266 INFO MyLogger: Trying org.apache.iceberg.catalog.Catalog
2022-07-14 11:14:21,295 INFO MyLogger: Got org.apache.iceberg.catalog.Catalog!!!!
2022-07-14 11:14:21,295 INFO MyLogger: Trying Kusto DefaultSource
2022-07-14 11:14:21,361 INFO MyLogger: Got Kusto DefaultSource!!!!
```
I ended up setting my spark-shell like this (and used --jars for my custom spark listener since it's not on Maven):
```
spark-shell --master local[4] \
  --conf spark.extraListeners=listener.MyListener \
  --packages org.apache.iceberg:iceberg_spark_runtime_3.1_2.12:0.13.0,com.microsoft.azure.kusto:kusto_spark_3.0_2.12:3.0.0 \
  --jars customListener-1.0-SNAPSHOT.jar
```
So, now I just need to figure out how Databricks differs from this approach 😢
😂 Maciej Obuchowski, Jakub Dardziński, Hanna Moazam

Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:21:35
*Thread Reply:* This is an annoying detail about Java ClassLoaders and the way Spark loads extra jars/packages.

Remember Java's ClassLoaders are hierarchical - there are parent ClassLoaders and child ClassLoaders. Parents can't see their children's classes, but children can see their parent's classes.

When you use --spark.driver.extraClassPath, you're adding a jar to the main application ClassLoader. But when you use --jars or --packages, you're instructing the Spark application itself to load the extra jars into its own ClassLoader - a child of the main application ClassLoader that the Spark code creates and manages separately. Since your listener class is loaded by the main application ClassLoader, it can't see any classes that are loaded by the Spark child ClassLoader. Either both jars need to be on the driver classpath or both jars need to be loaded by the --jars or --packages configuration parameter.
🙌 Will Johnson, Paweł Leszczyński

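Michael's parent/child rule is easy to reproduce outside Spark. A standalone sketch (the jar path and the OnlyInExtraJar class name are made up for illustration):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Demonstrates the visibility rule: a child loader delegates lookups up to its
// parent, but the parent knows nothing about jars added only to the child.
public class ClassLoaderVisibility {
    public static void main(String[] args) throws Exception {
        ClassLoader parent = ClassLoaderVisibility.class.getClassLoader();
        // Stand-in for Spark's child loader that --jars / --packages feed into:
        try (URLClassLoader child = new URLClassLoader(
                new URL[] { new URL("file:///tmp/extra.jar") }, parent)) {
            // Resolves: the child delegates to the parent for JDK classes.
            child.loadClass("java.lang.String");
            try {
                // Fails unless the class is also on the parent's classpath -
                // this is exactly the listener's situation with --jars.
                parent.loadClass("com.example.OnlyInExtraJar");
            } catch (ClassNotFoundException expected) {
                System.out.println("parent cannot see the child's jar: " + expected);
            }
        }
    }
}
```
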
Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:26:15
*Thread Reply:* In Databricks, we were not able to simply use the --packages argument to load the listener, which is why we have that init script that copies the jar into the classpath that Databricks uses for application startup (the main ClassLoader). You need to copy your visitor jar into the same location so that both jars are loaded by the same ClassLoader and can see each other.

Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:29:09
*Thread Reply:* (as an aside, this is one of the major drawbacks of the java agent approach and one reason why all the documentation recommends using the spark.jars.packages configuration parameter for loading the OL library - it guarantees that any DataSource nodes loaded by the Spark ClassLoader can be seen by the OL library and we don't have to use reflection for everything)

Will Johnson - (will@willj.co)
2022-07-14 12:30:25
*Thread Reply:* @Michael Collado Thank you so much for the reply. The challenge is that Databricks has their own mechanism for installing libraries / packages.

https://docs.microsoft.com/en-us/azure/databricks/libraries/

These packages are installed on Databricks AFTER Spark is started, and the physical files are located in a folder that is different than the main classpath.

I'm going to reach out to Databricks and see if we can get any guidance on this 😢

Will Johnson - (will@willj.co)
2022-07-14 12:31:32
*Thread Reply:* Unfortunately, I can't ask users to install their packages on Databricks in a non-standard way (e.g. via an init script) because no one will follow that recommendation.

Michael Collado - (collado.mike@gmail.com)
2022-07-14 12:32:46
*Thread Reply:* yeah, I'd prefer if we didn't need an init script to get OL on Databricks either 🤷‍♂️
🤣 Will Johnson

Will Johnson - (will@willj.co)
2022-07-17 01:03:02
*Thread Reply:* Quick update:
• Turns out using a class loader from a Scala spark listener does not have this problem.
• https://stackoverflow.com/questions/7671888/scala-classloaders-confusion
• I'm trying to use URLClassLoader as recommended by a few MSFT folks and point it at the /local_disk0/tmp folder.
• https://stackoverflow.com/questions/17724481/set-classloader-different-directory
• I'm not having luck so far but hoping I can reason about it tomorrow and Monday. This is blocking us from adding additional data sources that are not pre-installed on databricks 😢

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-18 05:45:59
*Thread Reply:* Can't help you now, but I'd love if you dumped the knowledge you've gained through this process into some doc on the new OpenLineage doc site 🙏
👍 Hanna Moazam

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-18 05:48:15
*Thread Reply:* We'll definitely put all of it together as a reference for others, and hopefully have a solution by the end of it too
🙌 Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-13 12:06:24
@channel The next OpenLineage TSC meeting is tomorrow at 10 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1657204421157959
🙌 Willy Lulciuc, Maciej Obuchowski
💯 Willy Lulciuc, Maciej Obuchowski

David Cecchi - (david_cecchi@cargill.com)
2022-07-13 16:32:12
check this out folks - marklogic datahub flow lineage into OL/marquez with jobs and runs and more. i would guess this is a pretty narrow use case but it went together really smoothly and thought i'd share - sometimes it's just cool to see what people are working on
🍺 Willy Lulciuc, Conor Beverland, Maciej Obuchowski, Paweł Leszczyński
❤️ Willy Lulciuc, Conor Beverland, Julien Le Dem, Michael Robinson, Maciej Obuchowski, Minkyu Park

Willy Lulciuc - (willy@datakin.com)
2022-07-13 16:40:48
*Thread Reply:* Soo cool, @David Cecchi 💯💯💯. I’m not familiar with marklogic, but pretty awesome ETL platform and the lineage graph looks 👌! Did you have to write any custom integration code? Or were you able to use our off-the-shelf integrations to get things working? (Also, thanks for sharing!)

David Cecchi - (david_cecchi@cargill.com)
2022-07-13 16:57:29
*Thread Reply:* team had to write some custom stuff but it's all framework so it can be repurposed not rewritten over and over. i would see this as another "Platform" in the context of the integrations semantic OL uses, so no, we didn't start w/ an existing solution. just used internal hooks and then called lineage APIs.

Willy Lulciuc - (willy@datakin.com)
2022-07-13 17:02:53
*Thread Reply:* Ah, totally makes sense. Would you be open to a brief presentation and/or demo in a future OL community meeting? The community is always looking to hear how OL is used in the wild, and this seems aligned with that (assuming you can talk about the implementation at a high level)

Willy Lulciuc - (willy@datakin.com)
2022-07-13 17:05:35
*Thread Reply:* No pressure, of course 😉

David Cecchi - (david_cecchi@cargill.com)
2022-07-13 17:08:50
*Thread Reply:* ha not feeling any pressure. familiar with the intentions and dynamic. let's keep that on radar - i don't keep tabs on community meetings but mid/late august would be workable. and to be clear, this is being used in the wild in a sandbox 🙂.

Willy Lulciuc - (willy@datakin.com)
2022-07-13 17:12:55
*Thread Reply:* Sounds great, and a reasonable timeline! (cc @Michael Robinson can follow up). Even if it’s in a sandbox, talking about the level of effort helps with improving our APIs or sharing with others how smooth it can be!
👍 David Cecchi

Ross Turk - (ross@datakin.com)
2022-07-13 17:18:27
*Thread Reply:* chiming in as well to say this is really cool 👍

Julien Le Dem - (julien@apache.org)
2022-07-13 18:26:28
*Thread Reply:* Nice! Would this become a product feature in MarkLogic Data Hub?

Mark Chiarelli - (mark.chiarelli@marklogic.com)
2022-07-14 11:07:42
*Thread Reply:* MarkLogic is a multi-model database and search engine. This implementation triggers off the MarkLogic Datahub Github batch records created when running the datahub flows. Just a toe in the water so far.

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:31:18
@Ross Turk, in the OL community meeting today, you presented the new doc site (awesome!) that isn’t up (yet!), but I’ve been talking with @Julien Le Dem about the usage of _producer and would like to add a section on the use / function of _producer in OL events. Feels like the new doc site would be a great place to add this! Let me know when’s a good time to start crowdsourcing content for the site

Ross Turk - (ross@datakin.com)
2022-07-14 20:37:25
*Thread Reply:* That sounds like a good idea to me. Be good to have some guidance on that.

The repo is open for business! Feel free to add the page where you think it fits.
❤️ Willy Lulciuc

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:42:09
*Thread Reply:* OK! Let’s do this!

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:59:36
*Thread Reply:* @Ross Turk, feel free to assign to me https://github.com/OpenLineage/docs/issues/1!

Ross Turk - (ross@datakin.com)
2022-07-14 20:39:26
Hey everyone! As Willy says, there is a new documentation site for OpenLineage in the works.

It’s not quite ready to be, uh, a proper reference yet. But it’s not too far away. Help us get there by submitting issues, making page stubs, and adding sections via PR.

https://github.com/openlineage/docs/
🙌 Maciej Obuchowski, Michael Robinson

Willy Lulciuc - (willy@datakin.com)
2022-07-14 20:43:09
*Thread Reply:* Thanks, @Ross Turk for finding a home for more technical / how-to docs… long overdue 💯

Ross Turk - (ross@datakin.com)
2022-07-14 21:22:09
*Thread Reply:* BTW you can see the current site at http://openlineage.io/docs/ - merges to main will ship a new site.

Willy Lulciuc - (willy@datakin.com)
2022-07-14 21:23:32
*Thread Reply:* great, I was using docs.openlineage.io … we’ll eventually want the docs to live under the docs subdomain though?

Ross Turk - (ross@datakin.com)
2022-07-14 21:25:32
*Thread Reply:* TBH I activated GitHub Pages on the repo expecting it to live at openlineage.github.io/docs, thinking we could look at it there before it's ready to be published and linked in to the website

Ross Turk - (ross@datakin.com)
2022-07-14 21:25:39
*Thread Reply:* and it came live at openlineage.io/docs 😄

Willy Lulciuc - (willy@datakin.com)
2022-07-14 21:26:06
*Thread Reply:* nice, and sounds good 👍

Ross Turk - (ross@datakin.com)
2022-07-14 21:26:31
*Thread Reply:* still do not understand why, but I'll take it as a happy accident. we can move to docs.openlineage.io easily - just need to add the A record in the LF infra + the CNAME file in the static dir of this repo

shweta p - (shweta.pbs@gmail.com)
2022-07-15 09:10:46
Hi #general, how do I link the tasks of Airflow which may not have any input or output datasets, as they are running some conditions? The dataset is generated only on the last task.

shweta p - (shweta.pbs@gmail.com)
2022-07-15 09:11:25
In the lineage, though there is an option to link the parent, it doesn't show the lineage of job -> job

shweta p - (shweta.pbs@gmail.com)
2022-07-15 09:11:43
does it need to be job -> dataset -> job only?

Ross Turk - (ross@datakin.com)
2022-07-15 14:41:30
*Thread Reply:* yes - openlineage is job -> dataset -> job. particularly, the model is designed to observe the movement of data

Ross Turk - (ross@datakin.com)
2022-07-15 14:43:41
*Thread Reply:* the spec is based around run events, which are observed states of job runs. jobs are observed to see how they affect datasets, and that relationship is what OpenLineage traces

Ilya Davidov - (idavidov@marpaihealth.com)
2022-07-18 11:32:06
👋 Hi everyone!

Ilya Davidov - (idavidov@marpaihealth.com)
2022-07-18 11:32:51
i am looking for some information regarding openlineage integration with AWS Glue jobs/workflows

Ilya Davidov - (idavidov@marpaihealth.com)
2022-07-18 11:33:32
i am wondering if it is possible and whether someone already gave it a try and maybe documented it?

John Thomas - (john.thomas@astronomer.io)
2022-07-18 15:16:54
*Thread Reply:* This thread covers Glue in some detail: https://openlineage.slack.com/archives/C01CK9T7HKR/p1637605977118000?thread_ts=1637605977.118000&cid=C01CK9T7HKR

John Thomas - (john.thomas@astronomer.io)
2022-07-18 15:17:49
*Thread Reply:* TL;DR: you can use the Spark integration to capture some lineage, but it's not comprehensive

David Cecchi - (david_cecchi@cargill.com)
2022-07-18 16:29:02
*Thread Reply:* i suspect there will be opportunities to influence AWS to be a "fast follower" if OL adoption and buy-in starts to feel authentically real in non-aws portions of the stack. i discussed OL casually with AWS analytics leadership (Rahul Pathak) last winter and he seemed curious and open to this type of idea. to be clear, ~95% chance he's forgotten that conversation now but hey it's still something.
👍 Ross Turk

Francis McGregor-Macdonald - (francis@mc-mac.com)
2022-07-18 19:34:32
*Thread Reply:* There are a couple of AWS people here (including me) following.
👍 David Cecchi, Ross Turk

Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 18:01:46
Hi all, I have been playing around with Marquez for a hack day. I have been able to get some lineage information loaded in (using the local docker version for now). I have been trying to set the location (for the link) and description information for a job (the text saying "Nothing to show here"), but I haven't been able to figure out how to do this using the /lineage api. Any help would be appreciated.

Ross Turk - (ross@datakin.com)
2022-07-19 20:11:38
*Thread Reply:* I believe what you want is the DocumentationJobFacet. It adds a description property to a job.

Ross Turk - (ross@datakin.com)
2022-07-19 20:13:03
*Thread Reply:* You can see a Python example here, in the Airflow integration: https://github.com/OpenLineage/OpenLineage/blob/65a5f021a1ba3035d5198e759587737a05b242e1/integration/airflow/openlineage/airflow/adapter.py#L217
:gratitude_thank_you: Mikkel Kringelbach

Ross Turk - (ross@datakin.com)
2022-07-19 20:13:18
*Thread Reply:* (looking for a curl example…)

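While the curl example is being dug up, here is a sketch of the raw call such an example would make, POSTing a minimal run event with a documentation facet to Marquez's lineage endpoint. It uses java.net.http with a Java 15+ text block; the Marquez URL, namespace, job name, runId, and description are placeholders, and the facet shape follows the DocumentationJobFacet schema referenced in the examples below:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SendLineageEvent {
    public static void main(String[] args) throws Exception {
        // Placeholder values - adjust to your Marquez instance and job.
        String event = """
            {
              "eventType": "COMPLETE",
              "eventTime": "2022-07-20T00:00:00.000Z",
              "producer": "https://example.com/my-producer",
              "job": {
                "namespace": "workshop",
                "name": "my_job",
                "facets": {
                  "documentation": {
                    "_producer": "https://example.com/my-producer",
                    "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/DocumentationJobFacet",
                    "description": "Finds the most popular products."
                  }
                }
              },
              "run": { "runId": "13460e52-a829-4244-8c45-587192cfa009" },
              "inputs": [],
              "outputs": []
            }""";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:5000/api/v1/lineage"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(event))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```
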
Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 20:25:49
*Thread Reply:* I see, so there are special facet keys which will get translated into something special in the UI, is that correct?

Are these documented anywhere?

Ross Turk - (ross@datakin.com)
2022-07-19 20:27:55
*Thread Reply:* Correct - info from the various OpenLineage facets is used in the Marquez UI.

Ross Turk - (ross@datakin.com)
2022-07-19 20:28:28
*Thread Reply:* I couldn’t find a curl example with a description field, but I did generate this one with a sql field:
```json
{
  "job": {
    "name": "order_analysis.find_popular_products",
    "facets": {
      "sql": {
        "query": "DROP TABLE IF EXISTS top_products;\n\nCREATE TABLE top_products AS\nSELECT\n  product,\n  COUNT(order_id) AS num_orders,\n  SUM(quantity) AS total_quantity,\n  SUM(price ** quantity) AS total_value\nFROM\n  orders\nGROUP BY\n  product\nORDER BY\n  total_value desc,\n  num_orders desc;",
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.11.0/integration/airflow",
        "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SqlJobFacet"
      }
    },
    "namespace": "workshop"
  },
  "run": {
    "runId": "13460e52-a829-4244-8c45-587192cfa009",
    "facets": {}
  },
  "inputs": [ ... ],
  "outputs": [ ... ],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.11.0/integration/airflow",
  "eventTime": "2022-07-20T00:23:06.986998Z",
  "eventType": "COMPLETE"
}
```

Ross Turk - (ross@datakin.com)
2022-07-19 20:28:58
*Thread Reply:* The facets (at least, those in the core spec) are here: https://github.com/OpenLineage/OpenLineage/tree/65a5f021a1ba3035d5198e759587737a05b242e1/spec/facets

Ross Turk - (ross@datakin.com)
2022-07-19 20:29:19
*Thread Reply:* it’s designed so that facets can exist outside the core, in other repos, as well

Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 22:25:39
*Thread Reply:* Thank you for sharing these, I was able to get the sql query highlighting to work. But I failed to get the location link or the documentation to work. My facet attempt looked like:
```json
{
  "facets": {
    "description": "test-description-job",
    "sql": {
      "query": "SELECT QUERY",
      "_schema": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SqlJobFacet"
    },
    "documentation": {
      "documentation": "Test docs?",
      "_schema": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/DocumentationJobFacet"
    },
    "link": {
      "type": "",
      "url": "www.google.com/test_url",
      "_schema": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/SourceCodeLocationJobFacet"
    }
  }
}
```

Mikkel Kringelbach - (mikkel@theoremlp.com)
2022-07-19 22:36:55
*Thread Reply:* I got the documentation link to work by renaming the property from documentation -> description. I still haven't been able to get the external link to work

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-20 10:33:36
Hey all. I've been doing a cleanup of issues on GitHub. If I've closed your issue that you think is still relevant, please reopen it and let us know.
🙌 Jakub Dardziński, Michael Collado, Will Johnson, Ross Turk

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-21 16:09:08
Is https://databricks.com/blog/2022/06/08/announcing-the-availability-of-data-lineage-with-unity-catalog.html using OpenLineage? I know there’s been a lot of work to make sure OpenLineage integrates with Databricks, even earlier this year.

Ross Turk - (ross@datakin.com)
2022-07-21 16:25:47
*Thread Reply:* There’s a good integration between OL and Databricks for pulling metadata out of running Spark clusters. But there’s not currently a connection between OL and the Unity Catalog.

I think it would be cool to see some discussions start to develop around it 👍
👍 Sheeri Cabral (Collibra), Julius Rentergent

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-21 16:26:44
*Thread Reply:* Absolutely. I saw some mention of APIs and access, and was wondering if maybe they used OpenLineage as a framework, which would be awesome.

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-07-21 16:30:55
*Thread Reply:* (and since Azure Databricks uses it - https://openlineage.io/blog/openlineage-microsoft-purview/ - I wasn’t sure about Unity Catalog)
👍 Will Johnson

Julien Le Dem - (julien@apache.org)
2022-07-21 16:56:24
*Thread Reply:* We're in the early stages of discussion regarding an OpenLineage integration for Unity. You showing interest would help increase the priority of that on the DB side.
👍 Sheeri Cabral (Collibra), Will Johnson, Thijs Koot

Thijs Koot - (thijs.koot@gmail.com)
2022-07-27 11:41:48
*Thread Reply:* I'm interested in Databricks enabling an OpenLineage endpoint, serving as a catalogue. Similar to how they provide hosted MLflow. I can mention this to our Databricks reps as well.

Joao Vicente - (joao.diogo.vicente@gmail.com)
2022-07-23 04:09:55
Hi all
I am trying to find the state of columnLineage in OL
I see a proposal and some examples in https://github.com/OpenLineage/OpenLineage/search?q=columnLineage&type= but I can't find it in the spec.
Can anyone shed any light on why this would be the case?

Joao Vicente - (joao.diogo.vicente@gmail.com)
2022-07-23 04:12:26
*Thread Reply:* Link to spec where I looked: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json

Joao Vicente - (joao.diogo.vicente@gmail.com)
2022-07-23 04:37:11
*Thread Reply:* My bad. I realize now that column lineage has been implemented as a facet, hence not visible in the main spec: https://github.com/OpenLineage/OpenLineage/search?q=ColumnLineageDatasetFacet&type=
👍 Maciej Obuchowski

Julien Le Dem - (julien@apache.org)
2022-07-26 19:37:54
*Thread Reply:* It is supported in the Spark integration

Julien Le Dem - (julien@apache.org)
2022-07-26 19:39:13
*Thread Reply:* @Paweł Leszczyński could you add the Column Lineage facet here in the spec? https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#standard-facets

Will Johnson - (will@willj.co)
2022-07-24 16:24:15
SundayFunday

Putting together some internal training for OpenLineage and highlighting some of the areas that have been useful to me on my journey with OpenLineage. Many thanks to @Michael Collado, @Maciej Obuchowski, and @Paweł Leszczyński for the continued technical support and guidance.
❤️ Hanna Moazam, Ross Turk, Minkyu Park, Atif Tahir, Paweł Leszczyński

Will Johnson - (will@willj.co)
2022-07-24 16:26:59
*Thread Reply:* @Ross Turk I still want to contribute something like this to the OpenLineage docs / new site, but the bar for an internal doc is lower in my mind 😅

Ross Turk - (ross@datakin.com)
2022-07-25 11:49:54
*Thread Reply:* 😄

Ross Turk - (ross@datakin.com)
2022-07-25 11:50:54
*Thread Reply:* @Will Johnson happy to help you with docs, when the time comes! sketching outline --> editing, whatever you need

Julien Le Dem - (julien@apache.org)
2022-07-26 19:39:56
*Thread Reply:* This looks nice by the way.
❤️ Will Johnson

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 09:06:28
hi all, really appreciate it if anyone could help. I have been trying to create a POC project with OpenLineage and dbt. Attached is the pip list of the OpenLineage packages that I have. However, when I run the "dbt-ol" command, it prompted to open it as a file instead of running it as a command. The regular dbt run can be executed without issue. I would like to know what I have done wrong, or if there is any configuration that I have missed. Thanks a lot

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-26 10:39:57
*Thread Reply:* do you have proper execute permissions?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-26 10:41:09
*Thread Reply:* not sure how that works on Windows, but it just looks like it does not recognize dbt-ol as an executable

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 10:43:00
*Thread Reply:* yes i have admin rights. how to make this executable?

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 10:43:25
*Thread Reply:* btw do we have a sample docker image where dbt-ol can run?

Ross Turk - (ross@datakin.com)
2022-07-26 17:33:08
*Thread Reply:* I have also never tried on Windows 😕 but you might try python3 dbt-ol run?

Sylvia Seow - (sylviaseow@gmail.com)
2022-07-26 21:03:43
*Thread Reply:* will try that

Will Johnson - (will@willj.co) -
-
2022-07-26 16:41:04
-
-

Running a single unit test on the Spark Integration - How it works with the different modules?

Prior to splitting up the OpenLineage Spark integration, I could run a command like the one below to test a single test or even a single test method. Now I get a failure and it's pointing to the app: module. Can anyone share the right syntax for running a unit test with the current package structure? Thank you!!

```
wj@DESKTOP-ECF9QME:~/repos/OpenLineageWill/integration/spark$ ./gradlew test --tests io.openlineage.spark.agent.OpenLineageSparkListenerTest

> Task :app:test FAILED

SUCCESS: Executed 0 tests in 872ms

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':app:test'.
> No tests found for given includes: io.openlineage.spark.agent.OpenLineageSparkListenerTest

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.4/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 2s
18 actionable tasks: 4 executed, 14 up-to-date
```

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 01:54:31
*Thread Reply:* This may be a result of splitting the Spark integration into multiple submodules: app, shared, spark2, spark3, spark32, etc. If the test case is from the shared submodule (this one looks like that), you could try running:
./gradlew :shared:test --tests io.openlineage.spark.agent.OpenLineageSparkListenerTest

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-27 03:18:42
*Thread Reply:* @Paweł Leszczyński, I tried running that command, and I get the following error:

```
> Task :shared:test FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':shared:test'.
> No tests found for given includes: io.openlineage.spark.agent.OpenLineageSparkListenerTest

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.4/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 971ms
6 actionable tasks: 2 executed, 4 up-to-date
```

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-27 03:24:41
*Thread Reply:* When running build and test for all the submodules, I can see outputs for tests in different submodules (spark3, spark2, etc.), but for some reason, I cannot find any indication that the tests in OpenLineage/integration/spark/app/src/test/java/io/openlineage/spark/agent/lifecycle/plan are being run at all.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 03:42:43
*Thread Reply:* That’s interesting. Let’s ask @Tomasz Nazarewicz about that.

👍 Hanna Moazam

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-27 03:57:08
*Thread Reply:* For reference, I attached the stdout and stderr messages from running the following:
./gradlew :shared:spotlessApply && ./gradlew :app:spotlessApply && ./gradlew clean build test

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-27 04:27:23
*Thread Reply:* I'll look into it

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-28 05:17:36
*Thread Reply:* Update: some tests appeared to not be visible after the split; that's fixed, but now I have to solve some dependency issues

🙌 Hanna Moazam, Will Johnson

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-28 05:19:16
*Thread Reply:* That's great, thank you!

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-29 06:05:55
*Thread Reply:* Hi Tomasz, thanks so much for looking into this. Is this your PR (https://github.com/OpenLineage/OpenLineage/pull/953) that fixes the whole issue, or is there still some work to do to solve the dependency issues you mentioned?

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-29 06:07:58
*Thread Reply:* I'm still testing it, should've changed it to draft, sorry

👍 Hanna Moazam, Will Johnson

Hanna Moazam - (hannamoazam@microsoft.com)
2022-07-29 06:08:59
*Thread Reply:* No worries! If I can help with testing or anything please let me know!

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-07-29 06:09:29
*Thread Reply:* Will do! Thanks :)

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-02 11:06:31
*Thread Reply:* Hi @Tomasz Nazarewicz, if possible, could you please share an estimated timeline for resolving the issue? We have 3 PRs which we are either waiting to open or to update which are dependent on the tests.

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-08-02 13:45:34
*Thread Reply:* @Hanna Moazam hi, it's quite difficult to do that, because the issue is that all the tests are passing when I execute ./gradlew app:test but one is failing with ./gradlew app:build

But if it fixes your problem, I can disable this test for now and make a PR without it; then you can maybe unblock your stuff, and I will have more time to investigate the issue.

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-02 14:54:45
*Thread Reply:* Oh that's a strange issue. Yes that would be really helpful if you can, because we have some tests we implemented which we need to make sure pass as expected.

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-02 14:54:52
*Thread Reply:* Thank you for your help Tomasz!

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-08-03 06:12:07
*Thread Reply:* @Hanna Moazam https://github.com/OpenLineage/OpenLineage/pull/980 here is the pull request with the changes

🙌 Hanna Moazam

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2022-08-03 06:12:26
*Thread Reply:* it's waiting for review currently

Hanna Moazam - (hannamoazam@microsoft.com)
2022-08-03 06:20:41
*Thread Reply:* Thank you!

Conor Beverland - (conorbev@gmail.com)
2022-07-26 18:44:47
Is there any doc yet about column level lineage? I see a spec for the facet here: https://github.com/openlineage/openlineage/issues/148

Julien Le Dem - (julien@apache.org)
2022-07-26 19:41:13
*Thread Reply:* The doc site would benefit from a page about it. Maybe @Paweł Leszczyński?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 01:59:27
*Thread Reply:* Sure, it’s already on my list, will do

:gratitude_thank_you: Julien Le Dem

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-29 07:55:40
*Thread Reply:* https://openlineage.io/docs/integrations/spark/spark_column_lineage

✅ Conor Beverland

Conor Beverland - (conorbev@gmail.com)
2022-07-26 20:03:55
maybe another question for @Paweł Leszczyński: I was watching the Airflow summit talk that you and @Maciej Obuchowski did ( very nice! ). How is this exposed? I'm wondering if it shows up as an edge on the graph in Marquez? ( I guess it may be tracked as a parent run and if so probably does not show on the graph directly at this time? )

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-07-27 04:08:18
*Thread Reply:* To be honest, I have never seen that in action and would love to have that in our documentation.

@Michael Collado or @Maciej Obuchowski: are you able to create some doc? I think one of you was working on that.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-27 04:24:19
*Thread Reply:* Yes, parent run

shweta p - (shweta.pbs@gmail.com)
2022-07-27 01:29:05
Hi #general, there has been an issue with airflow+dbt+openlineage. This was working fine with openlineage-dbt v0.11.0, but there has been some change to typing-extensions, due to which I had to upgrade to the latest dbt (from 1.0.0 to 1.1.0), and now dbt-ol is failing on schema version support (the version generated is v5, while dbt-ol supports only v4). Has anyone else been able to fix this?

👀 Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-27 04:47:18
*Thread Reply:* Will take a look

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-27 04:47:40
*Thread Reply:* But generally this support message is just a warning

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-27 10:04:20
*Thread Reply:* @shweta p any actual error you've found? I've tested it with dbt-bigquery on 1.1.0 and it works despite the warning:

```
➜ small OPENLINEAGE_URL=http://localhost:5050 dbt-ol build
Running OpenLineage dbt wrapper version 0.11.0
This wrapper will send OpenLineage events at the end of dbt execution.
14:03:16 Running with dbt=1.1.0
14:03:17 Found 2 models, 3 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
14:03:17
14:03:17 Concurrency: 2 threads (target='dev')
14:03:17
14:03:17 1 of 5 START table model dbt_test1.my_first_dbt_model .......................... [RUN]
14:03:21 1 of 5 OK created table model dbt_test1.my_first_dbt_model ..................... [CREATE TABLE (2.0 rows, 0 processed) in 3.31s]
14:03:21 2 of 5 START test unique_my_first_dbt_model_id ................................. [RUN]
14:03:22 2 of 5 PASS unique_my_first_dbt_model_id ....................................... [PASS in 1.55s]
14:03:22 3 of 5 START view model dbt_test1.my_second_dbt_model .......................... [RUN]
14:03:24 3 of 5 OK created view model dbt_test1.my_second_dbt_model ..................... [OK in 1.38s]
14:03:24 4 of 5 START test not_null_my_second_dbt_model_id .............................. [RUN]
14:03:24 5 of 5 START test unique_my_second_dbt_model_id ................................ [RUN]
14:03:25 5 of 5 PASS unique_my_second_dbt_model_id ...................................... [PASS in 1.38s]
14:03:25 4 of 5 PASS not_null_my_second_dbt_model_id .................................... [PASS in 1.42s]
14:03:25
14:03:25 Finished running 1 table model, 3 tests, 1 view model in 8.44s.
14:03:25
14:03:25 Completed successfully
14:03:25
14:03:25 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
Artifact schema version: https://schemas.getdbt.com/dbt/manifest/v5.json is above dbt-ol supported version 4. This might cause errors.
Emitting OpenLineage events: 100%|████████| 8/8 [00:00<00:00, 274.42it/s]
Emitted 10 openlineage events
```

Fenil Doshi - (fdoshi@salesforce.com)
2022-07-27 20:39:21
When will the next version of OpenLineage be available tentatively?

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-27 20:41:44
*Thread Reply:* I think it's safe to say we'll see a release by the end of next week

:gratitude_thank_you: 👍 Fenil Doshi

Yehuda Korotkin - (yehudak@elementor.com)
2022-07-28 04:02:06
👋 Hi everyone!
Yesterday was a great presentation by @Julien Le Dem that talked about OpenLineage and drew a great comparison between OL and OpenTelemetry (I wrote a small summary here: https://bit.ly/3z5caOI).

Julien's charm sparked my curiosity, especially regarding OL in streaming. Having seen the design/architecture of OL, I have some questions/discussion points that I would like to understand better.

In the context of streaming jobs, reporting "start job" - "end job" might be more relevant in batch mode. Or do you mean reporting start job/end job should be processed for each event?
  • That would be equivalent to starting a job for each row in a table via a UDF, for example.

Thank you in advance

🙌 Maciej Obuchowski, Michael Robinson, Paweł Leszczyński

Will Johnson - (will@willj.co)
2022-07-28 08:50:44
*Thread Reply:* Welcome to the community!

We talked about this exact topic in the most recent community call.
https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting#MonthlyTSCmeeting-Nextmeeting:Nov10th2021(9amPT)

Discussion: streaming in Flink integration
• Has there been any evolution in the thinking on support for streaming?
  ◦ Julien: start event, complete event, snapshots in between limited to a certain number per time interval
  ◦ Paweł: we can make the snapshot volume configurable
• Does Flink support sending data to multiple tables like Spark?
  ◦ Yes, multiple outputs are supported by the OpenLineage model
  ◦ Marquez, the reference implementation of OL, combines the outputs

🙏 Yehuda Korotkin
❤️ Julien Le Dem

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-28 09:56:05
*Thread Reply:* > or do you mean reporting start job/end job should be processed for each event?
We definitely want to avoid tracking every single event 🙂

One thing worth mentioning is that OpenLineage events are meant to be cumulative - the streaming jobs start, run, and eventually finish or restart. In the meantime, we capture additional events "in the middle" - for example, on an Apache Flink checkpoint, or every few minutes - where we can emit additional information connected to the state of the job.

🙏 Yehuda Korotkin

Yehuda Korotkin - (yehudak@elementor.com)
2022-07-28 11:11:17
*Thread Reply:* @Will Johnson and @Maciej Obuchowski thank you for your answers.

> jobs start, run, and eventually finish or restart

This is the perspective that I have a hard time understanding in the context of streaming.

A classic streaming job should always be on; it should not have a "finish" event (except on failure). Usually, streaming data is "dripping" in.

It is possible to understand job start/end at the resolution of the running application - representing when the application began and when it failed.

But if you derive start/stop events from the checkpoints in Flink, it might be the wrong representation; instead, use an event-driven concept, for example reporting state.

What do you think?

Yehuda Korotkin - (yehudak@elementor.com)
2022-07-28 11:11:36
*Thread Reply:* (attachment)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-28 12:00:34
*Thread Reply:* The idea is that jobs usually get upgraded - for example, you change the Apache Flink version, increase resources, or change the structure of a job - that's the difference for us. The stop events make sense because if you, for example, changed the SQL of your Flink SQL job, you probably would want this to be captured: from X to Y the job was running well with the older SQL version, but after the change, the second run started and throughput dropped to 10% of the previous one.

> if you do start/stop events from the checkpoints on Flink it might be the wrong representation instead use the concept of event-driven for example reporting state.
But this is a misunderstanding 🙂
The information exposed from checkpoints is in addition to the start and stop events.

We want to get information from the running job - I just argue that sometimes the end of a streaming job is also relevant.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-07-28 12:01:16
*Thread Reply:* The checkpoint would be captured as a new eventType: RUNNING - am I missing something about why you want to add a StateFacet?
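For context, a minimal sketch of what emitting such a RUNNING event could look like with the Python client (per the release notes later in this log, the RUNNING event type landed in the spec and Python client in 0.13.0; the endpoint, job name, and producer below are placeholders):

```
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # placeholder endpoint

# Emit a heartbeat-style RUNNING event for a long-lived streaming job,
# e.g. triggered on a Flink checkpoint.
client.emit(
    RunEvent(
        eventType=RunState.RUNNING,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId="d46e465b-d358-4d32-83d4-df660ff614dd"),
        job=Job(namespace="streaming", name="flink-enrichment-job"),  # placeholders
        producer="https://example.com/my-producer",  # placeholder
    )
)
```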

👍 Yehuda Korotkin

Yehuda Korotkin - (yehudak@elementor.com)
2022-07-28 14:24:03
*Thread Reply:* About the argument - it depends on the definition of a job in streaming mode. I agree that if you already have a 'job', you want to know more information about it.

Should each event entering the sub-process (job) make a REST call for "start job" and "end job"?

Nope - I just presented the two possible ways I thought of: either a StateFacet or a new event type, e.g. RUNNING 😉

👍 Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-28 09:14:28
Hi everyone, I’d like to request a release to publish the new Flink integration (thanks, @Maciej Obuchowski) and an important fix to the Spark integration (thanks, @Paweł Leszczyński). As per our policy here, 3 +1s from committers will authorize an immediate release. Thanks!

➕ Maciej Obuchowski, Paweł Leszczyński, Willy Lulciuc, Will Johnson, Julien Le Dem

Michael Robinson - (michael.robinson@astronomer.io)
2022-07-28 17:30:33
*Thread Reply:* Thanks for the +1s. We will initiate the release by Tuesday.

Barak F - (fargoun@gmail.com)
2022-07-28 10:30:15
Static code annotations for OpenLineage: hi everyone, I heard a great lecture yesterday by @Julien Le Dem on OpenLineage, and as I'm very interested in this area, I wanted to raise a question: are there any plans to have OpenLineage-like annotations on actual code (e.g. Spark, Airflow, arbitrary code) to allow deducing some of the lineage information from static code analysis?

The reason I'm asking is that while OpenLineage does a great job of integrating with multiple platforms (Airflow, dbt, Spark), some companies still have a lot of legacy data processing stack that will probably never get full OpenLineage support (as it's a one-off, and the companies themselves probably won't implement OpenLineage support for their custom frameworks).
Having some standard way to annotate code with information like "reads from X; writes to Y; job name regexp: Z" may allow writing a "generic" OpenLineage collector that can go over the source code, collect this configuration information, and then use it when constructing the lineage graph (even though it won't be as complete as the full OpenLineage info).

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-03 08:30:15
*Thread Reply:* I think this is an interesting idea; however, static analysis alone does not convey any runtime information.

We're doing something similar within Airflow now, but as a fallback mechanism: https://github.com/OpenLineage/OpenLineage/pull/914

You can manually annotate a DAG with information instead of writing an extractor for your operator. This still gives you runtime information. Similar features might get added to other integrations, especially ones with as vast a scope as Airflow - but I think it's unlikely we'd work on a feature for just statically traversing code without runtime context. A sketch of the fallback idea is below.
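A rough illustration of that fallback (a sketch only - the operator, script, and table names are made up, and the exact behavior is defined by PR #914): annotate a task with Airflow's lineage entities when no OpenLineage extractor exists for it.

```
from airflow.lineage.entities import Table
from airflow.operators.bash import BashOperator

# Manually declare what this task reads and writes; the OpenLineage
# Airflow integration can pick these up as a fallback when it has no
# extractor for the operator.
legacy_copy = BashOperator(
    task_id="legacy_copy",
    bash_command="./run_legacy_job.sh",  # hypothetical legacy script
    inlets=[Table(database="analytics", cluster="postgres", name="raw.orders")],
    outlets=[Table(database="analytics", cluster="postgres", name="staging.orders")],
)
```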

Barak F - (fargoun@gmail.com)
2022-08-03 14:25:31
*Thread Reply:* Thanks for the detailed response @Maciej Obuchowski! It seems like this solution is specific to Airflow, and I wonder why we wouldn't generalize it beyond Airflow. My thinking is that there are other areas with a vast scope (e.g. arbitrary code that does data manipulation), and without such an option, the only path is to provide full runtime information by building your own extractor, which might be a bit hard/expensive to do.
If I understand your response correctly, you assume that OpenLineage can get wide enough "native" support across the stack without resorting to a fallback like static code analysis. Is that your base assumption?

Petr Hajek - (petr.hajek@profinit.eu)
2022-07-29 04:36:03
Hi all, does anybody have experience extracting Airflow lineage using Marquez as documented here: https://www.astronomer.io/guides/airflow-openlineage/#generating-and-viewing-lineage-data ?
We tested it on our Airflow instance with Marquez, hoping to get the standard .json files describing lineage in accord with the OpenLineage model as described in https://json-schema.org/draft/2020-12/schema.
But there seems to be only one GET method related to lineage export in the Marquez API library, called "Get a lineage graph". This produces quite a different .json structure than what we know from OpenLineage. Could anybody help - is there a way to get the OpenLineage .json structure from Marquez?

Ross Turk - (ross@datakin.com)
2022-07-29 12:58:38
*Thread Reply:* The query API has a different spec than the reporting API, so what you'd get from Marquez would look different from what Marquez receives.

A few ideas:
  1. you could send the lineage to a pipedream endpoint to inspect, if you're just trying to experiment
  2. you could grab them from the lineage table in Marquez's postgres (a sketch of that is below)
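A minimal sketch of option 2, assuming direct read access to Marquez's Postgres instance and that raw events land in a lineage_events table (the table and column names here are assumptions - verify them against your Marquez version's schema):

```
import json

import psycopg2

# Pull the raw OpenLineage events exactly as Marquez received them.
conn = psycopg2.connect("dbname=marquez user=marquez host=localhost")  # placeholder credentials
with conn, conn.cursor() as cur:
    cur.execute("SELECT event FROM lineage_events LIMIT 10")
    for (event,) in cur.fetchall():
        print(json.dumps(event, indent=2))  # each row holds one OpenLineage event
```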
Petr Hajek - (petr.hajek@profinit.eu)
2022-07-30 16:29:24
*Thread Reply:* ok, now I understand, thank you

👍 Jan Kopic

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-03 08:25:57
*Thread Reply:* FYI, we want to have something like that too: https://github.com/MarquezProject/marquez/issues/1927

But if you need just the raw events endpoint, without the UI, then Marquez might be overkill for your needs.

Dinakar Sundar - (dinakar_sundar@condenast.com)
2022-07-30 13:44:13
Hi @everyone, we are trying to extract lineage information and import it into Amundsen. Please point us in the right direction - based on the documentation, is Databricks + Marquez + Amundsen the only way to go?

John Thomas - (john.thomas@astronomer.io)
2022-07-30 13:49:25
*Thread Reply:* Short of implementing an OpenLineage endpoint in Amundsen, yes, that's the right approach.

The lineage endpoint in Marquez can output the whole graph centered on a node ID, and you can use the jobs/datasets APIs to grab lists of each for reference - a sketch of both calls follows.
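A minimal sketch of those two calls with requests (the nodeId format follows Marquez's dataset:namespace:name convention; the namespace and dataset names are placeholders, and the response shapes should be verified against the Marquez API docs):

```
import requests

MARQUEZ = "http://localhost:5000/api/v1"  # placeholder Marquez endpoint

# Fetch the lineage graph centered on one dataset node.
node_id = "dataset:my-namespace:my-dataset"
graph = requests.get(f"{MARQUEZ}/lineage", params={"nodeId": node_id}).json()
print(len(graph.get("graph", [])), "nodes in lineage graph")

# List datasets in a namespace for reference.
datasets = requests.get(f"{MARQUEZ}/namespaces/my-namespace/datasets").json()
print([d["name"] for d in datasets.get("datasets", [])])
```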

Barak F - (fargoun@gmail.com)
2022-07-31 00:35:06
*Thread Reply:* Is your lineage information coming via OpenLineage? If so, you can quickly use the Amundsen scripts to load data into Amundsen - for example, see this script: https://github.com/amundsen-io/amundsendatabuilder/blob/master/example/scripts/sample_data_loader.py

Where is your lineage coming from?

Dinakar Sundar - (dinakar_sundar@condenast.com)
2022-08-01 20:17:22
*Thread Reply:* yes @Barak F, we are using OpenLineage

Barak F - (fargoun@gmail.com)
2022-08-02 01:26:18
*Thread Reply:* So, have you tried using Amundsen data builder scripts to load the lineage information into Amundsen? (maybe you'll have to "play" with those a bit)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-03 08:24:58
*Thread Reply:* AFAIK there is an OpenLineage extractor: https://www.amundsen.io/amundsen/databuilder/#openlineagetablelineageextractor

Not sure it solves your issue though 🙂

Dinakar Sundar - (dinakar_sundar@condenast.com)
2022-08-05 04:46:45
*Thread Reply:* thanks

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-01 17:08:46
@channel
OpenLineage 0.12.0 is now available!
We added:
• an Apache Flink integration,
• support for Spark 3.3.0,
• the ability to extend the column level lineage mechanism,
• an ErrorMessageRunFacet to the OpenLineage spec,
• SQLCheckExtractors, a RedshiftSQLExtractor & RedshiftDataExtractor to the Airflow integration,
• a dataset builder to the AlterTableCommand class in the Spark integration.
We changed:
• the filtering of Delta events to reduce noise,
• the flow of metadata in the Airflow integration to allow metadata from Airflow through inlets and outlets.
Thanks to all the contributors who made this release possible!
For the bug fixes and more details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.12.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.11.0...0.12.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/ (edited)

❤️ Minkyu Park, Harel Shein, Willy Lulciuc, Peter Hicks, Fenil Doshi, Maciej Obuchowski, Howard Yoo, Paul Wilson Villena, Jarek Potiuk, Dinakar Sundar, Shubham Mehta, Sharanya Santhanam, Sheeri Cabral (Collibra)
🎉 Minkyu Park, Peter Hicks, Fenil Doshi, Howard Yoo, Jarek Potiuk, Paweł Leszczyński, Ryan Peterson
🚀 Minkyu Park, Howard Yoo, Jarek Potiuk
🙌 Minkyu Park, Willy Lulciuc, Maciej Obuchowski, Howard Yoo, Jarek Potiuk

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-02 10:12:01
What is the right way of handling/parsing facets on the server side?

I see the generated server-side stubs are generic: https://github.com/OpenLineage/OpenLineage/blob/main/client/java/generator/src/main/java/io/openlineage/client/Generator.java#L131 and don't have any resolved facet information.
Marquez seems to have duplicated the OL model with https://github.com/MarquezProject/marquez/blob/main/api/src/main/java/marquez/service/models/LineageEvent.java#L71 and converts the incoming OL events to a "LineageEvent" for appropriate handling. Is there a cleaner approach wherein the known facets can be generated in io.openlineage.server?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-02 12:28:11
*Thread Reply:* I think the reason for the server model being very generic is that new facets can be added later (also as custom facets) - and generally the server wants to accept all valid events and extract the facet information that it can actually use, rather than reject an event because it has an unknown field.

The server model was added here after some discussion in Marquez which is relevant - I think @Michael Collado and @Willy Lulciuc can add to that.

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-02 15:54:24
*Thread Reply:* Thanks for the response. I realize the server stubs were created to support flexibility, but it also makes the parsing logic on the server side a bit more complex, as we need to maintain code on the server side to look for specific facets and their properties in maps, or, like Marquez, duplicate the OL model on our end with the facets we care about. Wanted to know what the guidance is around managing this server side. @Willy Lulciuc @Michael Collado any suggestions?

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-02 18:27:27
Agenda items are requested for the next OpenLineage Technical Steering Committee meeting on August 11 at 10am PT. Reply in thread or ping me with your item(s)!

Varun Singh - (varuntestaz@outlook.com)
2022-08-03 04:16:22
Hi all,
I am trying out the OpenLineage Spark integration and can't find any column lineage information included with the events. I tried it out with an input dataset where I renamed one of the columns, but the columnLineage facet was not present. Can anyone suggest some other examples where it might show up?

Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-03 04:45:36
*Thread Reply:* @Paweł Leszczyński do we collect column level lineage on renames?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-08-05 05:55:12
*Thread Reply:* I’ve created an issue for column lineage in the case of renaming:
https://github.com/OpenLineage/OpenLineage/issues/993

Varun Singh - (varuntestaz@outlook.com)
2022-08-08 09:37:43
*Thread Reply:* Thanks @Paweł Leszczyński!

Ross Turk - (ross@datakin.com)
2022-08-03 12:58:44
Hey everyone! I am looking into Fivetran a bit, and it occurs to me that the NAMING.md document does not have an opinion about how to deal with entire systems as datasets. More in 🧵.

Ross Turk - (ross@datakin.com)
2022-08-03 13:00:22
*Thread Reply:* Fivetran is a tool that copies data from source systems to target databases. One of these source systems might be Salesforce, for example.

This copying results in thousands of SQL queries run against the target database for each sync. I don't think each of these queries should map to an OpenLineage job; I think the entire synchronization should. Maybe I'm wrong here.

Ross Turk - (ross@datakin.com)
2022-08-03 13:01:00
*Thread Reply:* But if I’m right, that means that there needs to be a way to specify “SalesForce Account #45123452233” as a dataset.

Ross Turk - (ross@datakin.com)
2022-08-03 13:01:44
*Thread Reply:* or it ends up just being a job with outputs and no inputs…but that’s not very illuminating

Ross Turk - (ross@datakin.com)
2022-08-03 13:02:27
*Thread Reply:* or is that good enough?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-04 10:31:11
*Thread Reply:* You are looking at a pretty big topic here 🙂

Basically you're asking what a job is in OpenLineage - and that's not fully answered yet.

I think the discussion is kinda relevant to this proposed facet, and I kinda replied there: https://github.com/OpenLineage/OpenLineage/issues/812#issuecomment-1205337556

Harel Shein - (harel.shein@gmail.com)
2022-08-04 15:50:22
*Thread Reply:* my 2 cents on this is that in the Salesforce example, the system is too complex to capture as a single dataset, and so maybe different objects within a Salesforce account (org/account/opportunity/etc…) could be treated as individual datasets. But as @Maciej Obuchowski pointed out, this is quite a large topic 🙂

Ross Turk - (ross@datakin.com)
2022-08-08 13:46:31
*Thread Reply:* I guess it depends on whether you actually care about the table/column level lineage for an operation like "copy Salesforce to Snowflake".

I can see it being a nuisance having all of that on a lineage graph. OTOH, I can see it being useful to know that a datum can be traced back to a specific endpoint at SFDC.

Ross Turk - (ross@datakin.com)
2022-08-08 13:46:55
*Thread Reply:* this is a design decision, IMO.

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-04 11:30:00
@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, August 11 at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom
All are welcome!
Agenda:
  1. Announcements
  2. Docs site update
  3. Release 0.11.0 and 0.12.0 overview
  4. Extractors: examples and how to write them
  5. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda. (edited)
🙌 Maciej Obuchowski, Harel Shein, Paul Wilson Villena
👀 Francis McGregor-Macdonald

Chris Coulthrust - (coulthrust@gmail.com)
2022-08-06 12:06:47
👋 Hi everyone!

👋 Jakub Dardziński, Michael Robinson, Ross Turk, Harel Shein, Willy Lulciuc, Howard Yoo

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-10 11:00:01
@channel The next OpenLineage TSC meeting is tomorrow! https://openlineage.slack.com/archives/C01CK9T7HKR/p1659627000308969

👀 Howard Yoo
❤️ Minkyu Park

Will Johnson - (will@willj.co)
2022-08-10 22:34:29
*Thread Reply:* I am so sad I'm going to miss this month's meeting 😰 Looking forward to the recording!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 06:19:58
*Thread Reply:* We missed you too @Will Johnson 😉

Raj Mishra - (hax0755@gmail.com)
2022-08-11 18:50:18
Hi everyone! I have a REST endpoint that I use for other pipelines that can POST their RunEvent, and I forward that to Marquez. I'm expecting a JSON which has the RunEvent details, which also has the input or output dataset depending upon the EventType. I can see the run details always show up on the Marquez UI, but the dataset has issues. I can see the dataset listed, but when I click on it, it just shows "something went wrong." I don't see any details of that dataset.
```
{
  "eventType": "START",
  "eventTime": "2022-08-09T19:49:24.201361Z",
  "run": {
    "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
  },
  "job": {
    "namespace": "TEST-NAMESPACE",
    "name": "test-job"
  },
  "inputs": [
    {
      "namespace": "TEST-NAMESPACE",
      "name": "my-test-input",
      "facets": {
        "schema": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
          "_schemaURL": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/spec/OpenLineage.json#/definitions/SchemaDatasetFacet",
          "fields": [
            { "name": "a", "type": "INTEGER" },
            { "name": "b", "type": "TIMESTAMP" },
            { "name": "c", "type": "INTEGER" },
            { "name": "d", "type": "INTEGER" }
          ]
        }
      }
    }
  ],
  "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client"
}
```
In the above payload, the input dataset is never created in Marquez. I can only see the run details, but the input dataset is just empty. Does the input dataset need to be created first, and only then the RunEvent?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 06:09:57
*Thread Reply:* From a first look, you're missing the outputs field in your event - this might break something

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 06:10:20
*Thread Reply:* If not, then Marquez logs might help to see something

Raj Mishra - (hax0755@gmail.com)
2022-08-12 13:12:56
*Thread Reply:* Does the START event need to have an output?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 13:19:24
*Thread Reply:* It can have empty output 🙂

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 13:32:43
*Thread Reply:* well, in your case you need to send COMPLETE event

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 13:33:44
*Thread Reply:* Internally, Marquez does not create a dataset version until you send the COMPLETE event. It makes sense when your semantics are transactional - you can still read from the previous dataset version until it's finished writing.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 13:34:06
*Thread Reply:* After I send COMPLETE event with the same information I can see the dataset.

Raj Mishra - (hax0755@gmail.com)
2022-08-12 13:56:37
*Thread Reply:* Thanks for the explanation, @Maciej Obuchowski. So, if I understand this correctly, I won't see the my-test-input dataset till I have the COMPLETE event with input and output?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 14:34:51
*Thread Reply:* @Raj Mishra Yes and no 🙂

Basically, your COMPLETE event does not need to contain any input and output datasets at all - the OpenLineage model is cumulative, so it's enough to have datasets on either start or complete.
That also means you can add different datasets at different moments of a run's lifecycle - for example, you know inputs, but not outputs, so you emit inputs on START, but not COMPLETE.

Or, the job is modifying the same dataset it reads from (which happens surprisingly often). Then, you want to collect various input metadata from the dataset before modifying it - most likely you won't have them on COMPLETE 🙂

In this example I've added my-test-input on START and my-test-input2 on COMPLETE:
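(The example attachment itself isn't preserved in the export; a minimal sketch of the same cumulative idea with the Python client - the endpoint, producer, and dataset names are placeholders - would look like this:)

```
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # placeholder endpoint
run = Run(runId="d46e465b-d358-4d32-83d4-df660ff614dd")
job = Job(namespace="my-namespace", name="my-job")
producer = "https://example.com/my-producer"  # placeholder

def now() -> str:
    return datetime.now(timezone.utc).isoformat()

# Datasets known up front go on START...
client.emit(RunEvent(RunState.START, now(), run, job, producer,
                     inputs=[Dataset("my-namespace", "my-test-input")]))

# ...and since the model is cumulative, more datasets can arrive on COMPLETE.
client.emit(RunEvent(RunState.COMPLETE, now(), run, job, producer,
                     inputs=[Dataset("my-namespace", "my-test-input2")]))
```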

Raj Mishra - (hax0755@gmail.com)
2022-08-12 14:47:56
*Thread Reply:* @Maciej Obuchowski Thank you so much! This is great explanation.

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-11 20:28:40
Effectively handling file datasets on the server side: we have a common use case where a dataset of a given type is produced/consumed per day. On the lineage UI/server side, it would be ideal to treat all files of this pattern as one dataset vs. one dataset per daily file. Any suggestions?

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-11 20:35:33
*Thread Reply:* Would adding support for alias/grouping as a config on the OL client side be valuable to other users? i.e., the OL client could pass down an alias/grouping facet. Or should this be treated purely as a server-side feature?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 06:11:21
*Thread Reply:* Agreed 🙂

How do you produce this dataset? The Spark integration? Are you using a system like Apache Iceberg/Delta Lake, or just writing raw files?

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-12 12:59:48
*Thread Reply:* these are raw files written from Spark or MapReduce jobs, and downstream Spark jobs read these raw files to produce tables

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 13:27:34
*Thread Reply:* written using the Spark DataFrame API, like
df.write.format("parquet").save("/tmp/spark_output/parquet")
or RDD?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 13:27:59
*Thread Reply:* the actual API used matters, because we're handling different cases separately

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-12 13:29:48
*Thread Reply:* I see. Let me look that up to be absolutely sure

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-12 19:21:41
*Thread Reply:* It is like this: df.write.format("parquet").save("/tmp/spark_output/parquet")

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-15 12:43:45
*Thread Reply:* @Maciej Obuchowski curious what you had in mind with respect to RDDs and DataFrames. Also, what if we cannot integrate OL with the frameworks that produce this dataset, but only those that consume from the already produced datasets? Is there a way we could still capture the dataset appropriately?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-16 05:30:57
*Thread Reply:* @Sharanya Santhanam the naming should be consistent between reading and writing, so it wouldn't change much if you can't integrate OL into the writers. For the rest, can you create an issue on the OL GitHub so someone can pick it up? I'm on vacation now.

Sharanya Santhanam - (santhanamsharanya@gmail.com)
2022-08-16 15:08:41
*Thread Reply:* Sounds good, ty!

Varun Singh - (varuntestaz@outlook.com)
2022-08-12 06:02:00
Hi, minor suggestion:
This line https://github.com/OpenLineage/OpenLineage/blob/46efab1e7c2a0aa5ebe8d11185fe8d5225[…]/app/src/main/java/io/openlineage/spark/agent/EventEmitter.java is printing variables like the API key and other parameters in the logs. Wouldn't it be more appropriate to use log.debug instead?
I'll create an issue if others agree.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 06:09:11
*Thread Reply:* yes

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-12 06:09:32
*Thread Reply:* please do create 🙂

✅ Varun Singh

Conor Beverland - (conorbev@gmail.com)
2022-08-15 09:01:47
dumb question but, is it easy to run all the OpenLineage tests locally? ( and if so how? 🙂 )

Julien Le Dem - (julien@apache.org)
2022-08-17 13:54:19
*Thread Reply:* it's per project.
Java based: ./gradlew test
Python based: https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow#development

Will Johnson - (will@willj.co)
2022-08-18 23:45:30
Spark Integration: The Order of Processing Events in the Async Event Queue

Hey, OpenLineage team, I'm working on a PR (https://github.com/OpenLineage/OpenLineage/pull/849/) that is going to store information given in different Spark events (e.g. SparkListenerSQLExecutionStart, SparkListenerJobStart).

However, I want to avoid holding all this data once the execution of the job is complete. As a result, I want to remove the data once I receive a SparkListenerSQLExecutionEnd.

However, can I be guaranteed that the ExecutionEnd event will be processed AFTER the JobStart event? Is it possible that I can take so long to process the JobStart event that the ExecutionEnd executes prior to the JobStart finishing?

I know we do something similar to this with sparkSqlExecutionRegistry (https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/mai[…]n/java/io/openlineage/spark/agent/OpenLineageSparkListener.java), but do we have any docs to help explain how the AsyncEventQueue orders and consumes events for a listener?

Thank you so much for any insights!

Julien Le Dem - (julien@apache.org)
2022-08-19 18:38:10
*Thread Reply:* Hey Will! A bunch of folks are on vacation or out this week. Sorry for the delay - I am personally not sure, but if it's not too urgent you can have an answer when knowledgeable folks are back.

Will Johnson - (will@willj.co)
2022-08-19 20:21:18
*Thread Reply:* Hah! No worries, @Julien Le Dem! I can definitely wait for the lucky people who are enjoying the last few weeks of summer unlike the rest of us 😋

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-08-29 05:31:32
*Thread Reply:* @Paweł Leszczyński might want to look at that

Hanbing Wang - (doris.wang200902@gmail.com)
2022-08-19 01:53:56
Hi,
I am trying to find out if OpenLineage Spark supports PySpark (non-SQL) use cases. Is there any doc where I could get more details about non-SQL OpenLineage support?
Thanks a lot

Julien Le Dem - (julien@apache.org)
2022-08-19 12:30:08
*Thread Reply:* Hello Hanbing, the spark integration works for PySpark since pyspark is wrapped into regular spark operators.

Hanbing Wang - (doris.wang200902@gmail.com)
2022-08-19 13:49:35
*Thread Reply:* @Julien Le Dem Thanks a lot for your help. I searched around, but I couldn't find any doc introducing how PySpark is supported in OpenLineage.
My company wants to integrate with openlineage-spark; I am working on figuring out what info OpenLineage makes available for non-SQL, and whether it at least supports logging the logical plan.

Julien Le Dem - (julien@apache.org)
2022-08-19 18:26:48
*Thread Reply:* Yes, it does send the logical plan as part of the event

Julien Le Dem - (julien@apache.org)
2022-08-19 18:27:32
*Thread Reply:* This configuration here should work as well for pyspark https://openlineage.io/docs/integrations/spark/

Julien Le Dem - (julien@apache.org)
2022-08-19 18:28:11
*Thread Reply:* --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener"

Julien Le Dem - (julien@apache.org)
2022-08-19 18:28:26
*Thread Reply:* you need to add the jar, set the listener and pass your OL config
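Putting those three steps together, a minimal PySpark sketch (the package version, host, and namespace are placeholders, and these were the era's config keys - verify the option names against the docs linked above for your version):

```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("openlineage-demo")
    # pull the OpenLineage agent jar from Maven (version is a placeholder)
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.13.1")
    # register the listener
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # point it at an OpenLineage-compatible backend, e.g. Marquez (placeholder URL)
    .config("spark.openlineage.host", "http://localhost:5000")
    .config("spark.openlineage.namespace", "spark-demo")
    .getOrCreate()
)
```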

Julien Le Dem - (julien@apache.org)
2022-08-19 18:31:11
*Thread Reply:* Actually I'm demoing this at 27:10 right here 🙂 https://pretalx.com/bbuzz22/talk/FHEHAL/

Julien Le Dem - (julien@apache.org)
2022-08-19 18:32:11
*Thread Reply:* you can see the parameters I'm passing to the pyspark command line in the video

Hanbing Wang - (doris.wang200902@gmail.com)
2022-08-19 18:35:50
*Thread Reply:* @Julien Le Dem Thanks for the info, Let me take a look at the video now.

Julien Le Dem - (julien@apache.org)
2022-08-19 18:40:10
*Thread Reply:* The full demo starts at 24:40. It shows lineage connected together in Marquez coming from 3 different sources: Airflow, Spark and a custom integration

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-22 14:32:53
Hi everyone, a release has been requested by @Harel Shein. As per our policy here, 3 +1s from committers will authorize an immediate release. Thanks!
Unreleased commits: https://github.com/OpenLineage/OpenLineage/compare/0.12.0...HEAD

➕ Willy Lulciuc, Michael Robinson, Minkyu Park, Jakub Dardziński, Julien Le Dem

Willy Lulciuc - (willy@datakin.com)
2022-08-22 14:38:58
*Thread Reply:* @Michael Robinson can we start posting the "Unreleased" section of the changelog along with the release request? That way, we / the community will know what will be in the upcoming release.

👍 Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-22 15:00:37
*Thread Reply:* The release is approved. Thanks @Willy Lulciuc, @Minkyu Park, @Harel Shein

🙌 Willy Lulciuc, Harel Shein

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-22 16:18:30
@channel
OpenLineage 0.13.0 is now available!
We added:
• BigQuery check support
• RUNNING EventType in the spec and Python client
• databases and schemas to SQL extractors
• an event forwarding feature via HTTP
• Azure Cosmos Handler to the Spark integration
• support for OL datasets in manual lineage inputs/outputs
• ownership facets.
We changed:
• use RUNNING EventType in Flink integration for currently running jobs
• convert task object into JSON encodable when creating Airflow version facet.
Thanks to all the contributors who made this release possible!
For the bug fixes and more details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.13.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.12.0...0.13.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/ (edited)

🎉 Harel Shein, Ross Turk, Jarek Potiuk, Sheeri Cabral (Collibra), Willy Lulciuc, Howard Yoo, Ernie Ostic, Francis McGregor-Macdonald
✅ Sheeri Cabral (Collibra), Howard Yoo

Conor Beverland - (conorbev@gmail.com)
2022-08-23 03:55:24
*Thread Reply:* Cool! Are the new ownership facets populated by the Airflow integration?

AMRIT SARKAR - (sarkaramrit2@gmail.com)
2022-08-24 08:23:35
Hi everyone, excited to work with OpenLineage. I am new to both OpenLineage and data lineage in general. Are there working examples/blog posts around actually integrating OpenLineage with existing graph DBs like Neo4j, Neptune, etc.? (I understand the service layer in between.) I understand we have Amundsen with sample OpenLineage data - databuilder/example/sample_data/openlineage/sample_openlineage_events.ndjson. Thanks in advance.

Julien Le Dem - (julien@apache.org)
2022-08-25 18:15:59
*Thread Reply:* There is not, that I know of, besides the Amundsen integration example you pointed at.
A basic idea to do such a thing would be to implement an OpenLineage endpoint (receive the lineage events through HTTP POSTs) and convert them to a format the graph DB understands - see the sketch below. If others in the community have ideas, please chime in.
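A minimal sketch of that idea (Flask is just one choice of HTTP framework, and the graph loader is a hypothetical stub you would implement for Neo4j/Neptune):

```
from flask import Flask, request

app = Flask(__name__)

def load_into_graph_db(event: dict) -> None:
    # Hypothetical: translate the event into graph upserts for Neo4j/Neptune,
    # e.g. merge nodes for the job and each input/output dataset, then relate them.
    print(event["eventType"], event["job"]["name"])

@app.route("/api/v1/lineage", methods=["POST"])
def receive_openlineage_event():
    # Accept an OpenLineage event over HTTP and hand it to the graph loader.
    event = request.get_json(force=True)
    load_into_graph_db(event)
    return "", 201

if __name__ == "__main__":
    app.run(port=5000)
```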

AMRIT SARKAR - (sarkaramrit2@gmail.com)
2022-09-01 13:48:09
*Thread Reply:* Understood, thanks a lot, Julien. Makes sense.

Harel Shein - (harel.shein@gmail.com)
2022-08-25 17:30:46
Hey all, can I ask for a release for OpenLineage?

👍 Harel Shein, Minkyu Park, Michael Robinson, Michael Collado, Ross Turk, Julien Le Dem, Willy Lulciuc, Maciej Obuchowski

Willy Lulciuc - (willy@datakin.com)
2022-08-25 17:32:44
*Thread Reply:* @Michael Robinson ^

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-25 17:34:04
*Thread Reply:* Thanks, Harel. 3 +1s from committers is all we need to make this happen today.

Minkyu Park - (minkyu@datakin.com)
2022-08-25 17:52:40
*Thread Reply:* 🙏

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-25 18:09:51
*Thread Reply:* Thanks, all. The release is authorized

🎉 Willy Lulciuc

Julien Le Dem - (julien@apache.org)
2022-08-25 18:16:44
*Thread Reply:* can you also state the main purpose for this release?

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-25 18:25:49
*Thread Reply:* I believe (correct me if wrong, @Harel Shein) that this is to make available a fix of a bug in the compare functionality

Minkyu Park - (minkyu@datakin.com)
2022-08-25 18:27:53
*Thread Reply:* The ParentRunFacet from the Airflow integration is not compliant with the OpenLineage spec, and this release includes the fix for that, so that Marquez can handle parent run/job information.

Michael Robinson - (michael.robinson@astronomer.io)
2022-08-25 18:49:30
@channel
OpenLineage 0.13.1 is now available!
We fixed:
• Rename all parentRun occurrences to parent in the Airflow integration #1037 @fm100
• Do not change task instance during on_running event #1028 @JDarDagran
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.13.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.13.0...0.13.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🎉 Harel Shein, Minkyu Park, Ross Turk, Michael Collado, Howard Yoo
❤️ Minkyu Park, Ross Turk, Howard Yoo
🥳 Minkyu Park, Ross Turk, Howard Yoo

Jason - (shzhan@coupang.com)
2022-08-26 18:58:17
Hi, I am new to OpenLineage. Does anyone know how to enable Spark column-level lineage? I saw in the code comments that it is disabled by default. Thanks.

Harel Shein - (harel.shein@gmail.com)
2022-08-26 19:26:22
*Thread Reply:* What version of Spark are you using? It should be enabled by default for Spark 3:
https://openlineage.io/docs/integrations/spark/spark_column_lineage

Jason - (shzhan@coupang.com)
2022-08-26 20:21:12
*Thread Reply:* Thanks. Good to hear that. I am using 0.9.+. I will try again.

Jason - (shzhan@coupang.com)
2022-08-29 13:14:01
*Thread Reply:* I tested 0.9.+ and 0.12.+ with Spark 3.0 and 3.2. There is still no columnLineage dataset facet. This is strange. I saw column lineage design proposal 148; it should be supported from 0.9.+. Am I missing something?

Jason - (shzhan@coupang.com)
2022-08-29 13:14:41
*Thread Reply:* @Harel Shein

Will Johnson - (will@willj.co)
2022-08-30 00:56:18
*Thread Reply:* @Jason it depends on the data source. What sort of data are you trying to read? Is it in a hive metastore? Is it on an S3 bucket? Is it a delta file format?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-30 13:51:03
-
-

*Thread Reply:* I tried reading a Hive metastore on S3 and a CSV file locally. Both are missing the columnLineage facet

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-31 00:33:17
-
-

*Thread Reply:* @Jason - Sorry, you'll have to translate a bit for me. Can you share a snippet of code you're using to do the read and write? Is it a special package you need to install or is it just using the hadoop standard for S3? https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 20:00:47
-
-

*Thread Reply:*
spark.read \
    .option("header", "true") \
    .option("inferschema", "true") \
    .csv("data/input/batch/wikidata.csv") \
    .write \
    .mode('overwrite') \
    .csv("data/output/batch/python-sample.csv")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 20:01:21
-
-

*Thread Reply:* This is simple code run on my local for testing

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-31 21:41:31
-
-

*Thread Reply:* Which version of OpenLineage are you running? You might look at the code on the main branch. This looks like a HadoopFSRelation which I implemented for column lineage but the latest release (0.13.1) does not include it yet.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-31 21:42:05
-
-

*Thread Reply:* Specifically this commit is what implemented it. -https://github.com/OpenLineage/OpenLineage/commit/ce30178cc81b63b9930be11ac7500ed34808edd3

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 22:02:16
-
-

*Thread Reply:* I see. I use 0.13.0

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2022-09-01 12:04:41
-
-

*Thread Reply:* @Jason we have our monthly release coming up now, so it should be included in 0.14.0 when released today/tomorrow

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-09-01 12:52:52
-
-

*Thread Reply:* Great. Thanks Harel.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-28 17:46:38
-
-

Hi! I have run into some issues and wanted to clarify my doubts.
• Why don't input schema changes (column deletes, new columns) show up on the UI? I have changed the input schema for the same job, but I'm not seeing it updated on the UI.
• Why is there only ever 1 input schema version? With every change I make to the input schema, I see multiple versions for the output schema but only 1 version for the input schema.
• Is there a reason why we can't see the input schema until the COMPLETE event is posted?
I have used the examples from here: https://openlineage.io/getting-started/
curl -X POST <http://localhost:5000/api/v1/lineage> \
  -H 'Content-Type: application/json' \
  -d '{
    "eventType": "START",
    "eventTime": "2020-12-28T19:52:00.001+10:00",
    "run": {
      "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
    },
    "job": {
      "namespace": "my-namespace",
      "name": "my-job"
    },
    "inputs": [{
      "namespace": "my-namespace",
      "name": "my-input"
    }],
    "producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>"
  }'
curl -X POST <http://localhost:5000/api/v1/lineage> \
  -H 'Content-Type: application/json' \
  -d '{
    "eventType": "COMPLETE",
    "eventTime": "2020-12-28T20:52:00.001+10:00",
    "run": {
      "runId": "d46e465b-d358-4d32-83d4-df660ff614dd"
    },
    "job": {
      "namespace": "my-namespace",
      "name": "my-job"
    },
    "outputs": [{
      "namespace": "my-namespace",
      "name": "my-output",
      "facets": {
        "schema": {
          "_producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>",
          "_schemaURL": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/spec/OpenLineage.json#/definitions/SchemaDatasetFacet>",
          "fields": [
            { "name": "a", "type": "VARCHAR"},
            { "name": "b", "type": "VARCHAR"}
          ]
        }
      }
    }],
    "producer": "<https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client>"
  }'
Changing the input schema for START doesn't change the input schema version and doesn't update the UI.
Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 05:29:52
-
-

*Thread Reply:* Reading a dataset - which is what listing it as an input implies - does not mutate the dataset 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 05:30:14
-
-

*Thread Reply:* If you change the dataset, that change would be represented as some other job with this dataset in its outputs list

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Raj Mishra - (hax0755@gmail.com) -
-
2022-08-29 12:42:55
-
-

*Thread Reply:* So, changing the input dataset will always create new output dataset versions? Sorry, I have trouble understanding this, but if the input is changing, shouldn't the input dataset have different versions?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-01 08:35:42
-
-

*Thread Reply:* @Raj Mishra if input is changing, there should be something else in your data infrastructure that changes this dataset - and it should emit this dataset as output

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-08-29 12:21:52
-
-

Hi everyone, new here. I went through the docs and examples. I can't seem to understand how I can model views on top of base tables if not from a data processing job, but rather by modeling something static that comes from some software internals. I.e., I want to issue the lineage myself rather than have it learned dynamically from some Airflow DAG or Spark DAG

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 12:35:32
-
-

*Thread Reply:* I think you want to emit raw events using python or java client: https://openlineage.io/docs/client/python
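For reference, a minimal sketch of that approach with the Python client; the URL, namespace, producer, and dataset names are illustrative assumptions:

```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # e.g. a Marquez instance

# One hand-built event: a (possibly synthetic) job that reads the base table
# and produces the view.
client.emit(RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="my-namespace", name="define_my_view"),
    producer="https://example.com/my-producer",
    inputs=[Dataset(namespace="my-namespace", name="base_table")],
    outputs=[Dataset(namespace="my-namespace", name="my_view")],
))
```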

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-29 12:35:46
-
-

*Thread Reply:* (docs in progress 😉)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-08-30 02:07:02
-
-

*Thread Reply:* can you give a hint what I should look for for modeling a dataset on top of another dataset? potentially also mapping columns?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-08-30 02:12:50
-
-

*Thread Reply:* I can only see that I can have a dataset as input to a job run, not as input to another dataset

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-01 08:34:35
-
-

*Thread Reply:* Not sure I understand - jobs process input datasets into output datasets. There is always something that can be modeled into a job that consumes input and produces output.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-01 10:30:51
-
-

*Thread Reply:* so OpenLineage forces me to put a job between datasets? that does not fit our use case

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-01 10:31:09
-
-

*Thread Reply:* unless we can somehow easily hide the process that does that on the graph.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-29 20:41:19
-
-

QQ, I saw that Spark column-level lineage starts with OpenLineage 0.9.+ and Spark 3.+. Does that mean we need to run OpenLineage lower than 0.9 if our Spark is 2.3 or 2.4?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-30 04:44:06
-
-

*Thread Reply:* I don't think it will work for Spark 2.X.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-30 13:42:20
-
-

*Thread Reply:* Is there a plan to support Spark 2.x?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-08-30 14:00:38
-
-

*Thread Reply:* Nope - on the other hand we plan to drop any support for it, as it's unmaintained for quite a bit and vendors are dropping support for it too - afaik Databricks in April 2023.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-30 17:19:43
-
-

*Thread Reply:* I see. Thanks. Amazon EMR still supports Spark 2.x

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-08-30 01:15:10
-
-

Spark Integration: Handling Data Source V2 API datasets

- -

Is it expected that a DataSourceV2 relation has a start event with inputs and outputs but a complete event with only outputs? Based on @Michael Collado’s previous comments, I think it's fair to say YES this is expected and we just need to handle it. https://openlineage.slack.com/archives/C01CK9T7HKR/p1645037070719159?thread_ts=1645036515.163189&cid=C01CK9T7HKR

- -

@Hanna Moazam and I noticed this behavior when we looked at the Cosmos Db visitor and then reproduced it for the Iceberg visitor. We traced it down to the fact that the AbstractQueryPlanInputDatasetBuilder (which is the parent of DataSourceV2RelationInputDatasetBuilder) has an isDefinedAt that only includes SparkListenerJobStart and SparkListenerSQLExecutionStart

- -

This means an Iceberg COMPLETE event will NEVER contain inputs because the isDefinedAt will always be false (since COMPLETE only fires for JobEnd and ExecutionEnd events). Does that sound correct (@Paweł Leszczyński)?

- -

It seems that Delta tables (or at least Delta on Databricks) does not follow this same code path and as a result our complete events includes outputs AND inputs.

-
- - -
- - - Michael Collado - (https://openlineage.slack.com/team/U01NNCBCP6K) -
- - - - - - - - - - - - - - - - - -
- - - -
- 👀 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-01 05:56:13
-
-

*Thread Reply:* At least for Iceberg I've done it, since I want to emit DatasetVersionDatasetFacet for input dataset only at START - and after I finish writing the dataset might have different version than before writing.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-01 05:58:59
-
-

*Thread Reply:* Same should be for output AFAIK - output version should be emitted only on COMPLETE, since the version changes after I finish writing.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-01 09:52:30
-
-

*Thread Reply:* Ah! Okay, so this still requires us to truly combine START and COMPLETE to get a TOTAL picture of the entire run. Is that fair?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-01 10:30:41
-
-

*Thread Reply:* Yes

- - - -
- 👍 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-01 10:31:21
-
-

*Thread Reply:* As usual, thank you Maciej for the responses and insights!

- - - -
- 🙌 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 22:19:44
-
-

QQ team, I use Spark SQL with OpenLineage namespace weblog: spark.sql("select * from weblog where dt='1'").write.orc("…") There are two issues: 1) there is no upstream dataset weblog on the Marquez UI; 2) a new namespace s3-cdp-prod-hive was created, which should be the S3 bucket. Am I missing something? Thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-09-07 14:13:34
-
-

*Thread Reply:* Can anyone help with this? Did I miss something?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason - (shzhan@coupang.com) -
-
2022-08-31 22:21:57
-
-

Here is the Marquez UI

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-01 07:34:24
-
-

Hi everyone, I’m opening up a vote on this month’s OpenLineage release. 3 +1s from committers will authorize. Additions include support for KustoRelationHandler in Kusto (Azure Data Explorer) and for ABFSS and Hadoop Logical Relation, both in the Spark integration. All commits can be found here: https://github.com/OpenLineage/OpenLineage/compare/0.13.1...HEAD. Thanks in advance!

- - - -
- ➕ Maciej Obuchowski, Ross Turk, Paweł Leszczyński, Will Johnson, Hanna Moazam -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-01 13:18:59
-
-

*Thread Reply:* Thanks. The release is authorized. It will be initiated within 2 business days.

- - - -
- 🙌 Will Johnson, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-05 07:57:02
-
-

Is there a reference on how to deploy OpenLineage on non-AWS infrastructure?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-08 10:31:44
-
-

*Thread Reply:* Which integration are you looking to implement?

- -

And what environment are you looking to deploy it on? The Cloud? On-Prem?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-08 10:40:11
-
-

*Thread Reply:* We are planning to deploy on premise with Kerberos as authentication for postgres

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-08 11:27:06
-
-

*Thread Reply:* Ah! Are you planning on running Marquez as well and that is your main concern or are you planning on building your own store of OpenLineage Events and using the SQL integration to generate those events?

- -

https://github.com/OpenLineage/OpenLineage/tree/main/integration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-08 11:33:44
-
-

*Thread Reply:* I am looking to deploy Marquez on-prem with onprem postgres as back-end with Kerberos authentication.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-08 11:34:32
-
-

*Thread Reply:* Is this the right forum for Marquez as well, or is there a different Slack channel available for Marquez?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-08 11:46:35
-
-

*Thread Reply:* https://bit.ly/MarquezSlack

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-08 11:47:14
-
-

*Thread Reply:* There is another slack channel just for Marquez! That might be a better spot with more dedicated Marquez developers.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-06 15:52:32
-
-

@channel
OpenLineage 0.14.0 is now available!
We added:
• Support ABFSS and Hadoop Logical Relation in Column-level lineage #1008 @wjohnson
• Add Kusto relation visitor #939 @hmoazam
• Add ColumnLevelLineage facet doc #1020 @julienledem
• Include symlinks dataset facet #935 @pawel-big-lebowski
• Add support for dbt 1.3 beta's metadata changes #1051 @mobuchowski
• Support Flink 1.15 #1009 @mzareba382
• Add Redshift dialect to the SQL integration #1066 @mobuchowski
We changed:
• Make the timeout configurable in the Spark integration #1050 @tnazarew
We fixed:
• Add a dialect parameter to Great Expectations SQL parser calls #1049 @collado-mike
• Fix Delta 2.1.0 with Spark 3.3.0 #1065 @pawel-big-lebowski
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.14.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.13.1...0.14.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

- - - -
- ❤️ Willy Lulciuc, Howard Yoo, Alexander Wagner, Hanna Moazam, Minkyu Park, Grayson Stream, Paweł Leszczyński, Maciej Obuchowski, Conor Beverland, Jason -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-09-06 15:54:30
-
-

*Thread Reply:* Thanks for breaking up the changes in the release! Love the new format 💯

- - - -
- 🙌 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-07 09:05:35
-
-

Hello all, I’m requesting a patch release to fix a bug in the Spark integration. Currently, OpenlineageSparkListener fails when no openlineage.timeout is provided. PR #1069 by @Paweł Leszczyński, merged today, will fix it. As per our policy here, 3 +1s from committers will authorize an immediate release.

- - - -
- ➕ Paweł Leszczyński, Maciej Obuchowski, Howard Yoo, Willy Lulciuc, Ross Turk, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-09-07 10:00:11
-
-

*Thread Reply:* Is PR #1069 all that’s going in 0.14.1 ?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-07 10:27:39
-
-

*Thread Reply:* There’s also 1058. 1069 is urgently needed. We can technically wait…

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-07 10:30:31
-
-

*Thread Reply:* (edited prior message because I’m not sure how accurately I was describing the issue)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-09-07 10:39:32
-
-

*Thread Reply:* Thanks for clarifying!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-07 10:50:29
-
-

*Thread Reply:* Thanks, all. The release is authorized.

- - - -
- ❤️ Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-07 11:04:39
-
-

*Thread Reply:* 1058 also fixes some bugs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-08 01:55:41
-
-

Hello all, question: views on top of base tables are also a use case for lineage, and there is no job in between. I don't seem to find a way to have a dataset on top of others to represent a view on top of tables. Is there a way to do that without a job in between?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-08 04:41:07
-
-

*Thread Reply:* Usually there is something creating the view, for example dbt materialization: https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations

- -

Besides that, there is this proposal that did not get enough love yet https://github.com/OpenLineage/OpenLineage/issues/323

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-08 04:53:23
-
-

*Thread Reply:* but we are not working with dbt. we try to model the lineage of our internal view/table hierarchy, which is related to a proprietary application of ours. so we like that OpenLineage lets me explicitly model stuff and not only via scanning some DW. but in that case we don't want a job in between.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-08 04:58:47
-
-

*Thread Reply:* this PR does not seem to support lineage between datasets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-08 12:49:48
-
-

*Thread Reply:* This is something core to the OpenLineage design - the lineage relationships are defined as dataset-job-dataset, not dataset-dataset.

- -

In OpenLineage, something observes the lineage relationship being created.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-08 12:50:13
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
- 🙌 Will Johnson, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-08 12:51:15
-
-

*Thread Reply:* It’s a bit different from some other lineage approaches, but OL is intended to be a push model. A job is observed as it runs, metadata is pushed to the backend.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-08 12:54:27
-
-

*Thread Reply:* so in this case, according to openlineage 🙂, the job would be whatever runs within the pipeline that creates the view. very operational point of view.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-11 12:27:42
-
-

*Thread Reply:* but what about the view definition use case? you have lineage of columns in view/base-table relationships

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-11 12:28:05
-
-

*Thread Reply:* how would you model that in OpenLineage? would you create a dummy job?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-11 12:31:57
-
-

*Thread Reply:* would you say that because this is my use case i might better choose some other lineage tool?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-11 12:33:04
-
-

*Thread Reply:* for context: I am not talking about view and table definitions in some warehouse, e.g. SF, but an internal data processing mechanism with proprietary view/table definitions (in Flink SQL), and we want to push this metadata for visibility

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-12 17:20:13
-
-

*Thread Reply:* Ah, gotcha. Yeah, I would say it’s probably best to create a job in this case. You can send the view definition using a sourcecodefacet, so it will be collected as well. You’d want to send START and STOP events for it.
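A sketch of that shape: a small "definition" job carrying the view SQL as a job facet (here using SqlJobFacet as one concrete option for the source code; the URL and names are assumptions), emitting a START/COMPLETE pair:

```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.facet import SqlJobFacet
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # assumed backend

view_sql = "CREATE VIEW my_view AS SELECT id, region FROM base_table"
job = Job("my-namespace", "define_my_view", facets={"sql": SqlJobFacet(view_sql)})
run = Run(str(uuid4()))

# START and COMPLETE for the same run id describe the relationship end-to-end.
for state in (RunState.START, RunState.COMPLETE):
    client.emit(RunEvent(state, datetime.now(timezone.utc).isoformat(), run, job,
                         producer="https://example.com/my-producer",
                         inputs=[Dataset("my-namespace", "base_table")],
                         outputs=[Dataset("my-namespace", "my_view")]))
```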

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-12 17:22:03
-
-

*Thread Reply:* regarding the PR linked before, you are right - I wonder if someday the spec should have a way to express “the system was made aware that these datasets are related, but did not observe the relationship being created so it can’t tell you i.e. how long it took or whether it changed over time”

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-09-09 10:25:21
-
-

@channel
OpenLineage 0.14.1 is now available!
We fixed:
• Fix Spark integration issues including error when no openlineage.timeout #1069 @pawel-big-lebowski
Bug fixes were also included in this release.
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.14.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.14.0...0.14.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 🙌 Maciej Obuchowski, Willy Lulciuc, Howard Yoo, Francis McGregor-Macdonald, AMRIT SARKAR -
- -
-
-
-
- - - - - -
-
- - - - -
- -
data_fool - (data.fool.me@gmail.com) -
-
2022-09-09 13:52:39
-
-

Hello, any future plans for integrating Airbyte with openlineage?

- - - -
- 👋 Willy Lulciuc, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2022-09-09 14:01:13
-
-

*Thread Reply:* Hey, @data_fool! Not in the near term. but of course we’d love to see this happen. We’re open to having an Airbyte integration driven by the community. Want to open an issue to start the discussion?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
data_fool - (data.fool.me@gmail.com) -
-
2022-09-09 15:36:20
-
-

*Thread Reply:* hey @Willy Lulciuc, Yep, will open an issue. Thanks!

- - - -
- 🙌 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Hubert Dulay - (hubert.dulay@gmail.com) -
-
2022-09-10 22:00:10
-
-

Hi can you create lineage across namespaces? Thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-09-12 19:26:25
-
-

*Thread Reply:* yes!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-26 10:31:56
-
-

*Thread Reply:* Any example or ticket on how to create lineage across namespaces?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Iftach Schonbaum - (iftach.schonbaum@hunters.ai) -
-
2022-09-12 02:27:49
-
-

Hello, Does OpenLineage support column level lineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-12 04:56:13
-
-

*Thread Reply:* Yes https://openlineage.io/blog/column-lineage/

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2022-09-22 02:18:45
-
-

*Thread Reply:* • More details on Spark & Column level lineage integration: https://openlineage.io/docs/integrations/spark/spark_column_lineage -• Proposal on how to implement column level lineage in Marquez (implementation is currently work in progress): https://github.com/MarquezProject/marquez/blob/main/proposals/2045-column-lineage-endpoint.md -@Iftach Schonbaum let us know if you find the information useful.

-
-
openlineage.io
- - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-12 15:29:12
-
-

where can I find docs on just simply using extractors, without Marquez? for example, a basic BashOperator on Airflow 1.10.15

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-12 15:30:08
-
-

*Thread Reply:* or is it automatic for anything that exists in extractors/?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howard.yoo@astronomer.io) -
-
2022-09-12 15:30:16
-
-

*Thread Reply:* Yes

- - - -
- 👍 Paul Lee -
- -
- :gratitude_thank_you: Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-12 15:31:12
-
-

*Thread Reply:* so anything i add to extractors directory with the same name as the operator will automatically extract the metadata from the operator is that correct?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howard.yoo@astronomer.io) -
-
2022-09-12 15:31:31
-
-

*Thread Reply:* Well, not entirely

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howard.yoo@astronomer.io) -
-
2022-09-12 15:31:47
-
-

*Thread Reply:* please take a look at the source code of one of the extractors

- - - -
- 👍 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howard.yoo@astronomer.io) -
-
2022-09-12 15:32:13
-
-

*Thread Reply:* also, there are docs available at openlineage.io/docs

- - - -
- 🙏 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-12 15:33:45
-
-

*Thread Reply:* ok, i'll take a look. i think one thing that would be helpful is having a custom setup without marquez. a lot of the docs or videos i found were integrated with marquez

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howard.yoo@astronomer.io) -
-
2022-09-12 15:34:29
-
-

*Thread Reply:* I see. Marquez is an OpenLineage backend that stores the lineage data, so many examples do use it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Howard Yoo - (howard.yoo@astronomer.io) -
-
2022-09-12 15:34:47
-
-

*Thread Reply:* If you do not want to run marquez but just test out the openlineage, you can also take a look at OpenLineage Proxy.

- - - -
- 👍 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-12 15:35:14
-
-

*Thread Reply:* awesome thanks Howard! i'll take a look at these resources and come back around if i need to

- - - -
- 👍 Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-12 16:01:45
-
-

*Thread Reply:* http://openlineage.io/docs/integrations/airflow/extractor - this is the doc you might want to read
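A bare-bones sketch of what that doc describes; the class and module names here are hypothetical, and registration happens via the OPENLINEAGE_EXTRACTORS environment variable (e.g. OPENLINEAGE_EXTRACTORS=mypkg.extractors.MyBashExtractor):

```python
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata

class MyBashExtractor(BaseExtractor):
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # operator class names this extractor should handle
        return ["BashOperator"]

    def extract(self) -> Optional[TaskMetadata]:
        # self.operator is the operator instance for the running task
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[],   # fill with Dataset objects if they can be inferred
            outputs=[],
        )
```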

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- 🎉 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-12 17:08:49
-
-

*Thread Reply:* yeah, saw that doc earlier. thanks @Maciej Obuchowski appreciate it 🙏

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jay - (sanjay.sudhakaran@trovemoney.co.nz) -
-
2022-09-21 20:55:24
-
-

Hey team! I’m pretty new to the field in general

- -

In the real world, I would be running pyspark scripts on AWS EMR. Could you explain to me how the metadata is sent to Marquez from my pyspark script, and where it’s persisted?

- -

Would I need to set up an S3 bucket to store the lineage data?

- -

I’m also unsure about how I would run the Marquez UI on AWS - Would I need to have an EC2 instance running permanently in order to access that UI?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jay - (sanjay.sudhakaran@trovemoney.co.nz) -
-
2022-09-21 20:57:39
-
-

*Thread Reply:* In my head, I have:

- -

Pyspark script -> Store metadata in S3 -> Marquez UI gets data from S3 and displays it

- -

I suspect this is incorrect?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2022-09-22 02:14:50
-
-

*Thread Reply:* It's more like: you add the OpenLineage jar to the Spark job and configure what to do with the events. Popular options are:
• send to a REST endpoint (like Marquez),
• send as an event onto Kafka,
• print it onto the console.
There is no S3 in between Spark & Marquez by default.
Marquez serves both as an API where events are sent and a UI to investigate them.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jay - (sanjay.sudhakaran@trovemoney.co.nz) -
-
2022-09-22 17:36:10
-
-

*Thread Reply:* Yeah S3 was just an example for a storage option.

- -

I actually found the answer I was looking for, turns out I had to look at Marquez documentation: -https://marquezproject.ai/resources/deployment/

- -

The answer is that Marquez uses a postgres instance to persist the metadata it is given. Thanks for your time though! I appreciate the effort 🙂

- - - -
- 👍 Kevin Adams -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-09-25 17:06:41
-
-

Hello team,
For OpenLineage Spark, even when I processed one Spark SQL query (CTAS: Create Table As Select), I received multiple events back (2+ START events, 2 COMPLETE events).
I'm trying to understand why OpenLineage needs to send back that many events, and what the primary difference is between one START event and another, and between START and COMPLETE events.
Is there any doc that can help me understand more about this?
Thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-26 00:27:05
-
-

*Thread Reply:* The Spark execution model follows:

- -
  1. Spark SQL Execution Start event
  2. Spark Job Start event
  3. Spark Job End event
  4. Spark SQL Execution End event
As a result, OpenLineage tracks all of those executions and jobs. There is a proposed plan to distinguish between those events (e.g. you wouldn't get two starts, but one Start and one Job Start or something like that).
- -

You should collect all of these events in order to be sure you are receiving all the data since each event may contain a subset of the complete facets that represent what occurred in the job.
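A toy sketch of that cumulative consumption model: fold every event sharing a runId into one picture of the job's inputs and outputs (field access assumes the raw JSON event shape):

```python
from collections import defaultdict

runs = defaultdict(lambda: {"inputs": set(), "outputs": set()})

def consume(event: dict) -> None:
    # merge each event's (possibly partial) inputs/outputs under its run id
    state = runs[event["run"]["runId"]]
    state["inputs"].update(ds["name"] for ds in event.get("inputs", []))
    state["outputs"].update(ds["name"] for ds in event.get("outputs", []))
```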

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-09-26 15:16:26
-
-

*Thread Reply:* Thanks @Will Johnson
Can I get an example of how the proposed plan could be used to distinguish between a start and a job start event?
Because when I compare the 2 START events I got, only the event_time is different; all other information is the same.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-09-26 15:30:34
-
-

*Thread Reply:* One followup question: if I process multiple queries in one command, for example (Drop + Create Table + Insert Overwrite), should I expect:
(1) 1 Spark SQL execution start event
(2) 3 Spark job start events (each query has a job start event)
(3) 3 Spark job end events (each query has a job end event)
(4) 1 Spark SQL execution end event

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-27 10:25:47
-
-

*Thread Reply:* Re: Distinguish between start and job start events. There was a proposal to differentiate the two (https://github.com/OpenLineage/OpenLineage/issues/636) but the current discussion is here: https://github.com/OpenLineage/OpenLineage/issues/599 As it currently stands, there is not a way to tell which one is which (I believe). The design of OpenLineage is such that you should consume ALL events under the same run id and job name / namespace.

- -

Re: Multiple Queries in One Command: This is where Spark's execution model comes into play. I believe each one of those commands are executed sequentially and as a result, you'd actually get three execution start and three execution end. If you chose DROP + Create Table As Select, that would be only two commands and thus only two execution start events.

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-09-27 16:49:37
-
-

*Thread Reply:* Thanks a lot for your help 🙏 @Will Johnson,
For multiple queries in one command, I'm still confused about why Drop + CreateTable and Drop + CreateTableAsSelect act differently.

- -

When I test Drop + Create Table
Query:
DROP TABLE IF EXISTS shadow_test.test_sparklineage_4; CREATE TABLE IF NOT EXISTS shadow_test.test_sparklineage_4 (val INT, region STRING) PARTITIONED BY ( ds STRING ) STORED AS PARQUET;
I only received 1 start + 1 complete event
And the events only contain DropTableCommandVisitor/DropTableCommand.
I expected we would also receive start and complete events for the CreateTable query with CreateTableCommandVisitor/CreateTableCommand.

- -

But when I test Drop + Create Table As Select
Query:
DROP TABLE IF EXISTS shadow_test.test_sparklineage_5; CREATE TABLE IF NOT EXISTS shadow_test.test_sparklineage_5 AS SELECT * from shadow_test.test_sparklineage where ds > '2022-08-24'
I received 1 start + 1 complete event with DropTableCommandVisitor/DropTableCommand
And 2 start + 2 complete events with CreateHiveTableAsSelectCommandVisitor/CreateHiveTableAsSelectCommand

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-27 22:03:38
-
-

*Thread Reply:* @Hanbing Wang are you running this on Databricks with a hive metastore that is defaulting to Delta by any chance?

- -

I THINK there are some gaps in OpenLineage because of the way Databricks Delta handles things and now there is Unity catalog that is causing some hiccups as well.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-28 09:18:48
-
-

*Thread Reply:* > For multiple queries in one command, I still have a confused place why Drop + CreateTable and Drop + CreateTableAsSelect act different. -@Hanbing Wang That's basically why we capture all the events (SQL Execution, Job) instead of one of them. We're just inconsistently notified of them by Spark.

- -

Some computations emit SQL Execution events, some emit Job events, I think majority emits both. This also differs by spark version.

- -

The solution OpenLineage assumes is having cumulative model of job execution, where your backend deals with possible duplication of information.

- -

> I THINK there are some gaps in OpenLineage because of the way Databricks Delta handles things and now there is Unity catalog that is causing some hiccups as well. -@Will Johnson would be great if you created issue with some complete examples

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-09-28 15:44:45
-
-

*Thread Reply:* @Will Johnson and @Maciej Obuchowski Thanks a lot for your help
We are not running on Databricks.
We implemented the OpenLineage Spark listener and a custom event Transport, which emits the events to our own events pipeline, with a Hive metastore.
We are using Spark version 3.2.1
OpenLineage version 0.14.1

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-29 15:16:28
-
-

*Thread Reply:* Ooof! @Hanbing Wang then I'm not certain why you're not receiving the extra event 😞 You may need to run your spark cluster in debug mode to step through the Spark Listener.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-29 15:17:08
-
-

*Thread Reply:* @Maciej Obuchowski - I'll add it to my list!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Hanbing Wang - (doris.wang200902@gmail.com) -
-
2022-09-30 15:34:01
-
-

*Thread Reply:* @Will Johnson Thanks a lot for your help. Let us debug and continue investigating on this issue.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Yujia Yang - (yujia@tubi.tv) -
-
2022-09-26 03:46:19
-
-

Hi team, I find OpenLineage posts a lot of run events to the backend.

- -

e.g. I submit a jar to the Spark cluster with computations like

- -
  1. count from table1 --> this will produce more than one run event with inputs:[table1], outputs:[]
  2. count from table2 --> this will produce more than one run event with inputs:[table2], outputs:[]
  3. write Seq[(t1, count1), (t2, count2)] to table3 --> this may give inputs:[], outputs:[table3]
Can I just get one post with a summary telling me inputs:[table1, table2], outputs:[table3], along with a merged columnLineage?
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2022-09-28 08:34:20
-
-

*Thread Reply:* One of the assumptions was to create a stateless integration model where multiple events can be sent for a single job run. This has several advantages, like sending events for jobs which suddenly fail, sending events immediately, etc.

- -

The events can be merged then at the backend side. The behavior, you describe, can be then achieved by using backends like Marquez and Marquez API to obtain combined data.

- -

Currently, we're developing a column-lineage dedicated endpoint in Marquez according to the proposal: https://github.com/MarquezProject/marquez/blob/main/proposals/2045-column-lineage-endpoint.md
This will allow you to request the whole column lineage graph based on multiple jobs.

-
- - - - - - - - - - - - - - - - -
- - - -
- :gratitude_thank_you: Yujia Yang -
- -
- 👀 Yujia Yang -
- -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-28 09:47:55
-
-

Is there a provision to include additional MDC properties as part of OpenLineage?
Or something like sparkSession.sparkContext().setLocalProperty("key","value")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-09-29 14:30:37
-
-

*Thread Reply:* Hello @srutikanta hota, could you elaborate a bit on your use case? I'm not sure what you are trying to achieve. Possibly @Paweł Leszczyński will know.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2022-09-29 15:24:26
-
-

*Thread Reply:* @srutikanta hota - Not sure what MDC properties stands for but you might take inspiration from the DatabricksEnvironmentHandler Facet Builder: https://github.com/OpenLineage/OpenLineage/blob/65a5f021a1ba3035d5198e759587737a05[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java

- -

You can create a facet that could extract out the properties that you might set from within the spark session.

- -

I don't think OpenLineage / a Spark Listener can affect the SparkSession itself so you wouldn't be able to SET the properties in the listener.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-30 04:56:25
-
-

*Thread Reply:* Many thanks for the details. My use case is simple: I'd like to default to the Spark job group ID as the OpenLineage parent run ID if there is no parent run ID set.
sc.setJobGroup("myjobgroupid", "job description goes here")
This sets the value in Spark as
setLocalProperty(SparkContext.SPARK_JOB_GROUP_ID, group_id)

- -

I'd like to use my job group id as the OpenLineage parent run id

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2022-09-30 05:01:08
-
-

*Thread Reply:* MDC is an ability to add extra key -> value pairs to a log entry, while not doing this within message body. So the question here is (I believe): how to add custom entries / custom facets to OpenLineage events?

- -

@srutikanta hota What information would you like to include? There is a great chance we already have some fields for that. If not, it's still worth putting it in the right place: is this info job-specific, run-specific, or does it relate to some of the input/output datasets?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-09-30 05:04:34
-
-

*Thread Reply:* @srutikanta hota sounds like you want to set up
spark.openlineage.parentJobName
spark.openlineage.parentRunId
https://openlineage.io/docs/integrations/spark/
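A sketch of wiring those two settings up when building the session (values illustrative; the parent run id is conventionally a UUID, so a raw job-group string may not be accepted by every backend):

```python
from uuid import uuid4

from pyspark.sql import SparkSession

parent_run_id = str(uuid4())  # assumption: generate/track this per job group

spark = (
    SparkSession.builder
    .config("spark.openlineage.parentJobName", "my-job-group")
    .config("spark.openlineage.parentRunId", parent_run_id)
    .getOrCreate()
)
```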

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
srutikanta hota - (srutikanta.hota@gmail.com) -
-
2022-09-30 05:15:18
-
-

*Thread Reply:* @… we have a long-running Spark context (the context may run for a week) where we submit jobs. Setting the parentRunId at the beginning won't help. We are submitting jobs with a Spark job group ID. I'd like to use the group ID as the parentRunId

- -

https://spark.apache.org/docs/1.6.1/api/R/setJobGroup.html

- - - -
- 🤔 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Trevor Swan - (trevor.swan@matillion.com) -
-
2022-09-29 13:59:20
-
-

Hi team - I am from Matillion and we would like to build support for openlineage. Who would be best placed to move the conversation with my product team?

- - - -
- 🙌 Will Johnson, Maciej Obuchowski, Francis McGregor-Macdonald -
- -
- 🎉 Michael Robinson -
- -
- 👍 Ernie Ostic -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-09-29 14:22:06
-
-

*Thread Reply:* Hi Trevor, thank you for reaching out. I’d be happy to discuss with you how we can help you support OpenLineage. Let me send you an email.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jarek Potiuk - (jarek@potiuk.com) -
-
2022-09-29 15:58:35
-
-

cccccbctlvggfhvrcdlbbvtgeuredtbdjrdfttbnldcb

- - - -
- 🐈 Julien Le Dem, Jakub Dardziński, Maciej Obuchowski, Paweł Leszczyński -
- -
- 🐈‍⬛ Julien Le Dem, Maciej Obuchowski, Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Petr Hajek - (petr.hajek@profinit.eu) -
-
2022-09-30 02:52:51
-
-

Hi Everyone! Would anybody be interested in participating in MANTA OpenLineage connector testing? We are especially looking for an environment with a rich Airflow implementation, but we will be happy to test on any other OL producer technology. Send me a direct message for more information. Thanks, Petr

- - - -
- 🙌 Michael Robinson, Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2022-09-30 14:34:45
-
-

Question about Apache Airflow that I think folks here would know, because doing a web search has failed me:

- -

Is there a way to interact with Apache Airflow to retrieve the contents of the files in the sql directory, but NOT to run them?

- -

(the APIs all seem to run sql, and when I search I just get “how to use the airflow API to run queries”)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-30 14:38:34
-
-

*Thread Reply:* Is this in the context of an OpenLineage extractor?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2022-09-30 14:40:47
-
-

*Thread Reply:* Yes! I was specifically looking at the PostgresOperator

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2022-09-30 14:41:54
-
-

*Thread Reply:* (as Snowflake lineage can be retrieved from their internal ACCESS_HISTORY tables, we wouldn’t need to use Airflow’s SnowflakeOperator to get lineage, we’d use the method on the openlineage blog)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-30 14:43:08
-
-

*Thread Reply:* The extractor for the SQL operators gets the query like this: -https://github.com/OpenLineage/OpenLineage/blob/45fda47d8ef29dd6d25103bb491fb8c443[…]gration/airflow/openlineage/airflow/extractors/sql_extractor.py

-
- - - - - - - - - - - - - - - - -
- - - -
- 👍 Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-30 14:43:48
-
-

*Thread Reply:* let me see if I can find the corresponding part of the Airflow API docs...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2022-09-30 14:45:00
-
-

*Thread Reply:* aha! I’m not so far behind the times, it was only put in during July https://github.com/OpenLineage/OpenLineage/pull/907

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-30 14:47:28
-
-

*Thread Reply:* Hm. The PostgresOperator seems to extend BaseOperator directly: -https://github.com/apache/airflow/blob/029ebacd9cbbb5e307a03530bdaf111c2c3d4f51/airflow/providers/postgres/operators/postgres.py#L58

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2022-09-30 14:48:01
-
-

*Thread Reply:* yeah 😞 I couldn’t find a way to make that work as an end-user.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-30 14:48:08
-
-

*Thread Reply:* perhaps that can't be assumed for all operators that deal with SQL. I know that @Maciej Obuchowski has spent a lot of time on this.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2022-09-30 14:49:14
-
-

*Thread Reply:* I don't know enough about the airflow internals 😞

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2022-09-30 14:50:00
-
-

*Thread Reply:* No worries. In case it saves you work, I also had a look at https://github.com/apache/airflow/blob/029ebacd9cbbb5e307a03530bdaf111c2c3d4f51/airflow/providers/common/sql/operators/sql.py - which also extends BaseOperator but not with a way to just get the SQL.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2022-09-30 15:22:24
-
-

*Thread Reply:* that's more of an Airflow question indeed. As far as I understand, you need to read the file with the SQL statement within an Airflow operator and do something other than run the query (like pass it as an XCom)? The SQLExtractors we have get the same SQL that the operators render and use it to extract additional information, like table schema, straight from the database
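For the "get the SQL without running it" goal, a sketch using the same attribute the SQL extractors read; the dag and task ids are hypothetical:

```python
from airflow.models import DagBag

dag = DagBag().get_dag("my_dag")          # hypothetical dag id
task = dag.get_task("my_postgres_task")   # hypothetical task id
print(task.sql)  # the SQL string (or list of statements), possibly still templated
```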

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2022-09-30 14:36:18
-
-

(I’m also ok with a way to get the SQL that has been run - but from Airflow, not the data source - I’m looking for a db-neutral way to do this, otherwise I can just parse query logs on any specific db system)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-30 18:45:09
-
-

👋 are there any docs on how the listener hooks in and gets run with openlineage-airflow? trying to write some unit tests but no docs seem to exist on the flow.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2022-09-30 19:06:47
-
-

*Thread Reply:* There's a design doc linked from the PR: https://github.com/apache/airflow/pull/20443 -https://docs.google.com/document/d/1L3xfdlWVUrdnFXng1Di4nMQYQtzMfhvvWDR9K4wXnDU/edit

-
- - - - - - - -
-
Labels
- area:scheduler/executor, area:dev-tools, area:plugins, type:new-feature, full tests needed -
- -
-
Comments
- 3 -
- - - - - - - - - - -
- - - -
- 👀 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-09-30 19:18:47
-
-

*Thread Reply:* amazing thank you I will take a look

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-10-03 11:32:52
-
-

@channel
Hello everyone, I'm opening up a vote on releasing OpenLineage 0.15.0, including:
• an improved development experience in the Airflow integration
• updated proposal and integration templates
• a change to the BigQuery client in the Airflow integration
• plus bug fixes across the project.
3 +1s from committers will authorize an immediate release. For all the commits, see: https://github.com/OpenLineage/OpenLineage/compare/0.14.0...HEAD. Note: this will be the last release to support Airflow 1.x!
Thanks!

- - - -
- 🎉 Paul Lee, Howard Yoo, Minkyu Park, Michael Collado, Paweł Leszczyński, Maciej Obuchowski, Harel Shein -
- -
- 👍 Michael Collado, Julien Le Dem, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-03 11:33:30
-
-

*Thread Reply:* Hey @Michael Robinson. Removal of Airflow 1.x support is planned for next release after 0.15.0

- - - -
- 👍 Jakub Dardziński, Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-03 11:37:03
-
-

*Thread Reply:* 0.15.0 would be the last release supporting Airflow 1.x

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-10-03 11:37:07
-
-

*Thread Reply:* just caught this myself. I’ll make the change

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 11:40:33
-
-

*Thread Reply:* we’re still on 1.10.15 at the moment so i guess our team would have to rely on <=0.15.0?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-03 11:49:47
-
-

*Thread Reply:* Is this something you want to continue doing or do you want to migrate relatively soon?

- -

We want to remove the 1.10 integration because, for multiple PRs, maintaining compatibility with it takes a lot of time; the code is littered with checks like this:
if parse_version(AIRFLOW_VERSION) >= parse_version("2.0.0"):

- - - -
- 👍 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 12:03:40
-
-

*Thread Reply:* hey Maciej, we do have plans to migrate in the coming months but for right now we need to stay on 1.10.15.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2022-10-04 09:39:11
-
-

*Thread Reply:* Thanks, all. The release is authorized, and you can expect it by Thursday.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 17:56:08
-
-

👋 what would be a possible reason for the built-in Airflow backend being utilized instead of a custom wrapper over airflow.lineage.Backend? double-checked the [lineage] key in our airflow.cfg

- -

there doesn't seem to be any errors being thrown and the object loads 🤔

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 17:56:36
-
-

*Thread Reply:* running airflow 2.3.4 with openlineage-airflow 0.14.1

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-03 18:03:03
-
-

*Thread Reply:* if you're talking about LineageBackend, it is used in Airflow 2.1-2.2. It did not have functionality where you can be notified on task start or failure, so we wanted to expand the functionality: https://github.com/apache/airflow/issues/17984

- -

Consensus of Airflow maintainers wasn't positive about changing this interface, so we went with another direction: https://github.com/apache/airflow/pull/20443

-
- - - - - - - - - - - - - - - - -
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-03 18:06:58
-
-

*Thread Reply:* Why nothing happens? https://github.com/OpenLineage/OpenLineage/blob/895160423643398348154a87e0682c3ab5c8704b/integration/airflow/openlineage/lineage_backend/__init__.py#L91

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 18:30:32
-
-

*Thread Reply:* ah hmm ok, i will double check. i commented that part out so technically it should run but maybe i missed something

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 18:30:42
-
-

*Thread Reply:* thank you for your fast response @Maciej Obuchowski ! i appreciate it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 18:31:13
-
-

*Thread Reply:* it seems like it doesn't use my custom wrapper but instead uses the openlineage implementation.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-03 20:11:15
-
-

*Thread Reply:* @Maciej Obuchowski ok, after checking: we are emitting events with our custom backend, but an odd thing is that an attempt is always made with the openlineage backend. Is there something obvious I am perhaps missing 🤔

- -

It ends up with requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url immediately after task start. But by the end, on task success/failure, it emits the events (both RunState.START and RunState.COMPLETE) with our custom backend into our own pipeline.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-04 06:19:06
-
-

*Thread Reply:* If you're on 2.3 and trying to use some wrapped LineageBackend, what I think is happening is OpenLineagePlugin that automatically registers via setup.py entrypoint https://github.com/OpenLineage/OpenLineage/blob/65a5f021a1ba3035d5198e759587737a05b242e1/integration/airflow/openlineage/airflow/plugin.py#L30

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-04 06:23:48
-
-

*Thread Reply:* I think if you want to extend it with proprietary code there are two good options.

- -

First, if your code only needs to touch HTTP client side - which I guess is the case due to 401 error - then you can create custom Transport.

- -

Second, is that you fork OL code and create your own package, without entrypoint script or with adding your own if you decide to extend OpenLineagePlugin instead of LineageBackend

- - - -
- 👍 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-04 14:23:33
-
-

*Thread Reply:* amazing thank you for your help. i will take a look

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-04 14:49:47
-
-

*Thread Reply:* @Maciej Obuchowski is there a way to extend the plugin like how we can wrap the custom backend with 2.2? or would it be necessary to fork it.

- -

we're trying to not fork and instead opt with extending.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-05 04:55:05
-
-

*Thread Reply:* I think it's best to fork, since it's getting loaded by Airflow as an entrypoint: https://github.com/OpenLineage/OpenLineage/blob/133110300e8ea4e42e3640608cfed459683d5a8d/integration/airflow/setup.py#L70

-
- - - - - - - - - - - - - - - - -
- - - -
- 🙏 Paul Lee -
- -
- :gratitude_thank_you: Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-05 13:29:24
-
-

*Thread Reply:* got it. and in terms of the openlineage.yml and defining a custom transport, is there a way I can define where openlineage-python should look for the custom transport? e.g. a different path

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2022-10-05 13:30:04
-
-

*Thread Reply:* because from the docs I can't tell, except for the file I'm supposed to copy and implement.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2022-10-05 14:18:19
-
-

*Thread Reply:* @Paul Lee you should derive from Transport base class and register type as full python import path to your custom transport, for example https://github.com/OpenLineage/OpenLineage/blob/f8533266491acea2159f602f782a99a4f8a82cca/client/python/tests/openlineage.yml#L2

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 14:20:48

*Thread Reply:* your custom transport should also define a custom Config class, and this class should implement the from_dict method
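To make that concrete, here is a minimal sketch of the pattern described above. The class names, module path, and config keys are illustrative (not from this thread); the Transport/Config base classes and Serde are from the openlineage-python client of that era, as I read the factory code linked below:

```python
# my_company/transport.py - hypothetical module; register it in openlineage.yml as:
#   transport:
#     type: my_company.transport.MyTransport
#     url: https://lineage.example.com
#     api_key: some-key
import requests

from openlineage.client.serde import Serde
from openlineage.client.transport.transport import Config, Transport


class MyConfig(Config):
    def __init__(self, url: str, api_key: str):
        self.url = url
        self.api_key = api_key

    @classmethod
    def from_dict(cls, params: dict) -> "MyConfig":
        # the transport factory calls this with the remaining keys from openlineage.yml
        return cls(url=params["url"], api_key=params["api_key"])


class MyTransport(Transport):
    kind = "my_transport"
    config = MyConfig  # the factory reads this attribute to know which Config to build

    def __init__(self, config: MyConfig):
        self.url = config.url
        self.api_key = config.api_key

    def emit(self, event):
        # send the serialized RunEvent to the proprietary backend
        requests.post(
            self.url,
            data=Serde.to_json(event),
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
            timeout=5,
        )
```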

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 14:20:56

*Thread Reply:* the whole process is here: https://github.com/OpenLineage/OpenLineage/blob/a62484ec14359a985d283c639ac7e8b9cfc54c2e/client/python/openlineage/client/transport/factory.py#L47

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 14:21:09

*Thread Reply:* and I know we need to document this better 🙂

🙏 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-10-05 15:35:31

*Thread Reply:* amazing, thanks for all your help 🙂 +1 to the docs, if i have some time when done i will push up some docs to document what i've done

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-05 15:50:29

*Thread Reply:* https://github.com/openlineage/docs/ - let me know and I'll review 🙂

🎉 Paul Lee

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-04 12:39:59

@channel
Hi everyone, opening a vote on a release (0.15.1) to add #1131 to fix the release process on CI. 3 +1s from committers will authorize an immediate release. Thanks. More details are here:
https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md

✅ Michael Collado, Maciej Obuchowski, Julien Le Dem

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-04 14:25:49

*Thread Reply:* Thanks, all. The release is authorized.

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-05 10:46:46

@channel
OpenLineage 0.15.1 is now available!
We added:
• Airflow: improve development experience #1101 @JDarDagran
• Documentation: update issue templates for proposal & add new integration template #1116 @rossturk
• Spark: add description for URL parameters in readme, change overwriteName to appName #1130 @tnazarew
We changed:
• Airflow: lazy load BigQuery client #1119 @mobuchowski
Many bug fixes were also included in this release.
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.15.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.14.1...0.15.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Maciej Obuchowski, Jakub Dardziński, Howard Yoo, Harel Shein, Paul Lee, Paweł Leszczyński
🎉 Howard Yoo, Harel Shein, Paul Lee

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 07:35:00

Is there a topic you think the community should discuss at the next OpenLineage TSC meeting? Reply or DM with your item, and we’ll add it to the agenda.

🌟 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-10-06 13:29:30

*Thread Reply:* would love to add improvement in docs :) for newcomers

👏 Jakub Dardziński

Paul Lee - (paullee@lyft.com)
2022-10-06 13:31:07

*Thread Reply:* also, what’s TSC?

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 15:20:23

*Thread Reply:* Technical Steering Committee, but it’s open to everyone

👍 Paul Lee

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 15:20:45

*Thread Reply:* and we encourage newcomers to attend

Paul Lee - (paullee@lyft.com)
2022-10-06 13:49:00

has anyone seen their COMPLETE/FAILED listeners not firing on Airflow 2.3.4 but START events do emit? using openlineage-airflow 0.14.1

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 14:39:27

*Thread Reply:* is there any error/warn message logged maybe?

Paul Lee - (paullee@lyft.com)
2022-10-06 14:40:53

*Thread Reply:* none that i'm seeing on our workers. i do see that our custom http transport is being utilized on START.

but on SUCCESS nothing fires.

Paul Lee - (paullee@lyft.com)
2022-10-06 14:41:21

*Thread Reply:* which makes me believe the listeners themselves aren't being utilized? 🤔

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 16:37:54

*Thread Reply:* uhm, any chance you're experiencing this with custom extractors?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 16:38:13

*Thread Reply:* I'd be happy to jump on a quick call if you wish

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-10-06 16:38:40

*Thread Reply:* but in more EU friendly hours 🙂

Paul Lee - (paullee@lyft.com)
2022-10-07 16:19:47

*Thread Reply:* no custom extractors, it's using the base extractor. a call would be 👍. let me look at my calendar and EU hours.

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-06 15:23:27

@channel The next OpenLineage Technical Steering Committee meeting is on Thursday, October 13 at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom
All are welcome!
Agenda:
  1. Announcements
  2. Recent Release 0.15.1
  3. Project roadmap review
  4. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
🙌 Paul Lee, Harel Shein

Srinivasa Raghavan - (gsrinir@gmail.com)
2022-10-07 06:52:42

hello all. I am trying to run the airflow example from here.
I changed the Marquez web port from 5000 to 15000 but when I start the docker images, it seems to always default to port 5000, and therefore when I go to localhost:3000 the jobs don't load up, as they are not able to connect to the marquez app running on 15000. I've overridden the values in docker-compose.yml and in openLineage.env but it seems to be picking up the 5000 value from some other location.
This is what I see in the logs. Any pointers on this, or please redirect me to the appropriate channel. Thanks!
INFO [2022-10-07 10:48:58,022] org.eclipse.jetty.server.AbstractConnector: Started application@782fd504{HTTP/1.1, (http/1.1)}{0.0.0.0:5000}
INFO [2022-10-07 10:48:58,034] org.eclipse.jetty.server.AbstractConnector: Started admin@1537c744{HTTP/1.1, (http/1.1)}{0.0.0.0:5001}

👀 Maciej Obuchowski

Srinivasa Raghavan - (gsrinir@gmail.com)
2022-10-20 05:11:09

*Thread Reply:* Apparently the value is hard-coded somewhere in the code that I couldn't figure out, but at least I learnt that on my Mac the port 5000 being held up can be freed by following a simple step.

Hanna Moazam - (hannamoazam@microsoft.com)
2022-10-10 18:00:17

Hi #general - @Will Johnson and I are working on adding support for Snowflake to OL, and as we were going to specify the package under the compileOnly dependencies in gradle, we had some doubts looking at the existing dependencies. Taking bigQuery as an example - we see it's included as a dependency in both the shared build.gradle file, and in the app build.gradle file. We're a bit confused about the following:
  1. Why do we need to have the bigQuery package in shared's dependencies? App of course contains the bigQueryNodeVisitor but we couldn't spot where it's being used within shared.
  2. For all the dependencies in the shared gradle file, the versions for Scala and Spark are fixed (Scala 2.11, Spark 2.4.8), whereas for app, the versionsMap allows for different combinations of spark and scala versions. Why is this so?
  3. How do the dependencies between app and shared interact? Does one or the other take precedence for which version of the bigQuery connector is compiled?
We'd appreciate any guidance!

Thank you in advance!

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-10-11 03:47:31

*Thread Reply:* Hi @Hanna Moazam,

Within recent PR https://github.com/OpenLineage/OpenLineage/pull/1111, I removed BigQuery dependencies from the spark2, spark32 and spark3 subprojects. It has to stay in shared because of BigQueryNodeVisitor. The usage of BigQueryNodeVisitor is tricky as we never know if bigquery classes are available at runtime or not. The check is done in io.openlineage.spark.agent.lifecycle.BaseVisitorFactory:
if (BigQueryNodeVisitor.hasBigQueryClasses()) {
  list.add(new BigQueryNodeVisitor(context, factory));
}
Regarding point 2, there were some Spark versions which allowed two Scala versions (2.11 and 2.12). Then it makes sense to make it configurable. On the other hand, for Spark 3.2 we only support 2.12, which is hardcoded in build.gradle.

The idea of the app project is: let's create a separate project to aggregate all the dependencies and run integration tests on it. Subprojects spark2, spark3, etc. depend on shared. Putting integration tests in shared would create an additional opposite-way dependency, which we wanted to avoid.

Will Johnson - (will@willj.co)
2022-10-11 09:20:44

*Thread Reply:* So, if we wanted to add Snowflake, we would need to:
  1. Pick a version of snowflake's spark library
  2. Pick a version of scala that we target (i.e. we are only going to support Snowflake in Spark 3.2 so scala 2.12 will be hard coded)
  3. Add the visitor code to Shared
  4. Add the dependencies to app (ONLY if there is an integration test in app?? This is the confusing part still)

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-10-12 03:51:54

*Thread Reply:* Yes. Please note that snowflake library will not be included in target OpenLineage jar. So you may test it manually against multiple Snowflake library versions or even adjust code in case of minor differences.

👍 Hanna Moazam, Will Johnson

Hanna Moazam - (hannamoazam@microsoft.com)
2022-10-12 05:20:17

*Thread Reply:* Thank you Pawel!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-12 12:18:16

*Thread Reply:* Basically the same pattern you've already done with Kusto 😉
https://github.com/OpenLineage/OpenLineage/blob/a96ecdabe66567151e7739e25cd9dd03d6[…]va/io/openlineage/spark/agent/lifecycle/BaseVisitorFactory.java

Hanna Moazam - (hannamoazam@microsoft.com)
2022-10-12 12:26:35

*Thread Reply:* We actually used only reflection for Kusto and were hoping to do it the 'better' way with the package itself for snowflake - if it's possible :)

Akash r - (akashrn25@gmail.com)
2022-10-11 02:04:28

Hi Community,

I was going through the code of the dbt integration with OpenLineage. Once the events have been emitted from the client code, I wanted to check the server code where the events are read and the lineage is formed. Where can I find that code?

Thanks

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-11 05:03:26

*Thread Reply:* Reference implementation of OpenLineage consumer is Marquez: https://github.com/MarquezProject/marquez

Michael Robinson - (michael.robinson@astronomer.io)
2022-10-12 11:59:55

This month’s OpenLineage TSC meeting is tomorrow at 10 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1665084207602369

🙌 Maciej Obuchowski

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2022-10-13 12:05:17

Is there anyone in the Open Lineage community in San Diego? I’ll be there Nov 1-3 and would love to meet some of y’all in person

Paul Lee - (paullee@lyft.com)
2022-10-20 13:49:39

👋 is there a way to define a base extractor to be defaulted to? for example, i'd like to have all our operators (50+) default to my custom base extractor instead of having a list of 50+ operators in get_operator_classnames

Howard Yoo - (howard.yoo@astronomer.io)
2022-10-20 13:53:55

I don't think that's possible yet, as the extractor checks are based on the class name... and it wouldn't check which parent operator it inherited from.

Paul Lee - (paullee@lyft.com)
2022-10-20 14:05:38

😢 ok, i would contribute upstream but unfortunately we're still on 1.10.15. looking like we might have to hardcode for a bit.

Paul Lee - (paullee@lyft.com)
2022-10-20 14:06:01

is this the correct assumption? we're still on 0.14.1 ^

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-10-20 14:33:49

If you'll move to 2.x series and OpenLineage 0.16, you could use this feature: https://github.com/OpenLineage/OpenLineage/pull/1162
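For context, the linked feature (the DefaultExtractor) lets an operator expose its own lineage instead of registering one extractor per class. A rough sketch under that assumption - the method name comes from the PR, while the TaskMetadata import path is my guess at the 0.16-era openlineage-airflow layout:

```python
from airflow.models import BaseOperator

from openlineage.airflow.extractors.base import TaskMetadata  # assumed import path
from openlineage.client.run import Dataset


class MyCustomOperator(BaseOperator):
    def execute(self, context):
        ...  # your operator logic

    # picked up by the DefaultExtractor, so no per-operator extractor registration
    def get_openlineage_facets_on_start(self) -> TaskMetadata:
        return TaskMetadata(
            name="db.schema.my_task",
            inputs=[Dataset(namespace="postgres://db-host:5432", name="schema.in_table")],
            outputs=[Dataset(namespace="postgres://db-host:5432", name="schema.out_table")],
        )
```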

👍 Paul Lee

Paul Lee - (paullee@lyft.com)
2022-10-20 14:46:36

thanks @Maciej Obuchowski we're working on it. hoping we'll land on 2.3.4 in the coming month.

🔥 Maciej Obuchowski

Austin Poulton - (austin.poulton@equalexperts.com)
2022-10-26 05:31:07

👋 Hi everyone!

👋 Jakub Dardziński, Maciej Obuchowski, Michael Robinson, Ross Turk, Willy Lulciuc, Paweł Leszczyński, Harel Shein

Harel Shein - (harel.shein@gmail.com)
2022-10-26 15:22:22

*Thread Reply:* Hey @Austin Poulton, welcome! 👋

Austin Poulton - (austin.poulton@equalexperts.com)
2022-10-31 06:09:41

*Thread Reply:* thanks Harel 🙂

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-01 09:44:18

@channel
Hi everyone, I'm opening a vote to release OpenLineage 0.16.0, featuring:
• support for boolean arguments in the DefaultExtractor
• a more efficient get_connection_uri method in the Airflow integration
• a reorganized, Rust-based SQL integration (easing the addition of language interfaces in the future)
• bug fixes and more.
3 +1s from committers will authorize an immediate release. Thanks. More details are here:
https://github.com/OpenLineage/OpenLineage/compare/0.15.1...HEAD

🙌 Howard Yoo, Paweł Leszczyński, Maciej Obuchowski
👍 Ross Turk, Paweł Leszczyński, Maciej Obuchowski
➕ Willy Lulciuc, Mandy Chessell, Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-01 13:37:54

*Thread Reply:* Thanks, all! The release is authorized. We will initiate it within 48 hours.

Iftach Schonbaum - (iftach.schonbaum@hunters.ai)
2022-11-02 08:45:20

Anybody with a success use-case of ingesting column-level lineage into amundsen?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-02 09:19:43

*Thread Reply:* I think amundsen-openlineage dataloader precedes column-level lineage in OL by a bit, so I doubt this works

Harel Shein - (harel.shein@gmail.com)
2022-11-02 15:54:31

*Thread Reply:* do you want to open up an issue for it @Iftach Schonbaum?

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-02 12:36:22

Hi everyone, you might notice Dependabot opening PRs to update dependencies now that it’s been configured and turned on (https://github.com/OpenLineage/OpenLineage/pull/1182). There will probably be a large number of PRs to start with, but this shouldn’t always be the case and we can change the tool’s behavior, as well. (Some background: this will help us earn the OSSF Silver badge for the project, which will help us advance in the LFAI.)

👍 Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-03 07:53:31

@channel
I'm opening a vote to release OpenLineage 0.16.1 to fix an issue in the SQL integration. This release will also include all the commits announced for 0.16.0.
3 +1s from committers will authorize an immediate release. Thanks.

➕ Maciej Obuchowski, Hanna Moazam, Jakub Dardziński, Ross Turk, Paweł Leszczyński, Jarek Potiuk, Willy Lulciuc

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-03 12:25:29

*Thread Reply:* Thanks, all. The release is authorized and will be initiated shortly.

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-03 13:46:58

@channel
OpenLineage 0.16.1 is now available, featuring:
Additions:
• Airflow: add dag_run information to Airflow version run facet #1133 @fm100
• Airflow: add LoggingMixin to extractors #1149 @JDarDagran
• Airflow: add default extractor #1162 @mobuchowski
• Airflow: add on_complete argument in DefaultExtractor #1188 @JDarDagran
• SQL: reorganize the library into multiple packages #1167 @StarostaGit @mobuchowski
Changes:
• Airflow: move get_connection_uri as extractor's classmethod #1169 @JDarDagran
• Airflow: change get_openlineage_facets_on_start/complete behavior #1201 @JDarDagran
Bug fixes and more!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.16.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.15.1...0.16.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Maciej Obuchowski, Francis McGregor-Macdonald, Eric Veleker

Phil Chen - (phil@gpr.com)
2022-11-03 13:59:29

Are there any tutorials and documentation on how to create an OpenLineage connector? For example, what if we use Argo Workflows instead of Apache Airflow for orchestrating ETL jobs? How would we create an OpenLineage Argo Workflows connector? How much effort, roughly? And can people contribute such connectors to the community if they create one?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-04 06:34:27

*Thread Reply:* > Are there any tutorial and documentation how to create an Openlinage connector. -We have somewhat of a start of a doc: -https://openlineage.io/docs/development/developing/

- -

Here we have an example of using Python OL client to emit OL events: https://openlineage.io/docs/client/python#start-docker-and-marquez

- -

> How much efforts, roughly? -I'm not familiar with Argo workflows, but usually the effort needed depends on extensibility of the underlying system. From the first look, Argo looks like it has sufficient mechanisms for that: https://argoproj.github.io/argo-workflows/executor_plugins/#examples-and-community-contributed-plugins

- -

Then, it depends if you can get the information that you need in that plugin. Basic need is to have information from which datasets the workflow/job is reading and to which datasets it's writing.

- -

> And can people contribute such connectors to the community if they create one? -Definitely! And if you need help with anything OpenLineage feel free to write here on Slack

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-03 17:57:37

Is there a topic you think the community should discuss at the next OpenLineage TSC meeting? Reply or DM with your item, and we’ll add it to the agenda.

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-03 18:03:18

@channel
This month's OpenLineage TSC meeting is next Thursday, November 10th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. Recent release overview [Michael R.]
  2. Update on LFAI & Data Foundation progress [Michael R.]
  3. Proposal: Defining "implementing OpenLineage" [Julien]
  4. Update from MANTA on their OpenLineage integration [Eric and/or Petr from MANTA]
  5. Linking CMF (a common ML metadata framework) and OpenLineage [Suparna and AnnMary from HP Enterprise]
  6. Open discussion
👍 Luca Soato, Maciej Obuchowski, Paul Lee, Willy Lulciuc

Kenton (swiple.io) - (kknoxparton@gmail.com)
2022-11-08 04:47:41

Hi all 👋 I'm Kenton — a Software Engineer and founder of Swiple. I'm looking forward to working with OpenLineage and its community to integrate data lineage and data observability.
https://swiple.io

🙌 Maciej Obuchowski, Jakub Dardziński, Michael Robinson, Ross Turk, John Thomas, Julien Le Dem, Willy Lulciuc, Varun Singh

Ross Turk - (ross@datakin.com)
2022-11-08 10:22:15

*Thread Reply:* Welcome Kenton! Happy to help 👍

👍 Kenton (swiple.io)

Deepika Prabha - (deepikaprabha@gmail.com)
2022-11-08 05:35:03

Hi everyone,
We wanted to pass some dynamic metadata from a spark job that we can catch in the OpenLineage event and use for processing. Presently I have seen that we have a few conf parameters, like the openlineage params, that we can send only with the Spark conf. Is there any other option where we can send some information dynamically from the spark jobs?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-08 10:06:10

*Thread Reply:* What kind of data? My first feeling is that you need to extend the Spark integration

Deepika Prabha - (deepikaprabha@gmail.com)
2022-11-09 00:35:29

*Thread Reply:* Yes, we wanted to add information like user/job description that we can use later with the rest of the openlineage event fields in our system

Deepika Prabha - (deepikaprabha@gmail.com)
2022-11-09 00:41:35

*Thread Reply:* I can see in this PR https://github.com/OpenLineage/OpenLineage/pull/490 that env values can be captured which we can use to add some custom metadata but it seems it is specific to Databricks only.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-09 05:14:50

*Thread Reply:* I think it makes sense to have something like that, but generic, if you want to contribute it

👍 Will Johnson, Deepika Prabha

Varun Singh - (varuntestaz@outlook.com)
2022-11-14 03:28:35

*Thread Reply:* @Maciej Obuchowski Do you mean adding something like
spark.openlineage.jobFacet.FacetName.Key=Value
to the spark conf should add a new job facet like
"FacetName": {
    "Key": "Value"
}

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-14 05:56:02

*Thread Reply:* We can argue about name of that key, but yes, something like that. Just notice that while it's possible to attach something to run and job facets directly, it would be much harder to do this with datasets

slackbot
2022-11-09 11:15:49

This message was deleted.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-11-10 02:22:18

*Thread Reply:* Hi @Varun Singh, what version of openlineage-spark were you using? Are you able to copy the lineage event here?

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-09 12:31:10

@channel
This month's TSC meeting is tomorrow at 10 am PT! https://openlineage.slack.com/archives/C01CK9T7HKR/p1667512998061829

💥 Willy Lulciuc, Maciej Obuchowski

Hanna Moazam - (hannamoazam@microsoft.com)
2022-11-11 11:32:54

Hi #general, quick question: do we plan to disable spark 2 support in the near future?

Longer question:
I've recently made a PR (https://github.com/OpenLineage/OpenLineage/pull/1231) to support capturing lineage from Snowflake, but it fails at a specific integration test due to what we think is a dependency mismatch for guava. I've tried to exclude any transient dependencies which may cause the problem but no luck with that so far.

Just wondering if:
  1. It makes sense to spend more time trying to ensure that test passes? Especially if we plan to remove spark 2 support soon.
  2. Assuming we do want to make sure to pass the test, does anyone have any other ideas for where to look/modify to prevent the error?
Here's the test failure message:
```io.openlineage.spark.agent.lifecycle.LibraryTest testRdd(SparkSession) FAILED (16s)
java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
    at io.openlineage.spark.agent.lifecycle.LibraryTest.testRdd(LibraryTest.java:113)```
Thanks in advance!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-11 16:28:07

*Thread Reply:* What if we just not include it in the BaseVisitorFactory but only in the Spark3 visitor factories?

Paul Lee - (paullee@lyft.com)
2022-11-11 14:52:19

quick question: how do i get the <<non-serializable Time... to show in the extraction? or really any object that gets passed in.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-11 16:24:30

*Thread Reply:* You might look here: https://github.com/OpenLineage/OpenLineage/blob/f7049c599a0b1416408860427f0759624326677d/client/python/openlineage/client/serde.py#L51

:gratitude_thank_you: Paul Lee

srutikanta hota - (srutikanta.hota@gmail.com)
2022-11-14 01:12:45

Is there a way I can update the dataset description and the column description while generating the OpenLineage spark events and columns?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-11-15 02:09:25

*Thread Reply:* I don’t think this is possible at the moment.

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-11-15 15:47:49

Hey all, I'd like to ask for a release for OpenLineage. #1256 fixes a bug in the DefaultExtractor that blocks people from migrating code from custom extractors to get_openlineage_facets methods.

➕ Michael Robinson, Howard Yoo, Maciej Obuchowski, Willy Lulciuc

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-16 09:13:17

*Thread Reply:* Thanks, all. The release is authorized.

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-16 10:41:07

*Thread Reply:* The PR for the changelog updates: https://github.com/OpenLineage/OpenLineage/pull/1306

Varun Singh - (varuntestaz@outlook.com)
2022-11-16 03:34:01

Hi, small question: Is it possible to disable the /api/{version}/lineage suffix that gets added to every url automatically? Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-16 12:27:12

*Thread Reply:* I think we had similar request before, but nothing was implemented.

👍 Varun Singh

Michael Robinson - (michael.robinson@astronomer.io)
2022-11-16 12:23:54

@channel
OpenLineage 0.17.0 is now available, featuring:
Additions:
• Spark: support latest Spark 3.3.1 #1183 @pawel-big-lebowski
• Spark: add Kinesis Transport and support config Kinesis in Spark integration #1200 @yogyang
• Spark: disable specified facets #1271 @pawel-big-lebowski
• Python: add facets implementation to Python client #1233 @pawel-big-lebowski
• SQL: add Rust parser interface #1172 @StarostaGit @mobuchowski
• Proxy: add helm chart for the proxy backend #1068 @wslulciuc
• Spec: include possible facets usage in spec #1249 @pawel-big-lebowski
• Website: publish YML version of spec to website #1300 @rossturk
• Docs: update language on nominating new committers #1270 @rossturk
Changes:
• Website: publish spec into new website repo location #1295 @rossturk
• Airflow: change how pip installs packages in tox environments #1302 @JDarDagran
Removals:
• Deprecate HttpTransport.Builder in favor of HttpConfig #1287 @collado-mike
Bug fixes and more!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.17.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.16.1...0.17.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Howard Yoo, Maciej Obuchowski, Ross Turk, Aphra Bloomfield, Harel Shein, Kengo Seki, Paweł Leszczyński, pankaj koti, Varun Singh

Diego Cesar - (dcesar@krakenrobotik.de)
2022-11-18 05:40:53

Hi everyone,

I'm trying to get the lineage of a dataset per version. I initially had something like

Dataset A -> Dataset B -> DataSet C (version 1)

then:

Dataset D -> Dataset E -> DataSet C (version 2)

I can get the graph for version 2 without problems, but I'm wondering if there's any way to retrieve the entire graph for DataSet C version 1.

Thanks

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-22 13:40:44

*Thread Reply:* It's kind of a hard problem UI side. Backend can express that relationship

Diego Cesar - (dcesar@krakenrobotik.de)
2022-11-22 13:48:58

*Thread Reply:* Thanks for replying. Could you please point me to the API that allows me to do that? I've been calling GET /lineage with the dataset in the node ID, e.g., nodeId=dataset:my_dataset. Where could I specify the version of my dataset?

Paul Lee - (paullee@lyft.com)
2022-11-18 17:55:24

👋 how do we get the actual values from macros? e.g. a schema name is passed in with {{params.table_name}} and thats what shows in lineage instead of the actual table name

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2022-11-19 04:54:13

*Thread Reply:* Templated fields are rendered before generating lineage data. Do you have some sample code, or preferably logs?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-22 13:40:11

*Thread Reply:* If you're on 1.10 then I think it won't work

Paul Lee - (paullee@lyft.com)
2022-11-28 12:50:39

*Thread Reply:* @Maciej Obuchowski we are still on airflow 1.10.15 unfortunately.

cc. @Eli Schachar @Allison Suarez

Paul Lee - (paullee@lyft.com)
2022-11-28 12:50:49

*Thread Reply:* is there no workaround we can make work?

Paul Lee - (paullee@lyft.com)
2022-11-28 12:51:01

*Thread Reply:* @Jakub Dardziński is this for airflow versions 2.0+?

Varun Singh - (varuntestaz@outlook.com)
2022-11-21 07:07:10

Hey, quick question: I see there is Kafka transport in the java client, but it's not supported in the spark integration, right?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-21 07:28:04

srutikanta hota - (srutikanta.hota@gmail.com)
2022-11-22 13:03:41

How can we auto instrument a dataset owner at Java agent level? Is there any spark property available?

srutikanta hota - (srutikanta.hota@gmail.com)
2022-11-22 16:47:37

Is there a way, if we are running a job with the business day as yesterday, to capture that information? Just think if I am running yesterday's missed job today, or Friday's file on Monday as we received the file late from a vendor, etc.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-11-22 18:45:48

*Thread Reply:* I think that's what NominalTimeFacet covers
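A small sketch of that facet with the Python client (import path per openlineage-python; all values invented) - the nominal time is the logical/business time of the run, independent of when it actually executed:

```python
from openlineage.client.facet import NominalTimeRunFacet
from openlineage.client.run import Run

# a Friday job actually executed on Monday still carries Friday's nominal window
run = Run(
    runId="d46e465b-d358-4d32-83d4-df660ff614dd",
    facets={
        "nominalTime": NominalTimeRunFacet(
            nominalStartTime="2022-11-18T00:00:00Z",
            nominalEndTime="2022-11-19T00:00:00Z",
        )
    },
)
```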

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-24 09:15:45

hello Team, i want to use data lineage with airflow but i'm not able to understand it from the docs. please let me know if someone has better docs

Harel Shein - (harel.shein@gmail.com)
2022-11-28 10:29:58

*Thread Reply:* Hey @Rahul Sharma, what version of Airflow are you running?

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 10:30:14

*Thread Reply:* i am using airflow 2.x

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 10:30:27

*Thread Reply:* can we connect if you have time ?

Harel Shein - (harel.shein@gmail.com)
2022-11-28 11:11:58

*Thread Reply:* did you see these docs before? https://openlineage.io/integration/apache-airflow/#airflow-20

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 11:12:22

*Thread Reply:* yes

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 11:12:36

*Thread Reply:* i already set configuration in airflow.cfg file

Harel Shein - (harel.shein@gmail.com)
2022-11-28 11:12:57

*Thread Reply:* where are you sending the events to?

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 11:13:24

*Thread Reply:* i have a docker machine on which marquez is working

Harel Shein - (harel.shein@gmail.com)
2022-11-28 11:13:47

*Thread Reply:* so, what is the issue you are seeing?

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 11:15:37

*Thread Reply:* there is no error

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 11:16:01

*Thread Reply:* ```[lineage]
# what lineage backend to use
backend = openlineage.lineage_backend.OpenLineageBackend

MARQUEZ_URL=http://10.36.37.178:3000
MARQUEZ_NAMESPACE=airflow

MARQUEZ_BACKEND=HTTP
MARQUEZ_URL=http://10.36.37.178:5000
MARQUEZ_API_KEY=[YOUR_API_KEY]
MARQUEZ_NAMESPACE=airflow```

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 11:16:09

*Thread Reply:* above config i have set

Rahul Sharma - (panditrahul151197@gmail.com)
2022-11-28 11:16:22

*Thread Reply:* please let me know any other thing need to do

Mohamed Nabil H - (m.nabil.hafez@gmail.com)
2022-11-24 14:02:27

hey, i wonder if somebody can link me to the lineage (table lineage) event schema?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-11-25 02:20:40

*Thread Reply:* please have a look at openapi definition of the event: https://openlineage.io/apidocs/openapi/

Murali Krishna - (vmurali.krishnaraju@genpact.com)
2022-11-30 02:34:51

Hello Team, I am from the Genpact Data Analytics team; we are looking for a demo of your product

Conor Beverland - (conorbev@gmail.com)
2022-11-30 14:10:10

*Thread Reply:* hey, I'll DM you.

Michael Robinson - (michael.robinson@astronomer.io)
2022-12-01 15:00:28

Hello all, I'm calling for a vote on releasing OpenLineage 0.18.0, including:
• improvements to the Spark integration,
• extractors for Sagemaker operators and SFTPOperator in the Airflow integration,
• a change to the Databricks integration to support Databricks Runtime 11.3,
• new governance docs,
• bug fixes,
• and more.
Three +1s from committers will authorize an immediate release.

➕ Maciej Obuchowski, Will Johnson, Bramha Aelem

Michael Robinson - (michael.robinson@astronomer.io)
2022-12-06 13:56:17

*Thread Reply:* Thanks, all. The release is authorized and will be initiated within two business days.

Michael Robinson - (michael.robinson@astronomer.io)
2022-12-01 15:11:10

@channel
This month's OpenLineage TSC meeting is next Thursday, December 8th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. an overview of the new Rust implementation of the SQL integration
  2. a presentation/discussion of what it actually means to "implement" OpenLineage
  3. open discussion

Scott Anderson - (scott.anderson@alteryx.com)
2022-12-02 13:57:07

Hello everyone! General question here, aside from ‘consumer’ orgs/integrations (dbt/dagster/manta), is anyone aware of any enterprise organizations that are leveraging OpenLineage today? Example lighthouse brands?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-12-02 15:21:20

*Thread Reply:* Microsoft https://openlineage.io/blog/openlineage-microsoft-purview/

🙌 Will Johnson

Will Johnson - (will@willj.co)
2022-12-05 13:54:06

*Thread Reply:* I think we can share that we have over 2,000 installs of that Microsoft solution accelerator using OpenLineage.

That means we have thousands of companies having experimented with OpenLineage and Microsoft Purview.

We can't name any customers at this point unfortunately.

🎉 Conor Beverland, Kengo Seki
👍 Scott Anderson

Michael Robinson - (michael.robinson@astronomer.io)
2022-12-07 12:03:06

@channel
This month's TSC meeting is tomorrow at 10 am PT. All are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1669925470878699

Will Johnson - (will@willj.co)
2022-12-07 14:22:58

*Thread Reply:* For open discussion, I'd like to ask the team for an overview of how the different gradle files are working together for the Spark implementation. I'm terribly confused on where dependencies need to be added (whether it's in shared, app, or a spark version specific folder). Maybe @Maciej Obuchowski...?

👍 Michael Robinson

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2022-12-07 14:25:12

*Thread Reply:* Unfortunately I'll be unable to attend the meeting @Will Johnson 😞

😭 Will Johnson

Julien Le Dem - (julien@apache.org)
2022-12-08 13:03:08

*Thread Reply:* This is starting now. CC @Will Johnson

Julien Le Dem - (julien@apache.org)
2022-12-09 19:24:15

*Thread Reply:* @Will Johnson Check the notes and the recording. @Michael Collado did a pass at explaining the relationship between shared, app and the versions

Julien Le Dem - (julien@apache.org)
2022-12-09 19:24:30

*Thread Reply:* feel free to follow up here as well

Michael Collado - (collado.mike@gmail.com)
2022-12-09 19:39:37

*Thread Reply:* ascii art to the rescue! (top "depends on" bottom)
               app
              /   \
             / / \ \
            / /   \ \
           / /     \ \
          / /       \ \
         / |         | \
        /  |         |  \
       /   |         |   \
      /    |         |    \
     /     |         |     \
    /      |         |      \
   /       |         |       \
spark2   spark3   spark32   spark33
   \        |        |       /
    \       |        |      /
     \      |        |     /
      \     |        |    /
       \    |        |   /
        \   |        |  /
         \  |        | /
          \ |       / /
           \ \     / /
            \ \   / /
             \ \ / /
              \   /
               \ /
             shared

Julien Le Dem - (julien@apache.org)
2022-12-09 19:40:05

*Thread Reply:* 😍

Michael Collado - (collado.mike@gmail.com)
2022-12-09 19:41:13

*Thread Reply:* (btw, we should have written datakin to output ascii art; it’s obviously the superior way to generate graphs 😜)

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-14 05:18:53

*Thread Reply:* Hi, is there a recording for this meeting?

Christian Lundgren - (christian@lunit.io)
2022-12-07 20:33:19

Hi! I have a basic question about the naming conventions for blob storage. The spec is not totally clear to me. Is the convention to use (1) namespace=bucket name=bucket+path or (2) namespace=bucket name=path?

Julien Le Dem - (julien@apache.org)
2022-12-07 22:05:25

*Thread Reply:* The namespace is the bucket and the dataset name is the path. Is there a blob storage provider in particular you are thinking of?
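So for GCS, per my reading of the naming doc, a dataset reference would look roughly like this with the Python client (bucket and path invented; the gs:// scheme on the namespace is my assumption from that doc):

```python
from openlineage.client.run import Dataset

# namespace = the bucket (with scheme), name = the path inside it
gcs_dataset = Dataset(namespace="gs://my-bucket", name="warehouse/orders/2022-12-07")
```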

Christian Lundgren - (christian@lunit.io)
2022-12-07 23:13:41

*Thread Reply:* Thanks, that makes sense. We use GCS, so it is already covered by the naming conventions documented. I was just not sure if I was understanding the document correctly or not.

Julien Le Dem - (julien@apache.org)
2022-12-07 23:34:33

*Thread Reply:* No problem. Let us know if you have suggestions on the wording to make the doc clearer

Michael Robinson - (michael.robinson@astronomer.io)
2022-12-08 11:44:49

@channel
OpenLineage 0.18.0 is available now, featuring:
• Airflow: support SQLExecuteQueryOperator #1379 @JDarDagran
• Airflow: introduce a new extractor for SFTPOperator #1263 @sekikn
• Airflow: add Sagemaker extractors #1136 @fhoda
• Airflow: add S3 extractor for Airflow operators #1166 @fhoda
• Spec: add spec file for ExternalQueryRunFacet #1262 @howardyoo
• Docs: add a TSC doc #1303 @merobi-hub
• Plus bug fixes.
Thanks to all our contributors, including new contributor @Faisal Hoda!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.18.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.17.0...0.18.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🚀 Willy Lulciuc, Minkyu Park, Kengo Seki, Enrico Rotundo, Faisal Hoda
🙌 Howard Yoo, Minkyu Park, Kengo Seki, Enrico Rotundo, Faisal Hoda

srutikanta hota - (srutikanta.hota@gmail.com)
2022-12-09 01:42:59

1) Is there a specification to capture dataset dependency, e.g. ds1 is dependent on ds2?

Ross Turk - (ross@datakin.com)
2022-12-09 11:51:16

*Thread Reply:* Dataset dependencies are represented through a common relationship with a Job - e.g., the task that performed the transformation.

srutikanta hota - (srutikanta.hota@gmail.com)
2022-12-11 09:01:19

*Thread Reply:* Is it possible to populate a table-level dependency without any transformation using the OpenLineage specification? Like defining that dataset 1 is dependent on table 1 and table 2, which can be represented as separate datasets

Ross Turk - (ross@datakin.com)
2022-12-13 15:24:20

*Thread Reply:* Not explicitly, in today's spec. The guiding principle is that something created that dependency, and the dependency changes over time in a way that is important to study.

Ross Turk - (ross@datakin.com)
2022-12-13 15:25:12

*Thread Reply:* I say this to explain why it is the way it is - but the spec can change over time to serve new use cases, certainly!

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-14 05:18:10

Hi everyone, I'd like to use openlineage to capture column level lineage for spark. I would also like to capture a few custom environment variables along with the column lineage. May I know how this can be done? Thanks!

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-12-14 09:56:22

*Thread Reply:* Hi @Anirudh Shrinivason, you could start with column-lineage & spark workshop available here -> https://github.com/OpenLineage/workshops/tree/main/spark

❤️ Ricardo Gaspar

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-14 10:05:54

*Thread Reply:* Hi @Paweł Leszczyński Thanks for the link! But this does not really answer the concern.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-14 10:06:08

*Thread Reply:* I am already able to capture column lineage

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-14 10:06:33

*Thread Reply:* What I would like is to capture some extra environment variables, and send it to the server along with the lineage

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-12-14 11:22:59

*Thread Reply:* i remember we already have a facet for that: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/facets/EnvironmentFacet.java

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-12-14 11:24:07

*Thread Reply:* but it is only used at the moment to capture some databricks environment attributes

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-12-14 11:28:29

*Thread Reply:* so you can contribute to the project and add a feature which adds specified/all environment variables to the lineage event.

you can also have a look at the extending section of the spark integration docs (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending) and create a class that adds a run facet builder according to your needs.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2022-12-14 11:29:28

*Thread Reply:* the third way is to create an issue related to this, because being able to send selected/all environment variables in the OL event seems like a really cool feature.

👍 Anirudh Shrinivason

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-14 21:49:19

*Thread Reply:* That is great! Thank you so much! This really helps!

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-15 01:44:42

*Thread Reply:* List<String> dbPropertiesKeys =
    Arrays.asList(
        "orgId",
        "spark.databricks.clusterUsageTags.clusterOwnerOrgId",
        "spark.databricks.notebook.path",
        "spark.databricks.job.type",
        "spark.databricks.job.id",
        "spark.databricks.job.runId",
        "user",
        "userId",
        "spark.databricks.clusterUsageTags.clusterName",
        "spark.databricks.clusterUsageTags.azureSubscriptionId");
dbPropertiesKeys.stream()
    .forEach(
        (p) -> {
          dbProperties.put(p, jobStart.properties().getProperty(p));
        });
It seems like it is obtaining this env variable information from the jobStart obj, but not capturing from the env directly?

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2022-12-15 01:57:05

*Thread Reply:* I have opened an issue in the community here: https://github.com/OpenLineage/OpenLineage/issues/1419

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-01 02:24:39

*Thread Reply:* Hi @Paweł Leszczyński I have opened a PR for helping to add this use case. Please do help to see if we can merge it in. Thanks!
https://github.com/OpenLineage/OpenLineage/pull/1545

👀 Maciej Obuchowski, Ross Turk

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-02-02 11:45:52

*Thread Reply:* Hey @Anirudh Shrinivason, sorry for late reply, but I reviewed the PR.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-06 03:06:42

*Thread Reply:* Hey thanks a lot! I have made the requested changes! Thanks!

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-06 03:06:49

*Thread Reply:* @Maciej Obuchowski ^ 🙂

👀 Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-02-06 09:09:34

*Thread Reply:* Hey @Anirudh Shrinivason, took a look at it but it unfortunately fails integration tests (throws NPE), can you take a look again?

23/02/06 12:18:39 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
    at io.openlineage.spark.agent.EventEmitter.<init>(EventEmitter.java:39)
    at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:276)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:80)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1433)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-07 04:17:02

*Thread Reply:* Hi yeah my bad. It should be fixed in the latest push. But I think the tests are not running in the CI because of some GCP environment issue? I am not really sure how to fix it...

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-02-07 04:18:46

*Thread Reply:* I can make them run, it's just that running them on forks is disabled. We need to make it more clear I suppose

👍 Anirudh Shrinivason

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-07 04:24:38

*Thread Reply:* Ahh I see thanks! Also, some of the tests are failing on my local, such as https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/test/java/io/openlineage/spark/agent/lifecycle/DeltaDataSourceTest.java. Is this expected behaviour?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-02-07 07:20:11

*Thread Reply:* tests failing isn't expected behaviour 🙂

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-08 03:37:23

*Thread Reply:* Ahh yeap it was a local ide issue on my side. I added some tests to verify the presence of env variables too.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-02-08 03:47:22

*Thread Reply:* @Anirudh Shrinivason let me know then when you'll push fixed version, I can run full tests then

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-08 03:49:35

*Thread Reply:* I have pushed just now

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-08 03:49:39

*Thread Reply:* You can run the tests

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-02-08 04:13:07
-
-

*Thread Reply:* @Maciej Obuchowski mb I pushed again rn. Missed out a closing bracket.

👍 Maciej Obuchowski

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-02-10 00:47:04

*Thread Reply:* @Maciej Obuchowski Hi, could we merge this PR in? I'd like to see if we can have these changes in the new release...

Bramha Aelem - (bramhaaelem@gmail.com)
2022-12-15 17:14:02

Hi All - I am sending lineage from ADF for each activity I am performing, and the individual activities are represented correctly. How can I represent task1 as a parent of task2? Can someone please share a sample JSON request for it?

Ross Turk - (ross@datakin.com)
2022-12-16 13:29:44

*Thread Reply:* Hi 👋 this would require a series of JSON calls:

  1. start the first task
  2. end the first task, specify output dataset
  3. start the second task, specify input dataset
  4. end the second task

Ross Turk - (ross@datakin.com)
2022-12-16 13:32:08

*Thread Reply:* in OpenLineage relationships are typically Job -> Dataset -> Job, so
• you create a relationship between datasets by referring to them in the same job - i.e., this task ran that read from these datasets and wrote to those datasets
• you create a relationship between tasks by referring to the same datasets across both of them - i.e., this task wrote that dataset and this other task read from it
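For illustration, the pattern above (and the four calls listed earlier) can be sketched with the openlineage-python client like this; the Marquez URL, namespaces, and job/dataset names are assumptions, not from this thread:

```
# Hypothetical sketch: task1 writes a dataset, task2 reads the same dataset,
# which is what links the two jobs in OpenLineage.
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # assumed Marquez endpoint
PRODUCER = "https://example.com/adf-lineage-demo"        # illustrative producer URI

lookup = Dataset(namespace="adls", name="lookup_data")   # the shared dataset

def emit(state, run_id, job_name, inputs=(), outputs=()):
    client.emit(RunEvent(
        eventType=state,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=run_id),
        job=Job(namespace="adf", name=job_name),
        producer=PRODUCER,
        inputs=list(inputs),
        outputs=list(outputs),
    ))

run1, run2 = str(uuid4()), str(uuid4())
emit(RunState.START, run1, "task1")                       # 1. start the first task
emit(RunState.COMPLETE, run1, "task1", outputs=[lookup])  # 2. end it, specify output dataset
emit(RunState.START, run2, "task2", inputs=[lookup])      # 3. start the second task, specify input
emit(RunState.COMPLETE, run2, "task2", inputs=[lookup])   # 4. end the second task
```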

Ross Turk - (ross@datakin.com)
2022-12-16 13:35:06

*Thread Reply:* @Bramha Aelem if you look in this directory, you can find example start/complete JSON calls that show how to specify input/output datasets.


(it’s an airflow workshop, but those examples are for a part of the workshop that doesn’t involve airflow)

Ross Turk - (ross@datakin.com)
2022-12-16 13:35:46

*Thread Reply:* (these can also be found in the docs)

👍 Ross Turk

Bramha Aelem - (bramhaaelem@gmail.com)
2022-12-16 14:49:30

*Thread Reply:* @Ross Turk - Thanks for the details. will try and get back to you on it

Bramha Aelem - (bramhaaelem@gmail.com)
2022-12-17 19:53:21

*Thread Reply:* @Ross Turk - Good Evening, It worked as expected. I am able to replicate the scenarios which I am looking for.

👍 Ross Turk

Bramha Aelem - (bramhaaelem@gmail.com)
2022-12-17 19:53:48

*Thread Reply:* @Ross Turk - Thanks for your response.

Bramha Aelem - (bramhaaelem@gmail.com)
2023-01-12 13:23:56

*Thread Reply:* @Ross Turk - First activity: I am making an HTTP call to pull the lookup data and store it in ADLS.
Second activity: after the first activity completes, I make an Azure Databricks call that uses the lookup file and generates the output tables. How can I refer to the Databricks-generated table facets as an input to the subsequent activities in the pipeline?
When I refer to them as an input, the Spark table metadata is not showing up. How can this be achieved?
After the execution of each activity in the ADF pipeline I am sending START and COMPLETE/FAIL lineage events to Marquez.

Can someone please guide me on this.

Bramha Aelem - (bramhaaelem@gmail.com)
2022-12-15 17:19:34

I am not using airflow in my Process. pls suggest

Bramha Aelem - (bramhaaelem@gmail.com)
2022-12-19 12:40:26

Hi All - Good Morning. How does OpenLineage represent the column lineage of a data source when it is run by different teams and jobs?

Al (Koii) - (al@koii.network)
2022-12-20 14:26:57

Hey folks! I'm al from Koii.network, very happy to have heard about this project :)

👋 Willy Lulciuc, Maciej Obuchowski, Julien Le Dem

Willy Lulciuc - (willy@datakin.com)
2022-12-20 14:27:59

*Thread Reply:* welcome! let us know if you have any questions

Matt Menzenski - (matt@payitgov.com)
2022-12-29 08:22:26

Hello! I found the OpenLineage project today after searching for “OpenTelemetry” in the dbt Slack.

Harel Shein - (harel.shein@gmail.com)
2022-12-29 10:47:00

*Thread Reply:* Hey Matt! Happy to have you here! Feel free to reach out if you have any questions

:gratitude_thank_you: Matt Menzenski

Max - (maxime.broussard@gmail.com)
2022-12-30 05:33:40

Hi guys - I am really excited to test open lineage.
I had a quick question, sorry if this is not the right place for it.
We are testing dbt-ol with airflow and I was hoping this would by default push the number of rows updated/created in that dbt transformation to marquez.
It runs fine on airflow, but when I check in marquez there doesn't seem to be a 'dataset' created, only 'jobs' with job level metadata.
When i check here I see that the dataset facets should have it though https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md
Does anyone know if creating a dataset & sending row counts to OL is out of the box on dbt-ol or if I need to build another script to get that number from my snowflake instance and push it to OL as another step in my process?
Thanks a lot!

Viraj Parekh - (vmpvmp94@gmail.com)
2023-01-03 13:20:14

*Thread Reply:* @Ross Turk maybe you can help with this?

Ross Turk - (ross@datakin.com)
2023-01-03 13:34:23

*Thread Reply:* hmm, I believe the dbt-ol integration does capture bytes/rows, but only for some data sources: https://github.com/OpenLineage/OpenLineage/blob/6ae1fd5665d5fd539b05d044f9b6fb831ce9d475/integration/common/openlineage/common/provider/dbt.py#L567

Ross Turk - (ross@datakin.com)
2023-01-03 13:34:58

*Thread Reply:* I haven't personally tried it with Snowflake in a few versions, but the code suggests that it's one of them.

Ross Turk - (ross@datakin.com)
2023-01-03 13:35:42

*Thread Reply:* @Max you say your dbt-ol run is resulting in only jobs and no datasets emitted, is that correct?

Ross Turk - (ross@datakin.com)
2023-01-03 13:38:06

*Thread Reply:* if so, I'd say something rather strange is going on because in my experience each model should result in a Job and a Dataset.

Kuldeep - (kuldeep.marathe@affirm.com)
2023-01-03 00:41:09

Hi All, Curious to see if there is an openlineage integration with luigi or any open source projects working on it.

Kuldeep - (kuldeep.marathe@affirm.com)
2023-01-03 01:53:10

*Thread Reply:* I was looking for something similar to the airflow integration

Viraj Parekh - (vmpvmp94@gmail.com)
2023-01-03 13:21:18

*Thread Reply:* hey @Kuldeep - i don't think there's something for Luigi right now - is that something you'd potentially be interested in?

Kuldeep - (kuldeep.marathe@affirm.com)
2023-01-03 13:23:53

*Thread Reply:* @Viraj Parekh Yes this is something we are interested in! There are a lot of projects out there that use luigi

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-03 11:05:48

Hello all, I’m opening a vote to release OpenLineage 0.19.0, including:
• new extractors for Trino and S3FileTransformOperator in the Airflow integration
• a new, standardized run facet in the Airflow integration
• a new NominalTimeRunFacet and OwnershipJobFacet in the Airflow integration
• Postgres support in the dbt integration
• a new client-side proxy (skeletal version)
• a new, improved mechanism for passing conf parameters to the OpenLineage client in the Spark integration
• a new ExtractionErrorRunFacet to reflect internal processing errors for the SQL parser
• testing improvements, bug fixes and more.
As always, three +1s from committers will authorize an immediate release. Thanks in advance!

➕ Willy Lulciuc, Maciej Obuchowski, Paweł Leszczyński, Jakub Dardziński, Julien Le Dem

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-01-03 23:07:59

*Thread Reply:* Hi @Michael Robinson, regarding “a new, improved mechanism for passing conf parameters to the OpenLineage client in the Spark integration” - would it be possible to have more details on what this entails please? Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-04 09:21:46

*Thread Reply:* @Tomasz Nazarewicz might explain this better

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2023-01-04 10:04:22

*Thread Reply:* @Anirudh Shrinivason until now If you wanted to add new property to OL client, you had to also implement it in the integration because it had to parse all properties, create appropriate objects etc. New implementation makes client properties transparent to integration, they are only passed through and parsing happens inside the client.
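As a rough illustration of what that transparency looks like for users (the exact property keys may differ between versions - treat them as assumptions):

```
# Hedged sketch: spark.openlineage.transport.* keys are forwarded to the
# OpenLineage client as-is; the Spark integration no longer parses them itself.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ol-transport-config-demo")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.19.2")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # These keys are passed through to the client, which does the parsing:
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")
    .getOrCreate()
)
```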

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-04 13:02:39

*Thread Reply:* Thanks, all. The release is authorized and will commence shortly 🙂

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-01-04 22:00:55

*Thread Reply:* @Tomasz Nazarewicz Ahh I see. Okay thanks!

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-05 10:37:09

@channel
This month’s OpenLineage TSC meeting is next Thursday, January 12th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:

  1. Recent release overview @Michael Robinson
  2. Column lineage update @Maciej Obuchowski
  3. Airflow integration improvements @Jakub Dardziński
  4. Discussions:
     • Real-world implementation of OpenLineage (What does it really mean?) @Sheeri Cabral (Collibra)
     • Using namespaces @Michael Robinson
  5. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-01-05 23:45:38

*Thread Reply:* @Michael Robinson Will there be a recording?

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-06 09:10:50

*Thread Reply:* @Anirudh Shrinivason Yes, and the recording will be here: https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

:gratitude_thank_you: Anirudh Shrinivason

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-05 13:00:01

OpenLineage 0.19.2 is available now, including:
• Airflow: add Trino extractor #1288 @sekikn
• Airflow: add S3FileTransformOperator extractor #1450 @sekikn
• Airflow: add standardized run facet #1413 @JDarDagran
• Airflow: add NominalTimeRunFacet and OwnershipJobFacet #1410 @JDarDagran
• dbt: add support for postgres datasources #1417 @julienledem
• Proxy: add client-side proxy (skeletal version) #1439 #1420 @fm100
• Proxy: add CI job to publish Docker image #1086 @wslulciuc
• SQL: add ExtractionErrorRunFacet #1442 @mobuchowski
• SQL: add column-level lineage to SQL parser #1432 #1461 @mobuchowski @StarostaGit
• Spark: pass config parameters to the OL client #1383 @tnazarew
• Plus bug fixes and testing and CI improvements.
Thanks to all the contributors, including new contributor Saurabh (@versaurabh)
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.19.2
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.18.0...0.19.2
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

❤️ Julien Le Dem, Howard Yoo, Willy Lulciuc, Maciej Obuchowski, Kengo Seki, Harel Shein, Jarek Potiuk, Varun Singh

Will Johnson - (will@willj.co)
2023-01-06 01:07:18

Question on Spark Integration and External Hive Metastores

@Hanna Moazam and I are working with a team using OpenLineage that wants to extract the server name of the hive metastore they're using when writing to a Hive table through Spark.

For example, the hive metastore is an Azure SQL database and the table name is sales.transactions.

OpenLineage will give something like /usr/hive/warehouse/sales.db/transactions for the name.

However, this is not a complete picture, since sales.db/transactions is only defined like this for a given hive metastore. In Hive, you'd define the fully qualified name as sales.transactions@sqlservername.database.windows.net.

Has anyone else come across this before? If not, we plan on raising an issue and suggesting we extract out the spark.hadoop.javax.jdo.option.ConnectionURL in the DatabricksEnvironmentFacetBuilder, but ideally there would be a better way of extracting this.

https://learn.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore#set-up-an-external-metastore-using-the-ui

There was an issue by @Maciej Obuchowski or @Paweł Leszczyński that talked about providing a facet of the alias of a path but I can't find it at this point :(

👀 Maciej Obuchowski

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-01-09 02:28:43

*Thread Reply:* Hi @Hanna Moazam, we've written a Jupyter notebook to demo the dataset symlinks feature:
https://github.com/OpenLineage/workshops/blob/main/spark/dataset_symlinks.ipynb

For the scenario you describe, there should be a symlink facet sent similar to:
{
  "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.15.1/integration/spark",
  "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet",
  "identifiers": [
    {
      "namespace": "hive://metastore",
      "name": "default.some_table",
      "type": "TABLE"
    }
  ]
}
Within the OpenLineage Spark integration code, symlinks are included here:
https://github.com/OpenLineage/OpenLineage/blob/0.19.2/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/PathUtils.java#L75

and they are added only when the spark catalog is hive and the metastore URI is present in the spark conf.

➕ Maciej Obuchowski
🤯 Will Johnson

Will Johnson - (will@willj.co)
2023-01-09 14:21:10

*Thread Reply:* This is so awesome, @Paweł Leszczyński - Thank you so much for sharing this! I'm wondering if we could extend this to capture the hive JDBC Connection URL. I will explore this and put in an issue and PR to try and extend it. Thank you for the insights!

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-11 12:00:02

@channel
Friendly reminder: this month’s OpenLineage TSC meeting is tomorrow at 10am, and all are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1672933029317449

🙌 Maciej Obuchowski, Will Johnson, John Bagnall, AnnMary Justine, Willy Lulciuc, Minkyu Park, Paweł Leszczyński, Varun Singh

Varun Singh - (varuntestaz@outlook.com)
2023-01-12 06:37:56

Hi, are there any plans to add an Azure EventHub transport similar to the Kinesis one?

Will Johnson - (will@willj.co)
2023-01-12 17:31:12

*Thread Reply:* @Varun Singh why not just use the KafkaTransport and the Event Hub's Kafka endpoint?


https://github.com/yogyang/OpenLineage/blob/2b7fa8bbd19a2207d54756e79aea7a542bf7bb[…]/main/java/io/openlineage/client/transports/KafkaTransport.java


https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-stream-analytics

👍 Varun Singh

Julien Le Dem - (julien@apache.org)
2023-01-12 09:01:24

Following up on last month’s discussion (), I created the #spec-compliance channel for further discussion

Will Johnson - (will@willj.co)
2023-01-12 17:43:55

*Thread Reply:* @Julien Le Dem is there a channel to discuss the community call / ask follow-up questions on the community call topics? For example, I wanted to ask more about the AirflowFacet and if we expected to introduce more tool-specific facets into the spec. Where's the right place to ask that question? On the PR?

Julien Le Dem - (julien@apache.org)
2023-01-17 15:11:05

*Thread Reply:* I think asking in #general is the right place. If there’s a specific github issue/PR, this is a good place as well. You can tag the relevant folks to get their attention.

Allison Suarez - (asuarezmiranda@lyft.com)
2023-01-12 18:37:24

@here I am using the Spark listener and whenever a query like INSERT OVERWRITE TABLE gets executed it looks like I can see some outputs, but there are no symlinks for the output table. The operation type being executed is InsertIntoHadoopFsRelationCommand. I am not sure why I can see symlinks for all the input tables but not the output tables. Anyone know the reason behind this?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-01-13 02:30:37

*Thread Reply:* Hello @Allison Suarez, in case of InsertIntoHadoopFsRelationCommand, the Spark OpenLineage implementation uses this method:
DatasetIdentifier di = PathUtils.fromURI(command.outputPath().toUri(), "file");
(https://github.com/OpenLineage/OpenLineage/blob/0.19.2/integration/spark/shared/sr[…]ark/agent/lifecycle/plan/InsertIntoHadoopFsRelationVisitor.java)

If the dataset identifier is constructed from a path, then no symlinks are added. That's the current behaviour.

Calling io.openlineage.spark.agent.util.DatasetIdentifier#withSymlink(io.openlineage.spark.agent.util.DatasetIdentifier.Symlink) on the DatasetIdentifier in InsertIntoHadoopFsRelationVisitor could be a remedy to that.

Do you have some Spark code snippet to reproduce this issue?

Will Johnson - (will@willj.co)
2023-01-22 10:04:56

*Thread Reply:* @Allison Suarez it would also be good to know what compute engine you're using to run your code on? On-Prem Apache Spark? Azure/AWS/GCP Databricks?

Allison Suarez - (asuarezmiranda@lyft.com)
2023-02-13 18:18:52

*Thread Reply:* I created a custom visitor and fixed the issue that way, thank you!

🙌 Will Johnson

Varun Singh - (varuntestaz@outlook.com)
2023-01-13 11:44:19

Hi, I am trying to use kafka transport in spark for sending events to an EventHub but it requires me to set a property sasl.jaas.config which needs to have semicolons (;) in its value. But this gives an error about being unable to convert Array to a String. I think this is due to this line which splits property values into an array if they have a semicolon: https://github.com/OpenLineage/OpenLineage/blob/92adbc877f0f4008928a420a1b8a93f394[…]pp/src/main/java/io/openlineage/spark/agent/ArgumentParser.java
Does this seem like a bug or is it intentional?

Harel Shein - (harel.shein@gmail.com)
2023-01-13 14:39:51

*Thread Reply:* seems like a bug to me, but tagging @Tomasz Nazarewicz / @Paweł Leszczyński

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2023-01-13 15:22:19

*Thread Reply:* So we needed a generic way of passing parameters to client and made an assumption that every field with ; will be treated as an array

Varun Singh - (varuntestaz@outlook.com)
2023-01-14 02:00:04

*Thread Reply:* Thanks for the confirmation, should I add a condition to split only if it's a key that can have array values? We can have a list of such keys like facets.disabled

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2023-01-14 02:28:41

*Thread Reply:* We thought about this solution but it forces us to know the structure of each config and we wanted to avoid that as much as possible

Tomasz Nazarewicz - (tomasz.nazarewicz@getindata.com)
2023-01-14 02:34:06

*Thread Reply:* Maybe the condition could be having ; and [] in the value
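(To make the proposal concrete, here is a rough Python pseudocode of that condition - the real implementation would live in the Java ArgumentParser, so this is only a sketch:)

```
# Sketch of the proposed heuristic: only split into an array when the value is
# wrapped in [] AND contains semicolons; plain values with ';' stay untouched,
# so e.g. a sasl.jaas.config value survives intact.
def parse_value(value: str):
    if value.startswith("[") and value.endswith("]") and ";" in value:
        return [item for item in value[1:-1].split(";") if item]
    return value

assert parse_value("[a;b;c]") == ["a", "b", "c"]
jaas = 'org.apache.kafka.common.security.plain.PlainLoginModule required user="u" password="p";'
assert parse_value(jaas) == jaas  # not split, semicolon preserved
```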

👍 Varun Singh

Varun Singh - (varuntestaz@outlook.com)
2023-01-15 08:14:14

*Thread Reply:* Makes sense, I can add this check. Thanks @Tomasz Nazarewicz!

Varun Singh - (varuntestaz@outlook.com)
2023-01-16 01:15:19

*Thread Reply:* Created issue https://github.com/OpenLineage/OpenLineage/issues/1506 for this

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-17 12:00:02

Hi everyone, I’m excited to share some good news about our progress in the LFAI & Data Foundation: we’ve achieved Incubation status! This required us to earn a Silver Badge from the OpenSSF, get 300+ stars on GitHub (which was NBD as we have over 1100 already), and win the approval of the LFAI & Data’s TAC. Now that we’ve cleared this hurdle, we have access to additional services from the foundation, including assistance with creative work, marketing and communication support, and event-planning assistance. Graduation from the program, which will earn us a voting seat on the TAC, is on the horizon. Stay tuned for updates on our progress with the foundation.


LF AI & Data is an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence (AI) and data. LF AI & Data was created to support open source AI and data, and to create a sustainable open source AI and data ecosystem that makes it easy to create AI and data products and services using open source technologies. They foster collaboration under a neutral environment with an open governance in support of the harmonization and acceleration of open source technical projects.


For more info about the foundation and other LFAI & Data projects, visit their website.

❤️ Julien Le Dem, Paweł Leszczyński, Maciej Obuchowski, Ross Turk, Jakub Dardziński, Minkyu Park, Howard Yoo, Jarek Potiuk, Danilo Mota, Willy Lulciuc, Kengo Seki, Harel Shein

Ross Turk - (ross@datakin.com)
2023-01-17 15:53:12

if you want to share this news (and I hope you do!) there is a blog post here: https://openlineage.io/blog/incubation-stage-lfai/

Ross Turk - (ross@datakin.com)
2023-01-17 15:54:07

and I'll add a quick shoutout of @Michael Robinson, who has done a whole lot of work to make this happen 🎉 thanks, man, you're awesome!

🙌 Howard Yoo, Maciej Obuchowski, Jarek Potiuk, Minkyu Park, Willy Lulciuc, Kengo Seki, Paweł Leszczyński, Varun Singh

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-17 15:56:38

*Thread Reply:* Thank you, Ross!! I appreciate it. I might have coordinated it, but it’s been a team effort. Lots of folks shared knowledge and time to help us check all the boxes, literally and figuratively (lots of boxes). ;)

☑️ Willy Lulciuc, Paweł Leszczyński, Viraj Parekh

Jarek Potiuk - (jarek@potiuk.com)
2023-01-17 16:03:36

Congrats @Michael Robinson and @Ross Turk - a major step for Open Lineage!

🙌 Michael Robinson, Maciej Obuchowski, Jakub Dardziński, Julien Le Dem, Ross Turk, Willy Lulciuc, Kengo Seki, Viraj Parekh, Paweł Leszczyński, Anirudh Shrinivason, Robert

Sudhir Nune - (sudhir.nune@kraftheinz.com)
2023-01-18 11:15:02

Hi all, I am new to the https://openlineage.io/integration/dbt/, I followed the steps on Windows Laptop. But the dbt-ol does not get executed.

'dbt-ol' is not recognized as an internal or external command,
operable program or batch file.

I see the following packages installed too:
openlineage-dbt==0.19.2
openlineage-integration-common==0.19.2
openlineage-python==0.19.2

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-18 11:17:14

*Thread Reply:* What are the errors?

Sudhir Nune - (sudhir.nune@kraftheinz.com)
2023-01-18 11:18:09

*Thread Reply:* 'dbt-ol' is not recognized as an internal or external command,
operable program or batch file.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-19 11:11:09

*Thread Reply:* Hm, I think this is due to different windows conventions around scripts.

Ross Turk - (ross@datakin.com)
2023-01-19 14:26:35

*Thread Reply:* I have not tried it on Windows before myself, but on mac/linux if you make a Python virtual environment in venv/ and run pip install openlineage-dbt, the script winds up in ./venv/bin/dbt-ol.

Ross Turk - (ross@datakin.com)
2023-01-19 14:27:04

*Thread Reply:* (maybe that helps!)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-19 14:38:23

*Thread Reply:* This might not work, but I think I have an idea that would allow it to run as python -m dbt-ol run ...

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-19 14:38:27

*Thread Reply:* That needs one fix though

Sudhir Nune - (sudhir.nune@kraftheinz.com)
2023-01-19 14:40:52

*Thread Reply:* Hi @Maciej Obuchowski, thanks for the input, when I try to use python -m dbt-ol run, I see the below error :(
\python.exe: No module named dbt-ol

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-24 13:23:56

*Thread Reply:* We’re seeing a similar issue with the Great Expectations integration at the moment. This is purely a guess, but what happens when you try with openlineage-dbt 0.18.0?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-24 13:24:36

*Thread Reply:* @Michael Robinson GE issue is on Windows?

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-24 13:24:49

*Thread Reply:* No, not Windows

Michael Robinson - (michael.robinson@astronomer.io)
2023-01-24 13:24:55

*Thread Reply:* (that I know of)

Sudhir Nune - (sudhir.nune@kraftheinz.com)
2023-01-24 13:46:39

*Thread Reply:* @Michael Robinson - I see the same error. I used 2 Combinations

  1. Python 3.8.10 with openlineage-dbt 0.18.0 & Latest
  2. Python 3.9.7 with openlineage-dbt 0.18.0 & Latest

Ross Turk - (ross@datakin.com)
2023-01-24 13:49:19

*Thread Reply:* Hm. You should be able to find the dbt-ol command wherever pip is installing the packages. In my case, that's usually in a virtual environment.


But if I am not in a virtual environment, it installs the packages in my PYTHONPATH. You might try this to see if the dbt-ol script can be found in one of the directories in sys.path.
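(The snippet attached to this message did not survive the export; a stand-in that checks both PATH and sys.path might look like this - illustrative only, not the original attachment:)

```
# Illustrative stand-in: look for the dbt-ol launcher on the shell PATH and in
# the directories Python searches.
import os
import shutil
import sys

print("on PATH:", shutil.which("dbt-ol"))  # None means the shell can't find it

for directory in sys.path:
    candidate = os.path.join(directory, "dbt-ol")
    if directory and os.path.exists(candidate):
        print("found in sys.path:", candidate)
```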

Ross Turk - (ross@datakin.com)
2023-01-24 13:58:38

*Thread Reply:* this can help you verify that your PYTHONPATH and PATH are correct - installing an unrelated python command-line tool and seeing if you can execute it:

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-24 13:59:42

*Thread Reply:* Again, I think this is windows issue

Ross Turk - (ross@datakin.com)
2023-01-24 14:00:54

*Thread Reply:* @Maciej Obuchowski you think even if dbt-ol could be found in the path, that might not be the issue?

Sudhir Nune - (sudhir.nune@kraftheinz.com)
2023-01-24 14:15:13

*Thread Reply:* Hi @Ross Turk - I could not find the dbt-ol in the site-packages.

Ross Turk - (ross@datakin.com)
2023-01-24 14:16:48

*Thread Reply:* Hm 😕 then perhaps @Maciej Obuchowski is right and there is a bigger issue here

Sudhir Nune - (sudhir.nune@kraftheinz.com)
2023-01-24 14:31:15

*Thread Reply:* @Ross Turk & @Maciej Obuchowski I see the issue even when I do the install using the https://pypi.org/project/openlineage-dbt/#files - openlineage-dbt-0.19.2.tar.gz.

For some reason, I see only the following folders created:

  1. openlineage
  2. openlineage_dbt-0.19.2.dist-info
  3. openlineage_integration_common-0.19.2.dist-info
  4. openlineage_python-0.19.2.dist-info
and it is not bringing in the openlineage-dbt-0.19.2, which has the scripts/dbt-ol

If it helps I am using pip 21.2.4

Francis McGregor-Macdonald - (francis@mc-mac.com)
2023-01-18 18:40:32

@Paul Villena @Stephen Said and Vishwanatha Nayak published an AWS blog Automate data lineage on Amazon MWAA with OpenLineage

👀 Ross Turk, Peter Hicks, Willy Lulciuc
🔥 Ross Turk, Willy Lulciuc, Michael Collado, Peter Hicks, Minkyu Park, Julien Le Dem, Kengo Seki, Anirudh Shrinivason, Paweł Leszczyński, Maciej Obuchowski, Harel Shein, Paul Wilson Villena
❤️ Willy Lulciuc, Minkyu Park, Julien Le Dem, Kengo Seki, Paweł Leszczyński, Viraj Parekh

Ross Turk - (ross@datakin.com)
2023-01-18 18:54:57

*Thread Reply:* This is excellent! May we promote it on openlineage and marquez social channels?

Willy Lulciuc - (willy@datakin.com)
2023-01-18 18:55:30

*Thread Reply:* This is an amazing write up! 🔥 💯 🚀

Francis McGregor-Macdonald - (francis@mc-mac.com)
2023-01-18 19:49:46

*Thread Reply:* Happy to have it promoted. 😄
Vish posted on LinkedIn: https://www.linkedin.com/posts/vishwanatha-nayak-b8462054_automate-data-lineage-on-amazon-mwaa-with-activity-7021589819763945473-yMHF if you want something to repost there.

❤️ Willy Lulciuc, Ross Turk

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-01-19 00:13:26

Hi guys, I am trying to build the openlineage jar locally for spark. I ran ./gradlew shadowJar in the /integration/spark directory. However, I am getting this issue:
** What went wrong:
A problem occurred evaluating project ':app'.
> Could not resolve all files for configuration ':app:spark33'.
   > Could not resolve io.openlineage:openlineage-java:0.20.0-SNAPSHOT.
     Required by:
         project :app > project :shared
      > Could not resolve io.openlineage:openlineage-java:0.20.0-SNAPSHOT.
         > Unable to load Maven meta-data from https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-java/0.20.0-SNAPSHOT/maven-metadata.xml.
            > Could not GET 'https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-java/0.20.0-SNAPSHOT/maven-metadata.xml'. Received status code 401 from server: Unauthorized
It used to work a few weeks ago... May I ask if anyone would know what the reason might be? Thanks! 🙂

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-01-19 03:58:42

*Thread Reply:* Hello @Anirudh Shrinivason, you need to build your openlineage-java package first. Possibly you built it some time ago on a different version.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-01-19 03:59:28

*Thread Reply:* ./gradlew clean build publishToMavenLocal
in /client/java should help.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-01-19 04:34:33

*Thread Reply:* Ahh yeap this works thanks! 🙂

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-01-19 09:17:01

Are there any resources to explain the differences between lineage with Apache Atlas vs. lineage using OpenLineage? we have discussions with customers and partners, and some of them are looking into which is more “ready for industry”.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-19 11:03:39

*Thread Reply:* It's been a while since I looked at Atlas, but does it even now support anything other than very Java Apache-adjacent projects like Hive and HBase?

Ross Turk - (ross@datakin.com)
2023-01-19 13:10:11

*Thread Reply:* To directly answer your question @Sheeri Cabral (Collibra): I am not aware of any resources currently that explain this 😞 but I would welcome the creation of one & pitch in where possible!

✅ Sheeri Cabral (Collibra)

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-01-20 17:00:25

*Thread Reply:* I don’t know enough about Atlas to make that doc.

Justine Boulant - (justine.boulant@seenovate.com)
2023-01-19 10:43:18

Hi everyone, I am currently working on a project and we have some questions to use OpenLineage with Apache Airflow:
• How does it work: ux vs code/script? How can we implement it? a schema of its architecture for example
• What are the visual outputs available?
• Is the lineage done from A to Z? if there are multiple intermediary transformations for example?
• Is the lineage done horizontally across the organization or vertically on different system levels? or both?
• Can we upgrade it to industry-level?
• Does it work with Python and/or R?
• Does it read metadata or scripts?
Thanks a lot if you can help 🙂

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-19 11:00:54

*Thread Reply:* I think most of your questions will be answered by this video: https://www.youtube.com/watch?v=LRr-ja8_Wjs

Ross Turk - (ross@datakin.com)
2023-01-19 13:10:58

*Thread Reply:* I agree - a lot of the answers are in that overview video. You might also take a look at the docs, they do a pretty good job of explaining how it works.

Ross Turk - (ross@datakin.com)
2023-01-19 13:19:34

*Thread Reply:* More explicitly:
• Airflow is an interesting platform to observe because it runs a large variety of workloads and lineage can only be automatically extracted for some of them
• In general, OpenLineage is essentially a standard and data model for lineage. There are integrations for various systems, including Airflow, that cause them to emit lineage events to an OpenLineage compatible backend. It's a push model.
• Marquez is one such backend, and the one I recommend for testing & development
• There are a few approaches for lineage in Airflow:
  ◦ Extractors, which pair with Operators to extract and emit lineage
  ◦ Manual inlets/outlets on a task, defined by a developer - useful for PythonOperator and other cases where an extractor can't do it auto
  ◦ Orchestration of an underlying OpenLineage integration, like openlineage-dbt
• IDK about "A to Z", that depends on your environment. The goal is to capture every transformation. Depending on your pipeline, there may be a set of integrations that give you the coverage you need. We often find that there are gaps.
• It works with Python. You can use the openlineage-python client to emit lineage events to a backend. This is useful if there isn't an integration for something your pipeline does.
• It describes the pipeline by observing running jobs and the way they affect datasets, not the organization. I don't know what you mean by "industry-level".
• I am not aware of an integration that parses source code to determine lineage at this time.
• The openlineage-dbt integration consumes the various metadata that dbt leaves behind to construct lineage. Dunno if that's what you mean by "read metadata".

Ross Turk - (ross@datakin.com)
2023-01-19 13:23:33

*Thread Reply:* FWIW I did a workshop on openlineage and airflow a while back, and it's all in this repo. You can find slides + a quick Python example + a simple Airflow example in there.

Justine Boulant - (justine.boulant@seenovate.com)
2023-01-20 03:44:22

*Thread Reply:* Thanks a lot!! Very helpful!

Ross Turk - (ross@datakin.com)
2023-01-20 11:42:43

*Thread Reply:* 👍

Brad Paskewitz - (bradford.paskewitz@fivetran.com)
2023-01-20 15:28:06

Hey folks, my team is working on a solution that would support the OL standard with column level lineage. I'm working through the architecture now and I'm wondering if everyone uses the standard rest api backed by a db or if other teams found success using other technologies such as webhooks, streams, etc in order to capture and process lineage events. I'd be very curious to connect on the topic

Julien Le Dem - (julien@apache.org)
2023-01-20 19:45:55

*Thread Reply:* Hello Brad, off the top of my head:

Julien Le Dem - (julien@apache.org)
2023-01-20 19:47:15

*Thread Reply:* • Marquez uses the API HTTP Post. so does Astro
• Egeria and Purview prefer consuming through a Kafka topic. There is a ProxyBackend that takes HTTP Posts and writes to Kafka. The client can also be configured to write to Kafka

👍 Jakub Dardziński

Julien Le Dem - (julien@apache.org)
2023-01-20 19:48:09

*Thread Reply:* @Will Johnson @Mandy Chessell might have opinions

Julien Le Dem - (julien@apache.org)
2023-01-20 19:49:10

*Thread Reply:* The Microsoft Purview approach is documented here: https://learn.microsoft.com/en-us/samples/microsoft/purview-adb-lineage-solution-accelerator/azure-databricks-to-purview-lineage-connector/

Julien Le Dem - (julien@apache.org)
2023-01-20 19:49:47

*Thread Reply:* There’s a blog post about Egeria here: https://openlineage.io/blog/openlineage-egeria/

Will Johnson - (will@willj.co)
2023-01-22 10:00:56

*Thread Reply:* @Brad Paskewitz at Microsoft, the solution that Julien linked above, we are using the HTTP Transport (REST API) as we are consuming the OpenLineage Events and transforming them to Apache Atlas / Microsoft Purview.


However, there is a good deal of interest in using the kafka transport instead and that's our future roadmap.

👍 Ross Turk, Brad Paskewitz

Quentin Nambot - (qnambot@gmail.com)
2023-01-25 09:59:13

❓ Hi everyone, I am trying to use openlineage with Databricks (using 11.3 LTS runtime, and openlineage 0.19.2).
Using this documentation I managed to install openlineage and send events to marquez.
However marquez did not receive all COMPLETE events; it seems like the databricks cluster is shut down immediately at the end of the job. It is not the first time that I have seen this with databricks: last year I tried to use spline, and we noticed that Databricks does not seem to wait for the spark session to be nicely closed before shutting down instances (see this issue).
My question is: has anyone faced the same issue? Does somebody know a workaround? 🙏

Michael Collado - (collado.mike@gmail.com)
2023-01-25 12:04:48

*Thread Reply:* Hmm, if Databricks is shutting the process down without waiting for the ListenerBus to clear, I don’t know that there’s a lot we can do. The best thing is to somehow delay the main application thread from exiting. One thing you could try is to subclass the OpenLineageSparkListener and generate a lock for each SparkListenerSQLExecutionStart and release it when the accompanying SparkListenerSQLExecutionEnd event is processed. Then, in the main application, block until all such locks are released. If you try it and it works, let us know!

Quentin Nambot - (qnambot@gmail.com)
2023-01-26 05:46:35

*Thread Reply:* Ok thanks for the idea! I'll tell you if I try this and if it works 🤞

Petr Hajek - (petr.hajek@profinit.eu)
2023-01-25 10:12:42

Hi, would anybody be able and willing to help us configure S3 and Snowflake extractors within Airflow integration for one of our clients? Our trouble is that Airflow integration returns valid OpenLineage .json files but it lacks any information about input and output DataSets. Thanks in advance 🙂

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-01-25 10:38:03

*Thread Reply:* Hey Petr. Please DM me or describe the issue here 🙂

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-27 15:24:47

Hello.. I am trying to play with openlineage spark integration with Kafka and currently trying to just use the config as part of the spark submit command but I run into errors. Details in the 🧵

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-27 15:25:04

*Thread Reply:* Command
spark-submit --packages "io.openlineage:openlineage-spark:0.19.+" \
    --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
    --conf "spark.openlineage.transport.type=kafka" \
    --conf "spark.openlineage.transport.topicName=topicname" \
    --conf "spark.openlineage.transport.localServerId=Kafka_server" \
    file.py

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-27 15:25:14

*Thread Reply:* 23/01/27 17:29:06 ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
	at io.openlineage.client.transports.TransportFactory.build(TransportFactory.java:44)
	at io.openlineage.spark.agent.EventEmitter.<init>(EventEmitter.java:40)
	at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:278)
	at io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-27 15:25:31

*Thread Reply:* I would appreciate any pointers on getting started with using openlineage-spark with Kafka.

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-27 16:15:00

*Thread Reply:* Also this might seem a little elementary but the kafka topic itself, should it be hosted on the spark cluster or could it be any kafka topic?

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 08:37:07

*Thread Reply:* 👀 Could I get some help on this, please?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 09:07:08

*Thread Reply:* I think any NullPointerException is clearly our bug, can you open issue on OL GitHub?

👍 Susmitha Anandarao

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 09:30:51

*Thread Reply:* @Maciej Obuchowski Another interesting thing is if I use 0.19.2 version specifically, I get
23/01/30 14:28:33 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event


I am trying to print to console at the moment. I haven't been able to get Kafka transport type working though.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 09:41:12

*Thread Reply:* Are you getting events printed on the console though? This log should not affect you if you're running, for example Spark SQL jobs

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 09:42:28

*Thread Reply:* I am trying to run a python file using pyspark.
23/01/30 14:40:49 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event
I see this and don't see any events on the console.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 09:55:41

*Thread Reply:* Any logs filling pattern
log.warn("Unable to access job conf from RDD", nfe);
or
log.info("Found job conf from RDD {}", jc);
before?

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 09:57:20

*Thread Reply:* ```23/01/30 14:40:48 INFO DAGScheduler: Submitting ShuffleMapStage 0 (PairwiseRDD[2] at reduceByKey at /tmp/spark-20487725-f49b-4587-986d-e63a61890673/statusapidemo.py:47), which has no missing parents -23/01/30 14:40:49 WARN RddExecutionContext: Unable to access job conf from RDD -java.lang.NoSuchFieldException: Field is not instance of HadoopMapRedWriteConfigUtil - at io.openlineage.spark.agent.lifecycle.RddExecutionContext.lambda$setActiveJob$0(RddExecutionContext.java:117) - at java.util.Optional.orElseThrow(Optional.java:290) - at io.openlineage.spark.agent.lifecycle.RddExecutionContext.setActiveJob(RddExecutionContext.java:115) - at java.util.Optional.ifPresent(Optional.java:159) - at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$9(OpenLineageSparkListener.java:148) - at java.util.Optional.ifPresent(Optional.java:159) - at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:145) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) - at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) - at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) - at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) - at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) - at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) - at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) - at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) - at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) - at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446) - at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

- -

23/01/30 14:40:49 INFO RddExecutionContext: Found job conf from RDD Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-rbf-default.xml, hdfs-site.xml, hdfs-rbf-site.xml, resource-types.xml

- -

23/01/30 14:40:49 INFO RddExecutionContext: Found output path null from RDD PythonRDD[5] at collect at /tmp/spark-20487725-f49b-4587-986d-e63a61890673/statusapidemo.py:48 -23/01/30 14:40:49 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event``` -I see both actually.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 10:03:35

*Thread Reply:* I think this is same problem as this: https://github.com/OpenLineage/OpenLineage/issues/1521

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 10:04:14

*Thread Reply:* and I think I might have solution on a branch for it, just need to polish it up to release

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 10:13:37

*Thread Reply:* Aah got it. I will give it a try with SQL and a jar.


Do you have an ETA on when the python issue would be fixed?

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 10:37:51

*Thread Reply:* @Maciej Obuchowski Well I run into the same errors if I run spark-submit on a jar.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 10:38:44

*Thread Reply:* I think that has nothing to do with python

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 10:39:16

*Thread Reply:* BTW, which Spark version are you using?

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 10:41:22

*Thread Reply:* We are on 3.3.1

👍 Maciej Obuchowski

Susmitha Anandarao - (susmitha.anandarao@gmail.com)
2023-01-30 11:38:24

*Thread Reply:* @Maciej Obuchowski Do you have an estimated release date for the fix? Our team is specifically interested in using the Emitter to write out to Kafka.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-01-30 11:46:30

*Thread Reply:* I think we plan to release somewhere in the next week

:gratitude_thank_you: Susmitha Anandarao

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-02-06 09:21:25

*Thread Reply:* @Susmitha Anandarao PR fixing this has been merged, release should be today

:gratitude_thank_you: Susmitha Anandarao

Paul Lee - (paullee@lyft.com)
2023-01-27 16:31:45

👋
what would be the reason conn_id on something like SQLCheckOperator ends up being None when OpenLineage attempts to extract metadata but is fine on task execution?


i'm using OpenLineage for Airflow 0.14.1 on 2.3.4 and i'm getting an error about conn_id not being found. it's a SQLCheckOperator where the check runs fine but the task fails because when OpenLineage goes to extract task metadata it attempts to grab the conn_id, but at that moment it finds it to be None.

Ross Turk - (ross@datakin.com)
2023-01-27 18:38:40

*Thread Reply:* hmmm, I am not sure. perhaps @Benji Lampel can help, he’s very familiar with those operators.

👍 Paul Lee

Paul Lee - (paullee@lyft.com)
2023-01-27 18:46:15

*Thread Reply:* @Benji Lampel any help would be appreciated!

Benji Lampel - (benjamin@astronomer.io)
2023-01-30 09:01:34

*Thread Reply:* Hey Paul, the SQLCheckExtractors were written with the intent that they would be used by a provider that inherits from them - they are all treated as a sort of base class. What is the exact error message you're getting? And what is the operator code?
Could you try this with a PostgresCheckOperator?
(Also, only the SqlColumnCheckOperator and SqlTableCheckOperator will provide data quality facets in their output; those functions are not implementable in the other operators at this time)

👀 Paul Lee

Paul Lee - (paullee@lyft.com)
2023-01-31 14:36:07

*Thread Reply:* @Benji Lampel here is the error message. i am not sure what the operator code is.

[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - Traceback (most recent call last):
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - self.run()
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/usr/lib/python3.8/threading.py", line 870, in run
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - self._target(*self._args, **self._kwargs)
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 99, in on_running
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - task_metadata = extractor_manager.extract_metadata(dagrun, task)
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py", line 28, in extract_metadata
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - extractor = self._get_extractor(task)
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py", line 96, in _get_extractor
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - self.task_to_extractor.instantiate_abstract_extractors(task)
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py", line 118, in instantiate_abstract_extractors
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - task_conn_type = BaseHook.get_connection(task.conn_id).conn_type
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/hooks/base.py", line 67, in get_connection
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - conn = Connection.get_connection_from_secrets(conn_id)
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/models/connection.py", line 430, in get_connection_from_secrets
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined")
[2023-01-31, 00:32:38 UTC] {logging_mixin.py:115} WARNING - airflow.exceptions.AirflowNotFoundException: The conn_id `None` isn't defined

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 14:37:06
-
-

*Thread Reply:* and above that

- -

[2023-01-31, 00:32:38 UTC] {connection.py:424} ERROR - Unable to retrieve connection from secrets backend (EnvironmentVariablesBackend). Checking subsequent secrets backend. -Traceback (most recent call last): - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/models/connection.py", line 420, in get_connection_from_secrets - conn = secrets_backend.get_connection(conn_id=conn_id) - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/secrets/base_secrets.py", line 91, in get_connection - value = self.get_conn_value(conn_id=conn_id) - File "/code/venvs/venv/lib/python3.8/site-packages/airflow/secrets/environment_variables.py", line 48, in get_conn_value - return os.environ.get(CONN_ENV_PREFIX + conn_id.upper())

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 14:39:31
-
-

*Thread Reply:* sorry, I should mention we're wrapping the CheckOperator, as we're still migrating from 1.10.15 @Benji Lampel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-01-31 15:09:51
-
-

*Thread Reply:* What do you mean by wrapping the CheckOperator? Like how so, exactly? Can you show me the operator code you're using in the DAG?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 17:38:45
-
-

*Thread Reply:* like so

- -

class CustomSQLCheckOperator(CheckOperator):
    ....

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-01-31 17:39:30
-
-

*Thread Reply:* i think i found the issue though: we have our own get_hook function and don't follow the traditional Airflow way of setting conn_id, which is why conn_id is always None. That code path is only exercised through OpenLineage, so it never worked with our custom wrapper
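A sketch of the fix implied here, assuming the wrapper can simply keep Airflow's standard conn_id attribute populated so that OpenLineage's instantiate_abstract_extractors can resolve the connection; the class and argument names are hypothetical:

```python
from airflow.operators.sql import SQLCheckOperator  # import path varies by Airflow version


class CustomSQLCheckOperator(SQLCheckOperator):
    """Wrapper that keeps conn_id set, even though hooks are resolved internally.

    OpenLineage calls BaseHook.get_connection(task.conn_id), so leaving
    conn_id as None breaks extraction (as in the traceback above).
    """

    def __init__(self, *, conn_id: str, **kwargs):
        super().__init__(conn_id=conn_id, **kwargs)
        self.conn_id = conn_id  # explicit, so the extractor can find it
```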

- - - -
- ✅ Benji Lampel -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-01-30 03:50:39
-
-

Hi everyone, I am using openlineage to capture column level lineage from spark databricks. I noticed that the environment variables captured are only present in the start event, but are not present in the complete event. Is there a reason why it is implemented like this? It seems more intuitive that whatever variables are present in the start event should also be present in the complete event...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-31 08:30:37
-
-

Hi everyone. Does the dbt integration provide an option to emit events to a Kafka topic, similar to the Spark integration? I could not find anything regarding this in the documentation, and I wanted to confirm whether only the http transport type is supported. Thank you!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-31 12:57:47
-
-

*Thread Reply:* The dbt integration uses the Python client, so you should be able to do something similar to what you would do with the Java client. See here: https://github.com/OpenLineage/OpenLineage/tree/main/client/python#kafka

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-01-31 13:26:33
-
-

*Thread Reply:* Thank you for this!

- -

I created an openlineage.yml file with the following data to test out the integration locally:
transport:
  type: "kafka"
  config: { 'bootstrap.servers': 'localhost:9092' }
  topic: "ol_dbt_events"
However, I ran into a "no module named 'confluent_kafka'" error from this code:
Running OpenLineage dbt wrapper version 0.19.2
This wrapper will send OpenLineage events at the end of dbt execution.
Traceback (most recent call last):
  File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/bin/dbt-ol", line 168, in <module>
    main()
  File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/bin/dbt-ol", line 94, in main
    client = OpenLineageClient.from_environment()
  File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/client.py", line 73, in from_environment
    return cls(transport=get_default_factory().create())
  File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/transport/factory.py", line 37, in create
    return self._create_transport(yml_config)
  File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/transport/factory.py", line 69, in _create_transport
    return transport_class(config_class.from_dict(config))
  File "/Users/susmithaanandarao/.pyenv/virtualenvs/dbt-examples-domain-repo/3.9.8/lib/python3.9/site-packages/openlineage/client/transport/kafka.py", line 43, in __init__
    import confluent_kafka as kafka
ModuleNotFoundError: No module named 'confluent_kafka'
Manually installing confluent-kafka worked. But I am curious why it was not automatically installed, and whether I am missing any config.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-02 14:39:29
-
-

*Thread Reply:* @Susmitha Anandarao It's not installed by default because it's a large binary package. We don't want to install something giant for every user when the vast majority won't use it - it's 100x bigger than the rest of the client.

- -

We need to indicate this much better, though, and not throw this error directly at the user - both in the docs and in the code.
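For anyone hitting the same ModuleNotFoundError, the fix confirmed in this thread is to install the Kafka client yourself; whether openlineage-python ships a kafka extra may depend on the version, so the second form below is an assumption:

```
# confirmed workaround from this thread
pip install confluent-kafka

# possible alternative if the package defines a kafka extra (assumption)
pip install 'openlineage-python[kafka]'
```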

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-01-31 11:28:53
-
-

~Hey, would love to see a release of OpenLineage~

- - - -
- ➕ Michael Robinson, Jakub Dardziński, Ross Turk, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-01-31 12:51:44
-
-

Hello, I have been working on a proposal to bring an OpenLineage provider to Airflow. I am currently looking for feedback on a draft AIP. See the thread here: https://lists.apache.org/thread/2brvl4ynkxcff86zlokkb47wb5gx8hw7

- - - -
- 🔥 Maciej Obuchowski, Viraj Parekh, Jakub Dardziński, Enrico Rotundo, Harel Shein, Paweł Leszczyński -
- -
- 👀 Enrico Rotundo -
- -
- 🙌 Will Johnson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Bramha Aelem - (bramhaaelem@gmail.com) -
-
2023-01-31 14:02:21
-
-

@Willy Lulciuc, any updates on https://github.com/OpenLineage/OpenLineage/discussions/1494?

-
- - - - - - - -
-
(GitHub Discussion preview - Category: Ideas; Comments: 3)
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Natalie Zeller - (natalie.zeller@naturalint.com) -
-
2023-02-02 08:26:38
-
-

Hello, -While trying to use OpenLineage with spark, I've noticed that sometimes the query execution is missing or already got closed (here is the relevant code). As a result, some of the events are skipped. Is this a known issue? Is there a way to overcome it?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-02 08:39:34
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/999#issuecomment-1209048556

- -

Does this fit your experience?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-02 08:39:59
-
-

*Thread Reply:* We sometimes experience this in context of very small, quick jobs

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Natalie Zeller - (natalie.zeller@naturalint.com) -
-
2023-02-02 08:43:24
-
-

*Thread Reply:* Yes, my scenarios are dealing with quick jobs. -Good to know that we will be able to solve it with future spark versions. Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-02 11:09:13
-
-

@channel
This month’s OpenLineage TSC meeting is next Thursday, February 9th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. Recent release overview @Michael Robinson
  2. AIP: OpenLineage in Airflow
  3. Discussions:
     • Real-world implementation of OpenLineage (What does it really mean?) @Sheeri Cabral (Collibra) (continued)
     • Using namespaces @Michael Robinson
  4. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
- - - -
- 🔥 Maciej Obuchowski, Bramha Aelem, Viraj Parekh, Brad Paskewitz, Harel Shein -
- -
- 👍 Bramha Aelem, Viraj Parekh, Enrico Rotundo, Daniel Henneberger -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-03 13:22:51
-
-

Hi folks, I’m opening a vote to release OpenLineage 0.20.0, featuring:
• Airflow: add new extractor for GCSToGCSOperator - adds a new extractor for this operator.
• Proxy: implement lineage event validator for client proxy - implements logic in the proxy (which is still in development) for validating and handling lineage events.
• A fix of a breaking change in the common integration and other bug fixes in the dbt, Airflow, Spark, and SQL integrations and in the Java and Python clients.
As per the policy here, three +1s from committers will authorize. Thanks in advance.

- - - -
- ➕ Willy Lulciuc, Maciej Obuchowski, Julien Le Dem, Jakub Dardziński, Howard Yoo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-03 13:24:03
-
-

*Thread Reply:* exciting to see the client proxy work being released by @Minkyu Park 💯

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-03 13:35:38
-
-

*Thread Reply:* This was without a doubt among the fastest release votes we’ve ever had 😉 . Thank you! You can expect the release to happen on Monday.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Minkyu Park - (minkyu@datakin.com) -
-
2023-02-03 14:02:52
-
-

*Thread Reply:* Lol the proxy is still in development and not ready for use

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-03 14:03:26
-
-

*Thread Reply:* Good point! Let’s make that clear in the release / docs?

- - - -
- 👍 Michael Robinson, Minkyu Park -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Minkyu Park - (minkyu@datakin.com) -
-
2023-02-03 14:03:33
-
-

*Thread Reply:* But it doesn’t block anything anyway, so happy to see the release

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Minkyu Park - (minkyu@datakin.com) -
-
2023-02-03 14:04:38
-
-

*Thread Reply:* We can celebrate that the proposal for the proxy is merged. I’m happy with that 🥳

- - - -
- 🎊 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Joanes - (djoanes@gmail.com) -
-
2023-02-06 00:01:49
-
-

Hey 👋 From what I gather, there's no solution to getting column level lineage from spark streaming jobs. Is there a issue I can follow to keep track?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-02-06 14:47:15
-
-

*Thread Reply:* Hey @Daniel Joanes! thanks for the question.

- -

I am not aware of an issue that captures this. Column-level lineage is a somewhat new facet in the spec, and implementations across the various integrations are in varying states of readiness.

- -

I invite you to create the issue - that way it's attributed to you, which makes sense because you're the one who first raised it. But I'm happy to create it for you & give you the PR# if you'd rather, just let me know 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Joanes - (djoanes@gmail.com) -
-
2023-02-06 14:50:59
-
-

*Thread Reply:* Go for it, once it's created i'll add a watch

- - - -
- 👍 Ross Turk -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Joanes - (djoanes@gmail.com) -
-
2023-02-06 14:51:13
-
-

*Thread Reply:* Thanks Ross!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Ross Turk - (ross@datakin.com) -
-
2023-02-06 23:10:30
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/1581

-
- - - - - - - -
-
(GitHub issue preview - Labels: integration/spark, column-level-lineage)
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-07 18:46:50
-
-

@channel
OpenLineage 0.20.4 is now available, including:
Additions:
• Airflow: add new extractor for GCSToGCSOperator #1495 @sekikn
• Flink: resolve topic names from regex, support 1.16.0 #1522 @pawel-big-lebowski
• Proxy: implement lineage event validator for client proxy #1469 @fm100
Changes:
• CI: use ruff instead of flake8, isort, etc., for linting and formatting #1526 @mobuchowski
Plus many bug fixes & doc changes.
Thank you to all our contributors!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.20.4
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.19.2...0.20.4
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 🎉 Kengo Seki, Harel Shein, Willy Lulciuc, Nadav Geva -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-08 15:31:32
-
-

@channel
Friendly reminder: this month’s OpenLineage TSC meeting is tomorrow at 10am, and all are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1675354153489629

- - - -
- ❤️ Minkyu Park, Kengo Seki, Paweł Leszczyński, Harel Shein, Sheeri Cabral (Collibra), Enrico Rotundo -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-02-09 10:50:07
-
-

Hey, can we please schedule a release of OpenLineage? I would like to have a release that includes the latest fixes for Async Operator on Airflow and some dbt bug fixes.

- - - -
- ➕ Michael Robinson, Maciej Obuchowski, Benji Lampel, Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-09 10:50:49
-
-

*Thread Reply:* Thanks for requesting a release. 3 +1s from committers will authorize an immediate release.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-09 11:15:35
-
-

*Thread Reply:* 0.20.5 ?

- - - -
- ➕ Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-02-09 11:28:20
-
-

*Thread Reply:* @Michael Robinson auth'd

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-09 11:32:06
-
-

*Thread Reply:* 👍 the release is authorized

- - - -
- ❤️ Sheeri Cabral (Collibra), Willy Lulciuc, Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Avinash Pancham - (avinashpancham@outlook.com) -
-
2023-02-09 15:57:58
-
-

Hi all, I have been experimenting with OpenLineage for a few days and it's great! I successfully set up the openlineage-spark listener on my Databricks cluster, and it pushes OpenLineage data to our Marquez backend. That was all pretty easy to do 🙂

- -

Now for my challenge: I would like to actually extend the metadata that my cluster pushes with custom values (you can think of spark config settings, commit hash of the executed code, or maybe even runtime defined values). I browsed through some documentation and found custom facets one can define. The link below describes how to use Python to push custom metadata to a backend, but I was actually hoping that there was a way to do this automatically in Spark. So ideally I would like to write my own OpenLineage.json (that has my custom facet) and tell Spark to use that Openlineage spec instead of the default one. In that way I hope my custom metadata will be forwarded automatically.

- -

I just do not know how to do that (and whether that is even possible), since I could not find any tutorials on that topic. Any help on this would be greatly appreciated!

- -

https://openlineage.io/docs/spec/facets/custom-facets

-
-
(link preview: openlineage.io)
-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-02-09 16:23:36
-
-

*Thread Reply:* I am also exploring something similar, but writing to kafka, and would want to know more on how we could add custom metadata from spark.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-02-10 02:23:40
-
-

*Thread Reply:* Hi @Avinash Pancham @Susmitha Anandarao, it's great to hear about successful experimenting on your side.

- -

Although Openlineage spec provides some built-in facets definition, a facet object can be anything you want (https://openlineage.io/apidocs/openapi/#tag/OpenLineage/operation/postRunEvent). The example metadata provided in this chat could be put into job or run facets I believe.

- -

There is also a way to extend Spark integration to collect custom metadata described here (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending). One needs to create own JAR with DatasetFacetBuilders, RunFacetsBuilder (whatever is needed). openlineage-spark integration will make use of those bulders.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-02-10 09:09:10
-
-

*Thread Reply:* (I would love to see what your specs are! I’m not with Astronomer, just a community member, but I am finding that many of the customizations people are making to the spec are valuable ones that we should consider adding to core)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-02-14 16:51:28
-
-

*Thread Reply:* Are there any examples out there of customizations already done in Spark? An example would definitely help!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-15 08:43:08
-
-

*Thread Reply:* I think @Will Johnson might have something to add about customization

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-02-15 23:58:36
-
-

*Thread Reply:* Oh man... Mike Collado did a nice write up on Slack of how many different ways there are to customize / extend OpenLineage. I know we all talked about doing a blog post at one point!

- -

@Susmitha Anandarao - You might take a look at https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java which has a hard coded set of properties we are extracting.

- -

It looks like Avinash's changes were accepted as well: https://github.com/OpenLineage/OpenLineage/pull/1545

- -
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-10 12:42:24
-
-

@channel
OpenLineage 0.20.6 is now available, including:
Additions
• Airflow: add new extractor for FTPFileTransmitOperator #1603 @sekikn
Changes
• Airflow: make extractors for async operators work #1601 @JDarDagran
Thanks to all our contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.20.6
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.20.4...0.20.6
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 🥳 Minkyu Park, Willy Lulciuc, Kengo Seki, Paweł Leszczyński, Anirudh Shrinivason, pankaj koti, Maciej Obuchowski -
- -
- ❤️ Minkyu Park, Ross Turk, Willy Lulciuc, Kengo Seki, Paweł Leszczyński, Anirudh Shrinivason, pankaj koti -
- -
- 🎉 Minkyu Park, Willy Lulciuc, Kengo Seki, Anirudh Shrinivason, pankaj koti -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-13 14:20:26
-
-

Hi everyone, in case you missed the announcement at the most recent community meeting, our first-ever meetup will be held on March 9th in Providence, RI. Join us there to learn more about the present and future of OpenLineage, meet other members of the ecosystem, learn about the project’s goals and fundamental design, and participate in a discussion about the future of the project.
Food will be provided, and the meetup is open to all. Don’t miss this opportunity to influence the direction of this important new standard! We hope to see you there.
More information: https://openlineage.io/blog/data-lineage-meetup/

-
-
(link preview: openlineage.io)
- - - -
- 🎉 Harel Shein, Ross Turk, Maciej Obuchowski, Kengo Seki, Paweł Leszczyński, Willy Lulciuc, Sheeri Cabral (Collibra) -
- -
- 🔥 Harel Shein, Ross Turk, Maciej Obuchowski, Anirudh Shrinivason, Kengo Seki, Paweł Leszczyński, Willy Lulciuc, Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Quentin Nambot - (qnambot@gmail.com) -
-
2023-02-15 04:52:27
-
-

Hi, I opened a PR to fix the way the Athena extractor gets the database, but the Spark integration tests failed. However, I don't think that's related to my PR, since I only updated the Airflow integration.
Can anybody help me with that please? 🙏

-
- - - - - - - -
-
(GitHub PR preview - Labels: integration/airflow, extractor; Comments: 2)
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Quentin Nambot - (qnambot@gmail.com) -
-
2023-02-15 04:52:59
- -
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-15 07:19:39
-
-

*Thread Reply:* @Quentin Nambot this happens because we run additional integration tests against real databases (like BigQuery) which aren't ever configured on forks, since we don't want to expose our secrets. We need to figure out how to make this experience better, but in the meantime we've pushed your code using git-push-fork-to-upstream-branch and it passes all the tests.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-15 07:21:49
-
-

*Thread Reply:* Feel free to un-draft your PR if you think it's ready for review

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Quentin Nambot - (qnambot@gmail.com) -
-
2023-02-15 08:03:56
-
-

*Thread Reply:* Ok nice thanks 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Quentin Nambot - (qnambot@gmail.com) -
-
2023-02-15 08:04:49
-
-

*Thread Reply:* I think it's ready, however should I update the version somewhere?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-15 08:42:39
-
-

*Thread Reply:* @Quentin Nambot I don't think so - it's just that you opened PR as Draft , so I'm not sure if you want to add something else to it.

- - - -
- 👍 Quentin Nambot -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Quentin Nambot - (qnambot@gmail.com) -
-
2023-02-15 08:43:36
-
-

*Thread Reply:* No I don't want to add anything so I opened it 👍

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-15 21:26:37
-
-

@here I have a question about extending the spark integration. Is there a way to use a custom visitor factory? I am trying to see if I can add a visitor for a command that is not currently covered in this integration (AlterTableAddPartitionCommand). It seems that because its not in the base visitor factory I am unable to use the visitor I created.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - - -
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-15 21:32:19
-
-

*Thread Reply:* I have that set up already, like this:
public class LyftOpenLineageEventHandlerFactory implements OpenLineageEventHandlerFactory {
  @Override
  public Collection<PartialFunction<LogicalPlan, List<OutputDataset>>>
      createOutputDatasetQueryPlanVisitors(OpenLineageContext context) {
    Collection<PartialFunction<LogicalPlan, List<OutputDataset>>> visitors =
        new ArrayList<PartialFunction<LogicalPlan, List<OutputDataset>>>();
    visitors.add(new LyftInsertIntoHadoopFsRelationVisitor(context));
    visitors.add(new AlterTableAddPartitionVisitor(context));
    visitors.add(new AlterTableDropPartitionVisitor(context));
    return visitors;
  }
}

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-15 21:33:35
-
-

*Thread Reply:* do I just add a constructor? the visitorFactory is private so I wasn't sure if that's something that was intended to change

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-15 21:34:30
-
-

*Thread Reply:* .

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-15 21:34:49
-
-

*Thread Reply:* @Michael Collado

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-02-15 21:35:14
-
-

*Thread Reply:* The VisitorFactory is only used by the internal EventHandlerFactory. It shouldn’t be needed for your custom one

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-02-15 21:35:48
-
-

*Thread Reply:* Have you added the file to the META-INF folder of your jar?
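(For reference: custom factories are discovered via Java's ServiceLoader, so the jar needs a services file named after the interface. The interface path below follows the Spark integration's extension docs; the implementing class's package is hypothetical.)

```
# file: META-INF/services/io.openlineage.spark.api.OpenLineageEventHandlerFactory
com.lyft.openlineage.LyftOpenLineageEventHandlerFactory
```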

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-16 11:01:56
-
-

*Thread Reply:* yes, I am able to use my custom event handler factory with a list of visitors, but for some reason I can't access the visitors for some commands - AlterTableAddPartitionCommand is one

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-16 11:02:29
-
-

*Thread Reply:* so even if I set up everything correctly I am unable to reach the code for that specific visitor

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-16 11:05:22
-
-

*Thread Reply:* and my assumption is that I can reach other commands but not this one, because the command is not defined in the BaseVisitorFactory - but maybe I'm wrong @Michael Collado

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-02-16 15:05:19
-
-

*Thread Reply:* the VisitorFactory is loaded by the InternalEventHandlerFactory here. However, the createOutputDatasetQueryPlanVisitors should contain a union of everything defined by the VisitorFactory as well as your custom visitors: see this code.

- - - - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Collado - (collado.mike@gmail.com) -
-
2023-02-16 15:09:21
-
-

*Thread Reply:* there might be a conflict with another visitor that’s being matched against that command. Can you turn on debug logging and look for this line to see what visitor is being applied to that command?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-02-16 16:54:46
-
-

*Thread Reply:* This was helpful, it works now, thank you so much Michael!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
slackbot - -
-
2023-02-16 19:08:26
-
-

This message was deleted.

- - - -
- 👋 Willy Lulciuc -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:09:49
-
-

*Thread Reply:* what is the curl cmd you are running? and what endpoint are you hitting? (assuming Marquez?)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:18:28
-
-

*Thread Reply:* yep, I am running:
curl -X POST http://localhost:5000/api/v1/namespaces/test ^
  -H 'Content-Type: application/json' ^
  -d '{ownerName:"me", description:"no description"}'

The weird thing is the log, where I don't have a 0.0.0.0 IP (the log corresponds to the equivalent Postman command):

marquez-api | WARN [2023-02-17 00:14:32,695] marquez.logging.LoggingMdcFilter: status: 405
marquez-api | XXX.23.0.1 - - [17/Feb/2023:00:14:32 +0000] "POST /api/v1/namespaces/test HTTP/1.1" 405 52 "-" "PostmanRuntime/7.30.0" 2

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:23:08
-
-

*Thread Reply:* Marquez logs all supported endpoints (and methods) on start up. For example, here are all the supported methods on /api/v1/namespaces/{namespace}:
marquez-api | DELETE /api/v1/namespaces/{namespace} (marquez.api.NamespaceResource)
marquez-api | GET /api/v1/namespaces/{namespace} (marquez.api.NamespaceResource)
marquez-api | PUT /api/v1/namespaces/{namespace} (marquez.api.NamespaceResource)
To ADD a namespace, you’ll want to use PUT (see API docs).
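Concretely, the working request should look something like this (a sketch based on the Marquez API docs; the owner and description values are placeholders):

```
curl -X PUT http://localhost:5000/api/v1/namespaces/test \
  -H 'Content-Type: application/json' \
  -d '{"ownerName": "me", "description": "no description"}'
```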

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:26:23
-
-

*Thread Reply:* 3rd stupid question of the night - sorry, I kept on trying POST, who knows why

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:26:56
-
-

*Thread Reply:* no worries! keep the questions coming!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:29:46
-
-

*Thread Reply:* well, maybe because it’s so late on your end! get some rest!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:36:25
-
-

*Thread Reply:* Yeah, but I want to see how it works. Right now I get a 200 response for the creation of the namespace... but it seems that nothing occurred - not on the Marquez front end (localhost:3000), nor in the database

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:37:13
-
-

*Thread Reply:* can you curl the list namespaces endpoint?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:38:14
-
-

*Thread Reply:* yep: nothing changed - only default and food_delivery

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:38:47
-
-

*Thread Reply:* can you post your server logs? you should see the request

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:40:41
-
-

*Thread Reply:* marquez-api | XXX.23.0.4 - - [17/Feb/2023:00:30:38 +0000] "PUT /api/v1/namespaces/ciro HTTP/1.1" 500 110 "-" "-" 7
marquez-api | INFO [2023-02-17 00:32:07,072] marquez.logging.LoggingMdcFilter: status: 200

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:41:12
-
-

*Thread Reply:* the server is returning a 500 ?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-16 19:41:57
-
-

*Thread Reply:* odd that LoggingMdcFilter is logging 200

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:43:24
-
-

*Thread Reply:* Bit confused because now I realize that postman is returning bad request

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:43:51
-
-

*Thread Reply:*

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-16 19:44:30
-
-

*Thread Reply:* You'll notice that I'm using port 3000 in the URL - if I use 5000 I get "No host"

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Willy Lulciuc - (willy@datakin.com) -
-
2023-02-17 01:14:50
-
-

*Thread Reply:* odd, the API should be using port 5000, have you followed our quickstart for Marquez?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-17 03:43:29
-
-

*Thread Reply:* Hello Willy
I am starting from scratch, following the instructions from https://openlineage.io/docs/getting-started/
I am on Windows. Instead of
git clone git@github.com:MarquezProject/marquez.git && cd marquez
I run
git clone https://github.com/MarquezProject/marquez.git
But before that I had to turn off the automatic carriage return in git:
git config --global core.autocrlf false
This avoids an error message in marquez-api when running wait-for-it.sh, at line 1, where
#!/usr/bin/env bash
is otherwise read as
#!/usr/bin/env bash\r'

It turns out that switching off the auto CR also impacts some files containing the Marquez password... and I got a failure accessing the db. To overcome this I ran Notepad++ and replaced ALL the \r\n with \n. In this way I managed to run
docker\up.sh and docker\down.sh
correctly (with or without seed... with access to the db via pgAdmin).

- - - -
- 👍 Ernie Ostic -
- -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-20 03:40:48
-
-

*Thread Reply:* The issue is related to Postman

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-02-17 03:39:07
-
-

Hi, I'd like to capture column lineage from spark, but also capture how the columns are transformed, and any column operations that are done too. May I ask if this feature is supported currently, or will be supported in future based on current timeline? Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-02-17 03:54:47
-
-

*Thread Reply:* Hi @Anirudh Shrinivason, this is a great question. We included extra fields in the OpenLineage spec to contain that information:
"transformationDescription": {
  "type": "string",
  "description": "a string representation of the transformation applied"
},
"transformationType": {
  "type": "string",
  "description": "IDENTITY|MASKED reflects a clearly defined behavior. IDENTITY: exact same as input; MASKED: no original data available (like a hash of PII for example)"
}
so the standard is ready to support it. We included two fields so that one can contain a human-readable description of what is happening. However, we don't have this implemented in the Spark integration.
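To make that concrete, a hypothetical column lineage facet using those fields might look like the following; the dataset, namespace, and column names are made up:

```json
{
  "columnLineage": {
    "fields": {
      "email_masked": {
        "inputFields": [
          {"namespace": "postgres://db:5432", "name": "public.users", "field": "email"}
        ],
        "transformationDescription": "sha256 hash of the input column",
        "transformationType": "MASKED"
      }
    }
  }
}
```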

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-02-17 04:02:30
-
-

*Thread Reply:* Thanks a lot! That is great. Is there a potential plan in the roadmap to support this for spark?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-02-17 04:08:16
-
-

*Thread Reply:* I think there will be a growing interest in that. In general, a dependency may be really difficult to express if many Spark operators are used on input columns to produce an output one. The simple version would be just to detect an identity operation or some kind of hashing.

To sum up, we don't yet have a proposal on that, but this seems to be a natural next step in enriching the column lineage features.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-02-17 04:40:04
-
-

*Thread Reply:* Got it. Thanks! If this item potentially comes on the roadmap, then I'd be happy to work with other interested developers to help contribute! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-02-17 04:43:00
-
-

*Thread Reply:* Great to hear that. What you could perhaps start with is coming to our monthly OpenLineage meetings and asking @Michael Robinson to put this item on the discussions list. There are many strategies for addressing this issue, and hearing your story - your usage scenario and what you are trying to achieve - would be super helpful in the design and implementation phases.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-02-17 04:44:18
-
-

*Thread Reply:* Got it! The monthly meeting might be a bit hard for me to attend live, because of the time zone. But I'll try my best to make it to the next one! thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-17 09:46:22
-
-

*Thread Reply:* Thank you for bringing this up, @Anirudh Shrinivason. I’ll add it to the agenda of our next meeting because there might be interest from others in adding this to the roadmap.

- - - -
- 👍 Anirudh Shrinivason -
- -
- :gratitude_thank_you: Anirudh Shrinivason -
- -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-17 15:12:57
-
-

Hello,
how can I improve the verbosity of the marquez-api?
Regards

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-02-20 02:10:13
-
-

*Thread Reply:* Hi @thebruuu, pls take a look at the logging documentation of Dropwizard (https://www.dropwizard.io/en/latest/manual/core.html#logging) - the framework Marquez is implemented in. The logging configuration section is present in marquez.yml.
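For example, a minimal logging block in marquez.yml following Dropwizard's configuration format; the logger names below are an assumption:

```yaml
logging:
  level: INFO          # default level for all loggers
  loggers:
    marquez: DEBUG     # more verbose output from Marquez itself
  appenders:
    - type: console
```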

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
thebruuu - (bruno.c@inwind.it) -
-
2023-02-20 03:29:07
-
-

*Thread Reply:* Thank You Pavel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-02-21 02:23:40
-
-

Hey, can we please schedule a release of OpenLineage? I would like to have the release that includes the feature to capture custom env variables from spark clusters... Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-02-21 09:12:17
-
-

*Thread Reply:* We generally schedule a release every month, next one will be in the next week - is that okay @Anirudh Shrinivason?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-02-21 11:38:50
-
-

*Thread Reply:* Yes, there’s one scheduled for next Wednesday, if that suits.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-02-21 21:45:58
-
-

*Thread Reply:* Okay yeah sure that works. Thanks

- - - -
- 🙌 Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-01 10:12:45
-
-

*Thread Reply:* @Anirudh Shrinivason we’re expecting the release to happen today or tomorrow, FYI

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-01 21:22:40
-
-

*Thread Reply:* Awesome thanks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jingyi Chen - (jingyi@cloudshuttle.com.au) -
-
2023-02-23 23:43:23
-
-

Hello team, we have OpenLineage and Great Expectations integrated, and I want to use GE to verify a table in Snowflake. The configuration where I added OpenLineage into GE produced this error after running. Could someone please give me some answers? 👀
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/great_expectations/validation_operators/validation_operators.py", line 469, in _run_actions
    action_result = self.actions[action["name"]].run(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/great_expectations/checkpoint/actions.py", line 106, in run
    return self._run(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openlineage/common/provider/great_expectations/action.py", line 156, in _run
    datasets = self._fetch_datasets_from_sql_source(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openlineage/common/provider/great_expectations/action.py", line 362, in _fetch_datasets_from_sql_source
    self._get_sql_table(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openlineage/common/provider/great_expectations/action.py", line 395, in _get_sql_table
    if engine.connection_string:
AttributeError: 'Engine' object has no attribute 'connection_string'

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jingyi Chen - (jingyi@cloudshuttle.com.au) -
-
2023-02-23 23:44:03
-
-

*Thread Reply:* This is my checkpoint configuration in GE:
```
name: 'openlineage_checkpoint'
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y%m%d-%H%M%S-my_checkpoint'
expectation_suite_name: EMAIL_VALIDATION
batch_request:
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: []
  - name: openlineage
    action:
      class_name: OpenLineageValidationAction
      module_name: openlineage.common.provider.great_expectations
      openlineage_host: http://localhost:5000
      # openlineage_apiKey: 12345
      openlineage_namespace: ge_expectations # Replace with your job namespace; we recommend a meaningful namespace like dev or prod, etc.
      job_name: ge_validation
evaluation_parameters: {}
runtime_configuration: {}
validations:
  - batch_request:
      datasource_name: LANDING_DEV
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: 'snowpipe.pii'
      data_connector_query:
        index: -1
    expectation_suite_name: EMAIL_VALIDATION
profilers: []
ge_cloud_id:
expectation_suite_ge_cloud_id:
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-02-24 11:31:05
-
-

*Thread Reply:* What version of GX are you running? And is this being run directly through GX or through Airflow with the operator?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jingyi Chen - (jingyi@cloudshuttle.com.au) -
-
2023-02-26 20:05:12
-
-

*Thread Reply:* I use the latest version of Great Expectations. This error occurs whether I run directly through Great Expectations or through Airflow

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-02-27 09:10:00
-
-

*Thread Reply:* I noticed another issue in the latest version as well. Try dropping to GE version great-expectations==0.15.44 for now. That is the latest one that works for me.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-02-27 09:11:34
-
-

*Thread Reply:* You should definitely open an issue here, and you can tag me @denimalpaca in the comment

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jingyi Chen - (jingyi@cloudshuttle.com.au) -
-
2023-02-27 18:07:29
-
-

*Thread Reply:* Thanks Benji, but I still have the same problem after I drop to great-expectations==0.15.44. This is my requirements file:
great_expectations==0.15.44
sqlalchemy
psycopg2-binary
numpy
pandas
snowflake-connector-python
snowflake-sqlalchemy
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-02-28 13:34:03
-
-

*Thread Reply:* interesting... I do think this may be a GX issue so let's see if they say anything. I can also cross post this thread to their slack

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Saravanan - (saravanan@athivatech.com) -
-
2023-03-01 00:27:30
-
-

Hello Team, I’m trying to use Open Lineage with AWS Glue and Marquez. Has anyone successfully integrated AWS Workflows/ Glue ETL jobs with Open Lineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-05-01 11:47:40
-
-

*Thread Reply:* I know I’m responding to an older post - I’m not sure if this would work in your environment? https://aws.amazon.com/blogs/big-data/build-data-lineage-for-data-lakes-using-aws-glue-amazon-neptune-and-spline/ -Are you using AWS Glue with Spark jobs?

-
-
(link preview: Amazon Web Services blog)
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Saravanan - (saravanan@athivatech.com) -
-
2023-05-02 15:16:14
-
-

*Thread Reply:* This was proposed by our AWS solution architect, but we are not seeing much improvement compared to OpenLineage. Have you deployed the above solution to prod?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com) -
-
2023-05-25 11:30:44
-
-

*Thread Reply:* We are currently in the research phase, so we have not deployed to prod. We have customers with thousands of existing scripts that they don’t want to rewrite to add openlineage libraries - i would imagine that if you are already integrating OpenLineage in your code, the spark listener isn’t an improvement. Our research is on magically getting lineage from existing scripts 😄

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-01 09:42:23
-
-

Hello everyone, I’m opening a vote to release OpenLineage 0.21.0, featuring:
• a new CustomEnvironmentFacetBuilder class and new output visitors AlterTableAddPartitionCommandVisitor and AlterTableSetLocationCommandVisitor in the Spark integration
• a Linux-ARM version of the SQL parser’s native library
• DEBUG logging of events in transports
• bug fixes and more.
Three +1s from committers will authorize an immediate release.

- - - -
- ➕ Maciej Obuchowski, Jakub Dardziński, Benji Lampel, Natalie Zeller, Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-01 10:26:22
-
-

*Thread Reply:* Thanks, all. The release is authorized and will be initiated as soon as possible.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Nigel Jones - (nigel.l.jones@gmail.com) -
-
2023-03-02 03:52:03
-
-

I’ve got some security related questions/observations. The main site suggests opening an issue to report vulnerabilities etc. I wanted to check if there is a private mailing list/DM channel to just check a few things first? I’m happy to use github issues otherwise. Thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Moritz E. Beber - (midnighter@posteo.net) -
-
2023-03-02 05:15:55
-
-

*Thread Reply:* GitHub actually has a new issue template for reporting vulnerabilities, if you use a config that enables it.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-02 10:21:16
-
-

Reminder: our first meetup is one week from today in Providence, RI! You can find the details in the meetup blog post. And if you’re coming, it would be great if you could RSVP. Looking forward to seeing some of you there!

-
-
(link preview: openlineage.io - Meetup)
- - - -
- 🎉 Kengo Seki -
- -
- 🚀 Kengo Seki -
- -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-02 16:52:50
-
-

@channel
We released OpenLineage 0.21.1, including:
Additions
• Clients: add DEBUG logging of events to transports #1633 by @mobuchowski
• Spark: add CustomEnvironmentFacetBuilder class #1545 by new contributor @Anirudh181001
• Spark: introduce the new output visitors AlterTableAddPartitionCommandVisitor and AlterTableSetLocationCommandVisitor #1629 by new contributor @nataliezeller1
• Spark: add column lineage for JDBC relations #1636 by @tnazarew
• SQL: add linux-aarch64 native library to Java SQL parser #1664 by @mobuchowski
Changes
• Airflow: get table database in Athena extractor #1631 by new contributor @rinzool
Removals
• Airflow: remove JobIdMapping and update macros to better support Airflow version 2+ #1645 by @JDarDagran
Thanks to all our contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.21.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.20.6...0.21.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 🎉 Kengo Seki, Harel Shein, Maciej Obuchowski -
- -
- 🚀 Kengo Seki, Harel Shein, Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-03-02 19:01:23
-
-

how do you turn off the openlineage listener in airflow 2? for some reason we're seeing a Thread-2 and seeing it fire twice in tasks

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-03-02 20:04:19
-
-

*Thread Reply:* Hey @Paul Lee, are you seeing this happen for Async operators?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-03-02 20:06:00
-
-

*Thread Reply:* might be related to this issue https://github.com/OpenLineage/OpenLineage/pull/1601 -that was fixed in 0.20.6

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-03-03 16:15:44
-
-

*Thread Reply:* hmm perhaps.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-03-03 16:15:55
-
-

*Thread Reply:* @Harel Shein if i want to turn off openlineage listener how do i do that? do i just remove the package?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-03-03 16:24:07
-
-

*Thread Reply:* meaning, you don’t want openlineage to collect any information from your Airflow deployment?

- - - -
- 👍 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-03-03 16:24:50
-
-

*Thread Reply:* in that case, you could either remove it from your requirements file, or set OPENLINEAGE_DISABLED=True in your Airflow env vars

- - - -
- 👍 Paul Lee -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paul Lee - (paullee@lyft.com) -
-
2023-03-06 14:43:56
-
-

*Thread Reply:* removed it from requirements and also the backend key in airflow config. needed both

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-02 20:29:42
-
-

@channel
This month’s OpenLineage TSC meeting is next Thursday, March 9th, at 10 am PT. Join us on Zoom: https://bit.ly/OLzoom. All are welcome!
On the tentative agenda:
  1. Recent release overview
  2. A new consumer
  3. Custom env variable support in Spark
  4. Async operator support in Airflow
  5. JDBC relations support in Spark
  6. Discussion topics:
     • New feature idea: column transformations/operations in the Spark integration
     • Using namespaces
  7. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
- - - -
- 🙌 Willy Lulciuc, Paweł Leszczyński, Maciej Obuchowski, alexandre bergere -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-02 21:48:29
-
-

Hi everyone, I noticed that Openlineage is sending each of the events twice for spark. Is this expected? Is there some way to disable this behaviour?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Will Johnson - (will@willj.co) -
-
2023-03-02 23:46:08
-
-

*Thread Reply:* Are you seeing duplicate START events or do you see two events one that is a START and one that is COMPLETE?

- -

OpenLineage's events may send partial information. You should expect to collect all events for a given RunId and merge them together to get the complete events.

- -

In addition, some data sources are really chatty like Delta tables. That may cause you to see many events that look very similar.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-03 00:45:19
-
-

*Thread Reply:* Hmm...I'm seeing 2 start events for the same runnable command

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-03 00:45:27
-
-

*Thread Reply:* And 2 complete

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-03 00:46:08
-
-

*Thread Reply:* I am currently only testing on parquet tables...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-03-03 02:31:28
-
-

*Thread Reply:* One of OpenLineage's assumptions is the ability to merge lineage events in the backend, to make client integrations stateless. So it is possible for Spark to emit multiple events for the same job. However, sometimes it does not make any sense to send or collect some events, which happened to us some time ago with Delta. In that case we decided to filter them and created a filtering mechanism (https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/filters) that can be extended in case of other unwanted events being generated and sent.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-05 22:59:06
-
-

*Thread Reply:* Ahh I see...okay thanks!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Daniel Joanes - (djoanes@gmail.com) -
-
2023-03-07 00:05:48
-
-

*Thread Reply:* in general, you should build any event consumer system with at-least-once semantics in mind. Even if this issue is fixed, there is a possibility of duplicates in other valid scenarios
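A minimal sketch of what at-least-once handling can look like on the consumer side, assuming (runId, eventType) is a good enough dedup key for your events; a production system would want a persistent or TTL'd store instead of an in-memory set:

```python
import json

seen: set = set()  # in production: a TTL'd store such as Redis


def process(event: dict) -> None:
    print(event["eventType"], event["run"]["runId"])  # your real consumer logic


def handle_message(raw: bytes) -> None:
    event = json.loads(raw)
    key = (event["run"]["runId"], event["eventType"])
    if key in seen:   # duplicate delivery - safe to skip
        return
    seen.add(key)
    process(event)
```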

- - - -
- ➕ Maciej Obuchowski, Anirudh Shrinivason -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-09 14:10:47
-
-

*Thread Reply:* Hi..I compared some duplicate 'START' events just now, and noticed that they are exactly the same, with the only exception of one of them having an 'environment-properties' field... Could I just quickly check if this is a bug or a feature haha?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-03-10 01:18:18
-
-

*Thread Reply:* CC: @Paweł Leszczyński ^

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-08 11:15:48
-
-

@channel
Reminder: this month’s OpenLineage TSC meeting is tomorrow at 10am PT. All are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1677806982084969

-
(Slack message preview: Michael Robinson)
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Susmitha Anandarao - (susmitha.anandarao@gmail.com) -
-
2023-03-08 15:51:07
-
-

Hi if we have OpenLineage listener configured as a default spark conf, is there an easy way to disable ol for a specific notebook?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-03-08 17:30:44
-
-

*Thread Reply:* if you can set up env variables for particular notebooks, you can set OPENLINEAGE_DISABLED=true
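In a notebook that might look like the single line below, though this is an assumption: the variable has to be visible to the process running the integration before the listener starts emitting, so cluster- or job-level environment configuration is the more reliable place to set it:

```python
import os

os.environ["OPENLINEAGE_DISABLED"] = "true"  # must be set before any lineage is emitted
```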

- - - -
- :gratitude_thank_you: Susmitha Anandarao -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Benji Lampel - (benjamin@astronomer.io) -
-
2023-03-10 13:15:41
-
-

Hey all,

- -

I opened a PR (and corresponding issue) to change how naming works in OpenLineage. The idea generally is to move from Naming.md as the end-all-be-all of names for integrations, and towards JSON schemas per integration, with each schema defining very precisely what fields a name and namespace should contain, how they're connected, and how they're validated. Would really appreciate some feedback as this is a pretty big change!

-
- - - - - - - -
-
(GitHub PR preview - Labels: documentation, proposal; Comments: 1)
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sunil Patil - (spatil@twilio.com) -
-
2023-03-13 17:05:56
-
-

What do I need to do to enable DAG-level metric capturing for Airflow? I followed the instructions to install openlineage 0.21.1 on Airflow 2.3.3. When I run a DAG I see metrics related to task start and success/failure, but I don't see any metrics for DAG success/failure. Do I have to do something to enable DAG execution capturing?

-
-
(link preview: openlineage.io)
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sunil Patil - (spatil@twilio.com) -
-
2023-03-13 17:08:53
-
-

*Thread Reply:* is DAG run capturing enabled starting airflow 2.5.1 ? https://github.com/apache/airflow/pull/27113

-
- - - - - - - -
-
(GitHub PR preview - Labels: area:scheduler/executor, type:new-feature; Comments: 6)
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-03-13 17:11:47
-
-

*Thread Reply:* you're right - the change was only included starting with 2.5.0

- - - -
- 🙏 Sunil Patil -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Sunil Patil - (spatil@twilio.com) -
-
2023-03-13 17:43:15
-
-

*Thread Reply:* Thanks Jakub

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-03-14 15:37:34
-
-

Fresh on the heels of our first-ever in-person event, we’re meeting up again soon at Data Council Austin! Join us on March 30th (the same day as @Julien Le Dem’s talk) at 12:15 pm to discuss the project’s goals and design, meet other members of the data ecosystem, and help shape the future of the spec. For more info, check out the OpenLineage blog. If you haven’t registered for the conference yet, click and use promo code OpenLineage20 for a special rate. Hope to see you there!

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-03-15 15:11:18
If someone is using airflow and DAG-docs for lineage, can they export the lineage in, say, OL format?

Harel Shein (harel.shein@gmail.com) - 2023-03-15 15:18:22
*Thread Reply:* I don't see it currently on the AirflowRunFacet, but probably not a big deal to add it? @Benji Lampel wdyt?

Reaction: ❤️ Sheeri Cabral (Collibra)

Benji Lampel (benjamin@astronomer.io) - 2023-03-15 15:22:00
*Thread Reply:* Definitely could be a good thing to have--is there not some info facet that could hold this data already? I don't see an issue with adding to the AirflowRunFacet tho (full disclosure, I'm not super familiar with this facet)
Reaction: ❤️ Sheeri Cabral (Collibra)

Ross Turk (ross@datakin.com) - 2023-03-15 15:58:40
*Thread Reply:* Perhaps DocumentationJobFacet or DocumentationDatasetFacet?

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-03-15 15:13:55
(is it https://docs.astronomer.io/learn/airflow-openlineage ?)

Susmitha Anandarao (susmitha.anandarao@gmail.com) - 2023-03-17 12:31:02
Happy Friday 👋 I am looking for some help setting the parent information for a dbt run. I have set the namespace variable in openlineage.yml but it doesn't seem to take effect and ends up using the default value of dbt. I'm also using openlineage.yml to set the transport properties for emitting to Kafka. Is there a way to set parent namespace, name, and run id in the yml file? Thank you!

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-03-18 12:09:23
*Thread Reply:* dbt-ol does not read from openlineage.yml so you need to pass this information in the OPENLINEAGE_NAMESPACE environment variable

Ross Turk (ross@datakin.com) - 2023-03-20 15:17:03
*Thread Reply:* Hmmm. Interesting! I thought that it used client = OpenLineageClient.from_environment(), I'll do some testing with Kafka backends.

Susmitha Anandarao (susmitha.anandarao@gmail.com) - 2023-03-20 15:22:07
*Thread Reply:* Thank you for the hint. I was able to make it work by specifying the env OPENLINEAGE_CONFIG to point at the yml file holding transport info, plus OPENLINEAGE_NAMESPACE

Reaction: 👍 Ross Turk
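
A minimal sketch of wiring both variables into a dbt-ol invocation, as worked out in this thread (paths and the namespace value are illustrative):
```python
import os
import subprocess

env = dict(os.environ)
env["OPENLINEAGE_CONFIG"] = "/path/to/openlineage.yml"   # transport settings, e.g. Kafka
env["OPENLINEAGE_NAMESPACE"] = "my_dbt_namespace"        # namespace for emitted events

# dbt-ol wraps the regular dbt CLI and emits OpenLineage events for the run.
subprocess.run(["dbt-ol", "run"], env=env, check=True)
```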

Ross Turk (ross@datakin.com) - 2023-03-20 15:24:05
*Thread Reply:* Awesome! That's exactly what I was going to test.

Ross Turk (ross@datakin.com) - 2023-03-20 15:25:04
*Thread Reply:* I think it also works if you put it in $HOME/.openlineage/openlineage.yml.
Reaction: :gratitude_thank_you: Susmitha Anandarao

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-03-21 08:32:17
*Thread Reply:* @Susmitha Anandarao I might have provided misleading information. I meant that dbt-ol does not read the OL namespace from openlineage.yml but from the OPENLINEAGE_NAMESPACE env var instead

Michael Robinson (michael.robinson@astronomer.io) - 2023-03-21 13:48:28
Data Council Austin, the host of our next meetup, is one week away: https://openlineage.slack.com/archives/C01CK9T7HKR/p1678822654288379

Michael Robinson (michael.robinson@astronomer.io) - 2023-03-21 13:52:52
In addition to Data Council Austin next week, the hybrid Big Data Technology Warsaw Summit will be taking place on March 28th-30th, featuring three of our committers: @Maciej Obuchowski, @Paweł Leszczyński and @Ross Turk! There's more info here: https://bigdatatechwarsaw.eu/

Reactions: 🙌 Howard Yoo, Maciej Obuchowski, Jakub Dardziński, Ross Turk, Perttu Salonen; 👍 thebruuu

Brad Paskewitz (bradford.paskewitz@fivetran.com) - 2023-03-22 22:38:26
hey folks, is anyone capturing dataset metadata for multi-table schemas? I'm looking at the schema dataset facet: https://openlineage.io/docs/spec/facets/dataset-facets/schema but it looks like this only represents a single table, so I'm wondering if I'll need to write a custom facet

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-03-23 04:25:19
*Thread Reply:* It should be represented by multiple datasets, unless I misunderstood what you mean by multi-table

Brad Paskewitz (bradford.paskewitz@fivetran.com) - 2023-03-23 10:55:58
*Thread Reply:* here at Fivetran when we sync data it is generally 1 schema with multiple tables (sometimes many) so we would want to represent all of that

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-03-23 11:11:25
*Thread Reply:* So what I understand:
1. your single job represents synchronization of multiple tables
2. you want to have precise input-output dataset lineage?
Am I right?

I would model that as multiple OL jobs, each describing one dataset mapping. Additionally, I'd have one "wrapping" job that represents your definition of a job. The rest of those jobs would refer to it in ParentRunFacet.

This is a pattern we use for Airflow and dbt dags.
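
A minimal sketch of that pattern with the openlineage-python client; the producer URI, namespaces, and dataset names are illustrative, and ParentRunFacet.create is assumed from the client's facet module:
```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.facet import ParentRunFacet
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

PRODUCER = "https://example.com/sync-poc"  # hypothetical producer URI
client = OpenLineageClient.from_environment()

# The "wrapping" job that represents the whole sync.
wrapper_run_id = str(uuid4())

# One child job per table mapping, each pointing back at the wrapper run.
parent = ParentRunFacet.create(
    runId=wrapper_run_id, namespace="my-namespace", name="sync_all_tables"
)
child_event = RunEvent(
    eventType=RunState.START,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4()), facets={"parent": parent}),
    job=Job(namespace="my-namespace", name="sync_all_tables.orders"),
    producer=PRODUCER,
    inputs=[Dataset(namespace="sourcedb", name="public.orders")],
    outputs=[Dataset(namespace="warehouse", name="analytics.orders")],
)
client.emit(child_event)
```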

Brad Paskewitz (bradford.paskewitz@fivetran.com) - 2023-03-23 12:57:15
*Thread Reply:* Yes your statements are correct. Thanks for sharing that model, that makes sense to me

Brad Paskewitz (bradford.paskewitz@fivetran.com) - 2023-03-24 15:56:27
has anyone had success creating custom facets using java? I'm following this guide: https://openlineage.io/docs/spec/facets/custom-facets and I'm wondering if it makes sense to manually create POJOs or if others are creating the json schema for the object and then automatically generating the java code?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-03-27 05:26:06
*Thread Reply:* I think it's better to just create a POJO. This is what we do in the Spark integration, for example.

For now, the JSON Schema generator isn't flexible enough to generate custom facets from whatever schema we give it, so it would be unnecessary complexity

Reaction: :gratitude_thank_you: Brad Paskewitz

Julien Le Dem (julien@apache.org) - 2023-03-27 12:29:57
*Thread Reply:* Agreed, just a POJO would work. This is using Jackson, so you would use annotations as needed. You can also use a Jackson JSONNode or even Map.

Reaction: :gratitude_thank_you: Brad Paskewitz

Brad Paskewitz (bradford.paskewitz@fivetran.com) - 2023-03-27 14:01:07
One other question: I'm in the process of adding different types of facets to our base payloads and I'm wondering if we have any related guidelines / best practices / standards / conventions. For example, if I add a full source schema as a schema dataset facet to every start event, that seems like it could be inefficient compared to a one-time full-source-schema followed by incremental diffs for each following sync. Curious how others are thinking about and solving these types of problems in practice

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-03-27 17:59:28
*Thread Reply:* That depends on the OL consumer, but for something like SchemaDatasetFacet it seems to be okay to assume the schema stays the same if it's not sent.

For others, like OutputStatisticsOutputDatasetFacet, you definitely can't assume that, as the data is unique to each run.
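
A sketch of the two facets on a dataset with the openlineage-python client (dataset names, columns, and numbers are illustrative):
```python
from openlineage.client.facet import (
    OutputStatisticsOutputDatasetFacet,
    SchemaDatasetFacet,
    SchemaField,
)
from openlineage.client.run import Dataset

# Schema can usually be assumed stable across runs, so it may be enough
# to send it once (or whenever it changes)...
orders = Dataset(
    namespace="warehouse",
    name="analytics.orders",
    facets={
        "schema": SchemaDatasetFacet(
            fields=[
                SchemaField(name="id", type="BIGINT"),
                SchemaField(name="amount", type="DECIMAL(10,2)"),
            ]
        )
    },
)

# ...while per-run statistics are unique to each run and belong on every event.
orders.facets["outputStatistics"] = OutputStatisticsOutputDatasetFacet(
    rowCount=1000, size=123456
)
```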

Brad Paskewitz (bradford.paskewitz@fivetran.com) - 2023-03-27 19:05:14
*Thread Reply:* ok great thanks, that makes sense to me

Saravanan (saravanan@athivatech.com) - 2023-03-27 21:42:20
Hi Team, I'm seeing the create data source and create dataset APIs marked as deprecated. Can anyone point me to how to create datasets via API calls?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-03-28 04:47:31
*Thread Reply:* OpenLineage API: https://openlineage.io/docs/getting-started/

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-03-28 06:08:18
Hi everyone, I recently encountered this error saying V2SessionCatalog is not supported by openlineage. May I ask if support for this will be added in the near future? Thanks!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-03-28 08:05:30
*Thread Reply:* I think it would be great to support V2SessionCatalog, and it would very much help if you created a GitHub issue with more explanation and examples of its use.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-03-29 02:53:37
*Thread Reply:* Sure thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-03-29 05:34:37
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/1747
I have opened an issue here. Thanks! 🙂

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-17 11:53:52
*Thread Reply:* Hi @Maciej Obuchowski Just curious, is this issue on the potential roadmap for the next OpenLineage release?

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-02 19:37:27
Hi all! Can anyone provide me some advice on how to solve this error:
ValueError: `emit` only accepts RunEvent class
[2023-04-02, 23:22:00 UTC] {taskinstance.py:1326} INFO - Marking task as FAILED. dag_id=etl_openlineage, task_id=send_ol_events, execution_date=20230402T232112, start_date=20230402T232114, end_date=20230402T232200
[2023-04-02, 23:22:00 UTC] {standard_task_runner.py:105} ERROR - Failed to execute job 400 for task send_ol_events (`emit` only accepts RunEvent class; 28020)
[2023-04-02, 23:22:00 UTC] {local_task_job.py:212} INFO - Task exited with return code 1
[2023-04-02, 23:22:00 UTC] {taskinstance.py:2585} INFO - 0 downstream tasks scheduled from follow-on schedule check
I'm trying to follow this tutorial (https://openlineage.io/blog/openlineage-snowflake/) on connecting Snowflake to OpenLineage through Apache Airflow, however, the last step (sending the OpenLineage events) returns an error.

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-03 09:32:46
*Thread Reply:* The blog post is a bit old, and in the meantime changes were introduced in the OpenLineage Python client.
May I ask if you just want to test the flow, or are you looking for a viable Snowflake data lineage solution?

Ross Turk (ross@datakin.com) - 2023-04-03 10:47:57
*Thread Reply:* I believe that this will work if you change the line to client.transport.emit()

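
A sketch of the suggested change, assuming the events are the plain dicts the tutorial loads from Snowflake's view (variable name illustrative); newer clients validate that client.emit() only receives RunEvent objects, so the dicts have to go through the transport directly:
```python
from openlineage.client import OpenLineageClient

client = OpenLineageClient.from_environment()

ol_event = {"eventType": "START"}  # stand-in for the full event dict from the view

# Instead of client.emit(ol_event), which now raises
# "ValueError: `emit` only accepts RunEvent class":
client.transport.emit(ol_event)
```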

Ross Turk (ross@datakin.com) - 2023-04-03 10:49:05
*Thread Reply:* (this would be in the dags/lineage folder, if memory serves)

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-03 10:57:23
*Thread Reply:* Ross is right, that should work

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-04 12:23:13
*Thread Reply:* This works! Thank you so much!

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-04 12:24:40
*Thread Reply:* @Jakub Dardziński I want to use a viable Snowflake data lineage solution alongside an Amazon DataZone Catalog 🙂

Ross Turk (ross@datakin.com) - 2023-04-04 13:03:58
*Thread Reply:* I have been meaning to revisit that tutorial 👍

Michael Robinson (michael.robinson@astronomer.io) - 2023-04-03 10:52:42
Hello all,
I'd like to open a vote to release OpenLineage 0.22.0, including:
• a new properties facet in the Spark integration
• a new field in HttpConfig for passing custom headers in the Spark integration
• improved namespace generation for JDBC connections in the Spark integration
• removal of unnecessary warnings about column lineage in the Spark integration
• support for alter, truncate, and drop statements in the SQL parser
• typing hints in the SQL integration
• a new from_dict class method in the Python client to support creating it from a dictionary
• a case-insensitive env variable for disabling OpenLineage in the Python client and Airflow integration
• bug fixes, docs changes, and more.
Three +1s from committers will authorize an immediate release. For more details about the release process, see GOVERNANCE.md.

Reactions: ➕ Maciej Obuchowski, Perttu Salonen, Jakub Dardziński, Ross Turk

Michael Robinson (michael.robinson@astronomer.io) - 2023-04-03 15:39:46
*Thread Reply:* Thanks, all. The release is authorized and will be initiated within 48 hours.

Michael Robinson (michael.robinson@astronomer.io) - 2023-04-03 16:55:44
@channel
We released OpenLineage 0.22.0, including:
Additions:
• Spark: add properties facet #1717 by @tnazarew
• SQL: SQLParser supports alter, truncate and drop statements #1695 by @pawel-big-lebowski
• Common/SQL: provide public interface for openlineage_sql package #1727 by @JDarDagran
• Java client: add configurable headers to HTTP transport #1718 by @tnazarew
• Python client: create client from dictionary #1745 by @JDarDagran
Changes:
• Spark: remove URL parameters for JDBC namespaces #1708 by @tnazarew
• Make OPENLINEAGE_DISABLED case-insensitive #1705 by @jedcunningham
Removals:
• Spark: remove unnecessary warnings for column lineage #1700 by @pawel-big-lebowski
• Spark: remove deprecated configs #1711 by @tnazarew
Thanks to all the contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.22.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.21.1...0.22.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

Reactions: 🙌 Jakub Dardziński, Francis McGregor-Macdonald, Howard Yoo, 김형은, Kengo Seki, Anirudh Shrinivason, Perttu Salonen, Paweł Leszczyński, Maciej Obuchowski, Harel Shein; 🎉 Ross Turk, 김형은, Kengo Seki, Anirudh Shrinivason, Perttu Salonen

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-04 01:49:37
Hi everyone, if I set executors to 0 and bind address to localhost, and then if I want to use openlineage to capture metadata, I seem to run into an error where the executor tries to fetch the spark jar from the driver, even though there is no executor set. Then it fails because a connection cannot be established. This is some of the error stack trace:
INFO Executor: Fetching spark://<DRIVER_IP>:44541/jars/io.openlineage_openlineage-spark-0.21.1.jar with timestamp 1680506544239
ERROR Utils: Aborting task
java.io.IOException: Failed to connect to /<DRIVER_IP>:44541
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:287)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
    at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:755)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:541)
    at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:953)
    at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:945)
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
    at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
    at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
    at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:945)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:247)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /<DRIVER_IP>:44541
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
Just curious if anyone here has run into a similar problem before, and what the recommended way to resolve this would be...

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-04 13:39:19
*Thread Reply:* Do you have a small configuration and job to replicate this?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-04 22:21:35
*Thread Reply:* Yeah. For configs:
spark.driver.bindAddress: "localhost"
spark.master: "local[*]"
spark.sql.catalogImplementation: "hive"
spark.openlineage.transport.endpoint: "<endpoint>"
spark.openlineage.transport.type: "http"
spark.sql.catalog.spark_catalog: "org.apache.spark.sql.delta.catalog.DeltaCatalog"
spark.openlineage.transport.url: "<url>"
spark.extraListeners: "io.openlineage.spark.agent.OpenLineageSparkListener"
and the job is submitted via spark-submit in client mode with the number of executors set to 0.
The spark job by itself could be anything... I think the job fails before initializing the spark session itself.
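
For anyone trying to replicate, the same settings expressed via the PySpark builder (the <url> and <endpoint> placeholders are kept as-is from the message above):
```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.driver.bindAddress", "localhost")
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    # OpenLineage listener plus its HTTP transport settings
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "<url>")
    .config("spark.openlineage.transport.endpoint", "<endpoint>")
    .getOrCreate()
)
```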

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-04 22:23:19
*Thread Reply:* The issue is because of the spark.jars.packages config... the spark.jars config also runs into the same issue, because the executor tries to fetch the jar from the driver for some reason even though there are no executors set...

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-05 05:38:55
*Thread Reply:* TBH I'm not sure if we can do anything about it. Seems like just having any SparkListener which is not in Spark jars would fall under the same problems, right?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-10 06:07:11
*Thread Reply:* Yeah... Actually, this was because of binding the driver ip to localhost. In that case, the executor was not able to get the jar from the driver. But yeah, I don't think we could have done anything from the openlineage end anyway for this. Was just an interesting error to encounter lol

Lq Dodo (tryopenmetadata@gmail.com) - 2023-04-04 12:07:21
Hi, I am new to open lineage. I was able to follow https://openlineage.io/getting-started/ to create a lineage "my-input-->my-job-->my-output". I want to use "my-output" as an input dataset and connect it to the next job, like this: "my-input-->my-job-->my-output-->my-job2-->my-final-output". How do I do it? I have trouble setting eventType, runId, etc. Once the new lineages get messed up, the Marquez UI becomes blank (which is a separate issue).

Ross Turk (ross@datakin.com) - 2023-04-04 13:02:21
*Thread Reply:* In this case you would have four runevents:
1. a START event on my-job where my-input is the input and my-output is the output, with a runId you generate on the client
2. a COMPLETE event on my-job with the same runId from #1
3. a START event on my-job2 where the input is my-output and the output is my-final-output, with a separate runId you generate
4. a COMPLETE event on my-job2 with the same runId from #3
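
A minimal sketch of those four events with the openlineage-python client (the URL and producer URI are placeholders):
```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

PRODUCER = "https://example.com/my-producer"  # hypothetical producer URI
client = OpenLineageClient(url="http://localhost:5000")

my_input = Dataset(namespace="my-namespace", name="my-input")
my_output = Dataset(namespace="my-namespace", name="my-output")
my_final_output = Dataset(namespace="my-namespace", name="my-final-output")

def run_pair(job_name, inputs, outputs):
    """Emit the START/COMPLETE pair for one job, reusing a single runId."""
    run = Run(runId=str(uuid4()))
    job = Job(namespace="my-namespace", name=job_name)
    now = datetime.now(timezone.utc).isoformat()
    client.emit(RunEvent(RunState.START, now, run, job, PRODUCER, inputs, outputs))
    client.emit(RunEvent(RunState.COMPLETE, now, run, job, PRODUCER, inputs, outputs))

run_pair("my-job", [my_input], [my_output])          # events 1 and 2
run_pair("my-job2", [my_output], [my_final_output])  # events 3 and 4
```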

Lq Dodo (tryopenmetadata@gmail.com) - 2023-04-04 14:53:14
*Thread Reply:* thanks for the response. I tried it but now the UI only shows for like one second and then turns blank. I had a similar issue before. It seems to me that every time I add a bad lineage, the UI stops working and I have to delete the docker image :-( Not sure whether it is a MacOS M1 related issue.

Ross Turk (ross@datakin.com) - 2023-04-04 16:07:06
*Thread Reply:* Hmmm, that's interesting. Not sure I've seen that before. If you happen to catch it in that state again, perhaps capture the contents of the lineage_events table so it can be replicated.

Lq Dodo (tryopenmetadata@gmail.com) - 2023-04-04 16:24:28
*Thread Reply:* I can fairly easily reproduce this blank UI issue. Apparently I used the same runId for two different jobs. If I use a different runId (which I should), the lineage displays correctly. Thanks again!

Reaction: 👍 Ross Turk

Lq Dodo (tryopenmetadata@gmail.com) - 2023-04-04 16:41:54
Is it possible to add column level lineage via the api? Let's say I have fields A,B,C from my-input, A,B from my-output, and B,C from my-output-s3. I want to see, filter, or query by the column name.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-05 05:35:02
*Thread Reply:* You can add https://openlineage.io/docs/spec/facets/dataset-facets/column_lineage_facet/ to your datasets.

However, I don't think you can currently do any filtering over it
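
For reference, a sketch of the shape of that facet per the spec page above, attached under a dataset's "facets" key (dataset, column, producer, and schemaURL values are illustrative):
```python
# Columns A and B of my-output, each listing the upstream columns they came from.
column_lineage = {
    "columnLineage": {
        "_producer": "https://example.com/my-producer",
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json",
        "fields": {
            "A": {
                "inputFields": [
                    {"namespace": "my-namespace", "name": "my-input", "field": "A"}
                ]
            },
            "B": {
                "inputFields": [
                    {"namespace": "my-namespace", "name": "my-input", "field": "B"}
                ]
            },
        },
    }
}
```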

Ross Turk (ross@datakin.com) - 2023-04-05 13:20:20
*Thread Reply:* you can see a good example here, @Lq Dodo: https://github.com/MarquezProject/marquez/blob/289fa3eef967c8f7915b074325bb6f8f55480030/docker/metadata.json#L430

Lq Dodo (tryopenmetadata@gmail.com) - 2023-04-06 11:48:48
*Thread Reply:* those examples really help. I can at least build the lineage with column level info using the apis. thanks a lot! Ideally I'd like to select one column from the UI and have it show me the column level graph. Seems not possible.

Ross Turk (ross@datakin.com) - 2023-04-06 12:46:54
*Thread Reply:* correct, right now there isn't column-level metadata on the lineage graph 😞

Pavani (ylpavani@gmail.com) - 2023-04-05 22:01:33
Is airflow mandatory, while integrating snowflake with openlineage?

I am currently looking for a solution which can capture lineage details from snowflake execution

Harel Shein (harel.shein@gmail.com) - 2023-04-06 10:22:17
*Thread Reply:* something needs to trigger lineage collection, are you using some sort of scheduler / execution engine?

Pavani (ylpavani@gmail.com) - 2023-04-06 11:26:13
*Thread Reply:* Nope... We currently don't have a scheduling tool. Isn't it possible to use the open lineage api and collect the details?

Michael Robinson (michael.robinson@astronomer.io) - 2023-04-06 13:12:44
@channel
This month's OpenLineage TSC meeting is on Thursday, April 20th, at 10 am PT. Meeting info: https://openlineage.io/meetings/. All are welcome!
On the tentative agenda:
1. Announcements
2. Updates (new!)
   a. OpenLineage in Airflow AIP
   b. Static lineage support
   c. Reworking namespaces
3. Recent release overview
4. A new consumer
5. Caching support for column lineage
6. Discussion items
   a. Snowflake tagging
7. Open discussion
Notes: https://bit.ly/OLwiki
Is there a topic you think the community should discuss at this or a future meeting? Reply or DM me to add items to the agenda.
Reactions: 🚀 alexandre bergere, Paweł Leszczyński

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-06 15:27:41
Hi!

I have a specific question about how OpenLineage fits in between Amazon MWAA and Marquez on AWS EKS. I guess I need to change, for example, the etl_openlineage DAG in this Snowflake integration tutorial and the OPENLINEAGE_URL here. However, I'm wondering how to reproduce the Docker containers airflow, airflow_scheduler, and airflow_worker here.

I heard from @Ross Turk that @Willy Lulciuc and @Michael Collado are experts on the K8s integration for OpenLineage and Marquez. Could you provide me some recommendations on how to approach this integration? Or can anyone else help me?

Kind regards,

Tom

John Lukenoff (john@jlukenoff.com) - 2023-04-07 12:47:18
[RESOLVED] 👋 Hi there, I'm doing a POC of OpenLineage for our airflow deployment. We have a ton of custom operators and I'm trying to test out extracting lineage using the get_openlineage_facets_on_start method. Currently when I'm testing I can see that the OpenLineage plugin is running via airflow plugins but am not able to see that the method is ever getting called. Do I need to do anything else to tell the default extractor to use get_openlineage_facets_on_start? This is the documentation I'm referencing: https://openlineage.io/docs/integrations/airflow/extractors/default-extractors

John Lukenoff (john@jlukenoff.com) - 2023-04-07 12:50:14
*Thread Reply:* E.g. do I need to update my custom operators to inherit from DefaultExtractor?

John Lukenoff (john@jlukenoff.com) - 2023-04-07 13:18:05
*Thread Reply:* FWIW, I can tell some level of connectivity to my Marquez deployment is working since I can see it created the default namespace I defined in my OPENLINEAGE_NAMESPACE env var.

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-07 18:37:44
*Thread Reply:* hey John, it is enough to add the method to your custom operator. Perhaps something breaks inside the method. Did anything show up in the logs?

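
A minimal sketch of what this can look like on a custom operator. TaskMetadata is assumed here to be the return container defined in openlineage.airflow.extractors.base for this version; double-check the exact name against the docs page linked above:
```python
from airflow.models.baseoperator import BaseOperator
from openlineage.airflow.extractors.base import TaskMetadata
from openlineage.client.run import Dataset


class MyCustomOperator(BaseOperator):
    def execute(self, context):
        ...  # the operator's real work

    # Called by openlineage-airflow's default extractor when the task starts.
    def get_openlineage_facets_on_start(self):
        return TaskMetadata(
            name=f"{self.dag_id}.{self.task_id}",
            inputs=[Dataset(namespace="redshift", name="some_schema.some_input_table")],
            outputs=[Dataset(namespace="redshift", name="some_other_schema.some_output_table")],
        )
```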

John Lukenoff (john@jlukenoff.com) - 2023-04-07 19:03:01
*Thread Reply:* That's the strange part. I'm not seeing anything to suggest that the method is ever getting called. I'm also expecting that the listener created by the plugin should at least be calling this log line when the task runs. However, I'm not seeing that either. I'm able to verify the plugin is registered using airflow plugins and have debug level logging enabled via AIRFLOW__LOGGING__LOGGING_LEVEL='DEBUG'. This is the output of airflow plugins:

name              | macros                                        | listeners                    | source
==================+===============================================+==============================+=================================================
OpenLineagePlugin | openlineage.airflow.macros.lineage_run_id,    | openlineage.airflow.listener | openlineage-airflow==0.22.0:
                  | openlineage.airflow.macros.lineage_parent_id  |                              | EntryPoint(name='OpenLineagePlugin',
                  |                                               |                              | value='openlineage.airflow.plugin:OpenLineagePlugin',
                  |                                               |                              | group='airflow.plugins')

Appreciate any ideas you might have!

John Lukenoff (john@jlukenoff.com) - 2023-04-11 13:09:05
*Thread Reply:* Figured this out. Just needed to run the airflow scheduler and trigger tasks through the DAGs vs. airflow tasks test …

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-04-07 16:29:03
I have a question that I believe will be very easy to answer, and I think I know the answer already, but I want to confirm my understanding of extracting OpenLineage with airflow python scripts.

Extractors extract lineage from operators, so they have to be using operators, right? If someone asks if I can get lineage from their Airflow-orchestrated python scripts, and they show me their scripts but they're not importing anything starting with airflow.operators, then I can't use extractors and therefore can't get lineage. Is that accurate?

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-04-07 16:30:00
*Thread Reply:* (they are importing dagkit sdk stuff like Job, JobContext, ExecutionContext, and NodeContext.)

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-07 18:40:39
*Thread Reply:* Do they run those scripts in PythonOperator? If so, they should receive some events but with no datasets extracted

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-04-07 21:28:25
*Thread Reply:* How can I know that? Would it be in the scripts or the airflow configuration or...

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-04-08 07:13:56
*Thread Reply:* And "with no datasets extracted" means I wouldn't have the schema of the input and output datasets? (I need the db/schema/table/column names for my purposes)

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-11 02:49:07
*Thread Reply:* That really depends on the current code, but in general any custom code in Airflow does not extract any extra information, especially datasets. One can write their own extractors (more in the docs)
Reaction: ✅ Sheeri Cabral (Collibra)
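
For anyone reading later, a rough sketch of such a custom extractor, assuming the BaseExtractor interface from openlineage-airflow and registration via the OPENLINEAGE_EXTRACTORS environment variable (see the extractor docs for the exact mechanics in your version):
```python
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata


class MyScriptExtractor(BaseExtractor):
    # Operator class names this extractor should handle.
    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ["PythonOperator"]

    def extract(self) -> Optional[TaskMetadata]:
        # self.operator is the task's operator; pull dataset info out of it
        # however your code exposes it (illustrative only).
        return TaskMetadata(name=f"{self.operator.dag_id}.{self.operator.task_id}")
```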

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com) - 2023-04-12 16:52:04
*Thread Reply:* Thanks! This is very helpful. Exactly what I needed.
Reaction: 👍 Jakub Dardziński

Tushar Jain (tujain@ivp.in) - 2023-04-09 12:48:04
Hi. I was exploring OpenLineage and I want to know: does OpenLineage integrate with MS-SQL (Microsoft SQL Server)? If yes, how do I generate OpenLineage events for MS-SQL Views/Tables/Queries?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-12 02:30:19
*Thread Reply:* Currently there's no extractor implemented for MS-SQL. We try to keep the list of supported databases up to date here: https://openlineage.io/docs/integrations/about/

Michael Robinson (michael.robinson@astronomer.io) - 2023-04-10 12:00:03
@channel
Save the date: the next OpenLineage meetup will be in New York on April 26th! More info is coming soon…

Reactions: ✅ Sheeri Cabral (Collibra), Ross Turk, Minkyu Park

Michael Robinson (michael.robinson@astronomer.io) - 2023-04-10 19:00:38
@channel
Due to many TSC members being on vacation this week, this month's TSC meeting will be moved to next Thursday, April 20th. All are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1680801164289949

Reaction: ✅ Sheeri Cabral (Collibra)

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-11 13:42:03
Hi everyone!

I'm so sorry for all the messages but I'm trying to get Snowflake, OpenLineage and Marquez working for days now. Hopefully, this is my last question.
The snowflake.connector import connect package seems to be outdated here in extract_openlineage.py and is not working for airflow. Does anyone know how to rewrite this code (e.g., with SnowflakeOperator) and extract the openlineage access history? You'd be my absolute hero!!!

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-11 17:05:35
*Thread Reply:* > The snowflake.connector import connect package seems to be outdated here in extract_openlineage.py and is not working for airflow.
What's the error?

> Does anyone know how to rewrite this code (e.g., with SnowflakeOperator)
The current extractor for SnowflakeOperator extracts lineage for SQL executed in the task, in contrast to the method above with the OPENLINEAGE_ACCESS_HISTORY view

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-11 18:13:49
*Thread Reply:* Hi Maciej! Thank you so much for the reply! I managed to generate a working combination on Windows between the airflow example in the marquez git and the snowflake openlineage git. The only error I still get is:
****** Log file does not exist: /opt/bitnami/airflow/logs/dag_id=etl_openlineage/run_id=manual__2023-04-10T14:12:53.764783+00:00/task_id=send_ol_events/attempt=1.log
****** Fetching from: http://1c8bb4a78f14:8793/log/dag_id=etl_openlineage/run_id=manual__2023-04-10T14:12:53.764783+00:00/task_id=send_ol_events/attempt=1.log
****** !!!! Please make sure that all your Airflow components (e.g. schedulers, webservers and workers) have the same 'secret_key' configured in 'webserver' section and time is synchronized on all your machines (for example with ntpd) !!!!!
************ See more at https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#secret-key
************ Failed to fetch log file from worker. Client error '403 FORBIDDEN' for url 'http://1c8bb4a78f14:8793/log/dag_id=etl_openlineage/run_id=manual__2023-04-10T14:12:53.764783+00:00/task_id=send_ol_events/attempt=1.log'
For more information check: https://httpstatuses.com/403
This one doesn't make sense to me. I found a workaround for the ETL examples in the OpenLineage git by manually creating a Snowflake connector in Airflow, however, the error is still present for the extract_openlineage.py file. I noticed this file is the only one that uses snowflake.connector import connect and not airflow.providers.snowflake.operators.snowflake import SnowflakeOperator like the other ETL DAGs.

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-12 05:35:41
*Thread Reply:* I think it's an Airflow error related to getting logs from the worker

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-12 05:36:07
*Thread Reply:* snowflake.connector is the Snowflake connector library that SnowflakeOperator uses underneath to connect to Snowflake

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-12 10:15:21
*Thread Reply:* Ah alright! Thanks for pointing that out! 🙂 Do you know how to solve it? Or do you have any recommendations on how to look for the solution?

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-12 10:19:53
*Thread Reply:* I have no experience with Windows, and I think this is the issue: https://github.com/apache/airflow/issues/10388

I would try running it in Docker TBH

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu) - 2023-04-12 11:47:41
*Thread Reply:* Yeah I was running Airflow in Docker but this didn't work. I'll try to use my Macbook for now because I don't think there is a solution for this in the short term. Thank you so much for the support though!!

Peter Hanssens (peter@cloudshuttle.com.au) - 2023-04-13 04:55:41
Hi All,
My team and I have been building a status page based on open lineage and I did a talk about it… keen for feedback and thoughts:
https://youtu.be/nGh5_j3hXrE
Reaction: 👀 Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-13 11:19:57
*Thread Reply:* Very interesting!

Julien Le Dem (julien@apache.org) - 2023-04-13 13:28:53
*Thread Reply:* that's awesome 🙂

Ernie Ostic (ernie.ostic@getmanta.com) - 2023-04-13 08:22:50
Hi Peter. Looks good. I like the way you introduced the premise of, and benefits of, using OpenLineage for your project. Have you also explored other integrations in addition to dbt?

Peter Hanssens (peter@cloudshuttle.com.au) - 2023-04-13 08:36:01
*Thread Reply:* Thanks Ernie, I'm looking at Airflow as well as GE and would like to contribute back to the project as well… we're close to getting a public preview release of our product done and then we want to help build out open lineage

Reactions: ❤️ Julien Le Dem, Harel Shein

John Lukenoff (john@jlukenoff.com) - 2023-04-13 14:08:38
[Resolved] Has anyone seen this error before where the openlineage-airflow plugin / listener fails to deepcopy the task instance? I'm using the native airflow DAG / BashOperator objects to do a basic test of static lineage tagging. More details in 🧵

John Lukenoff (john@jlukenoff.com) - 2023-04-13 14:10:08
*Thread Reply:* The dag is basically just:
```
dag = DAG(
    dag_id="asana_example_dag",
    default_args=default_args,
    schedule_interval=None,
)

sample_lineage_task = BashOperator(
    task_id="sample_lineage_task",
    bash_command='echo $OPENLINEAGE_URL',
    dag=dag,
    inlets=[Table(database="redshift", cluster="some_schema", name="some_input_table")],
    outlets=[Table(database="redshift", cluster="some_other_schema", name="some_output_table")]
)
```

John Lukenoff (john@jlukenoff.com) - 2023-04-13 14:11:02
*Thread Reply:* This is the error I'm getting, seems to be coming from this line:
[2023-04-13, 17:45:33 UTC] {logging_mixin.py:115} WARNING - Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.7/site-packages/openlineage/airflow/listener.py", line 89, in on_running
    task_instance_copy = copy.deepcopy(task_instance)
  File "/opt/conda/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/opt/conda/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1156, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/dag.py", line 1941, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1156, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/opt/conda/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1156, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1000, in __setattr__
    self.set_xcomargs_dependencies()
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1107, in set_xcomargs_dependencies
    XComArg.apply_upstream_relationship(self, arg)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/xcom_arg.py", line 186, in apply_upstream_relationship
    op.set_upstream(ref.operator)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskmixin.py", line 241, in set_upstream
    self._set_relatives(task_or_task_list, upstream=True, edge_modifier=edge_modifier)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskmixin.py", line 185, in _set_relatives
    dags: Set["DAG"] = {task.dag for task in [*self.roots, *task_list] if task.has_dag() and task.dag}
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskmixin.py", line 185, in <setcomp>
    dags: Set["DAG"] = {task.dag for task in [*self.roots, *task_list] if task.has_dag() and task.dag}
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/dag.py", line 508, in __hash__
    val = tuple(self.task_dict.keys())
AttributeError: 'DAG' object has no attribute 'task_dict'

Reaction: 👀 Maciej Obuchowski

John Lukenoff (john@jlukenoff.com) - 2023-04-13 14:12:11
*Thread Reply:* This is with Airflow 2.3.2 and openlineage-airflow 0.22.0

John Lukenoff (john@jlukenoff.com) - 2023-04-13 14:13:34
*Thread Reply:* Seems like it might be some issue like this with a circular structure? https://stackoverflow.com/questions/46283738/attributeerror-when-using-python-deepcopy

Maciej Obuchowski (maciej.obuchowski@getindata.com) - 2023-04-14 08:44:36
*Thread Reply:* Just by quick look at it, it will definitely be fixed with Airflow 2.6, as it won't need to deepcopy anything.

Reaction: 👍 John Lukenoff

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-14 08:47:16
*Thread Reply:* I can't seem to reproduce the issue. I ran the following example DAG with the same Airflow and OL versions as yours:
```
import datetime

from airflow.lineage.entities import Table
from airflow.models import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "start_date": datetime.datetime.now()
}

dag = DAG(
    dag_id="asana_example_dag",
    default_args=default_args,
    schedule_interval=None,
)

sample_lineage_task = BashOperator(
    task_id="sample_lineage_task",
    bash_command='echo $OPENLINEAGE_URL',
    dag=dag,
    inlets=[Table(database="redshift", cluster="some_schema", name="some_input_table")],
    outlets=[Table(database="redshift", cluster="some_other_schema", name="some_output_table")]
)
```

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-14 08:53:48
*Thread Reply:* is there any extra configuration you made possibly?

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-14 13:02:40
*Thread Reply:* @John Lukenoff, I was finally able to reproduce this when passing xcom as task.output
looks like this was reported here and solved by this PR (not sure if this was released in 2.3.3 or later)

John Lukenoff (john@jlukenoff.com) - 2023-04-14 13:06:59
*Thread Reply:* Ah interesting. Let me see if bumping my Airflow version resolves this. Haven't had a chance to tinker with it much since yesterday.

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-14 13:13:21
*Thread Reply:* I ran it against 2.4 and the same dag works

John Lukenoff (john@jlukenoff.com) - 2023-04-14 13:15:35
*Thread Reply:* 👍 Looks like a fix for that issue was rolled out in 2.3.3. I'm gonna try that for now (my company has a notoriously difficult time with airflow major version updates 😅)

Jakub Dardziński (jakub.dardzinski@getindata.com) - 2023-04-14 13:17:06
*Thread Reply:* 👍

John Lukenoff (john@jlukenoff.com) - 2023-04-17 12:29:09
*Thread Reply:* Got this working! We just monkey patched the __deepcopy__ method of the BaseOperator for now until we can get bandwidth for an airflow upgrade. Thanks for the help here!

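
A sketch of the kind of patch that could work here; this is illustrative only and not the actual patch John used (the real fix shipped in Airflow 2.3.3):
```python
from airflow.models.baseoperator import BaseOperator

_original_deepcopy = BaseOperator.__deepcopy__

def _patched_deepcopy(self, memo):
    # Temporarily disable the XComArg dependency hook that __setattr__
    # triggers while the operator is being reconstructed half-initialized
    # (the failing frame in the traceback above).
    original_hook = BaseOperator.set_xcomargs_dependencies
    BaseOperator.set_xcomargs_dependencies = lambda _self: None
    try:
        return _original_deepcopy(self, memo)
    finally:
        BaseOperator.set_xcomargs_dependencies = original_hook

BaseOperator.__deepcopy__ = _patched_deepcopy
```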

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-17 03:45:47
Hi everyone, I am facing this null pointer error:
ERROR AsyncEventQueue: Listener OpenLineageSparkListener threw an exception
java.lang.NullPointerException
java.base/java.util.concurrent.ConcurrentHashMap.putVal(Unknown Source)
java.base/java.util.concurrent.ConcurrentHashMap.put(Unknown Source)
io.openlineage.spark.agent.JobMetricsHolder.addMetrics(JobMetricsHolder.java:40)
io.openlineage.spark.agent.OpenLineageSparkListener.onTaskEnd(OpenLineageSparkListener.java:179)
org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Could I get some help on this pls 🙇

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-17 03:56:30
*Thread Reply:* This is the spark-submit command:
spark-submit --py-files /usr/local/lib/common_utils.zip,/usr/local/lib/team_utils.zip,/usr/local/lib/project_utils.zip
  --conf spark.executor.cores=16
  --conf spark.hadoop.fs.s3a.connection.maximum=100
  --conf spark.sql.shuffle.partitions=1000
  --conf spark.speculation=true
  --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=256MB
  --conf spark.hadoop.fs.s3a.multiobjectdelete.enable=false
  --conf spark.memory.fraction=0.7
  --conf spark.kubernetes.executor.label.experiment=some_label
  --conf spark.kubernetes.executor.label.team=team_name
  --conf spark.driver.memory=26112m
  --conf spark.kubernetes.executor.label.app.kubernetes.io/managed-by=pipeline_name
  --conf spark.kubernetes.executor.label.instance-type=4xlarge
  --conf spark.executor.instances=10
  --conf spark.kubernetes.executor.label.env=prd
  --conf spark.kubernetes.executor.label.job-name=job_name
  --conf spark.kubernetes.executor.label.owner=owner
  --conf spark.kubernetes.executor.label.pipeline=pipeline
  --conf spark.kubernetes.executor.label.platform-name=platform_name
  --conf spark.speculation.multiplier=10
  --conf spark.memory.storageFraction=0.4
  --conf spark.driver.maxResultSize=26112m
  --conf spark.kubernetes.executor.request.cores=15000m
  --conf spark.speculation.interval=1s
  --conf spark.executor.memory=104g
  --conf spark.sql.catalogImplementation=hive
  --conf spark.eventLog.dir=file:///logs/spark-events
  --conf spark.hadoop.fs.s3a.threads.max=100
  --conf spark.speculation.quantile=0.75
  job.py

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-04-17 04:09:57
*Thread Reply:* @Anirudh Shrinivason pls create an issue for this and I will look at it. Although it may be difficult to find the root cause, a null pointer exception should always be avoided, and this seems to be a bug.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-17 04:14:41
*Thread Reply:* Hmm yeah sure. I'll create an issue on github for this issue. Thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com) - 2023-04-17 05:13:54
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/1784
Opened an issue here

Allison Suarez (asuarezmiranda@lyft.com) - 2023-04-17 19:32:23
Hey! Question about spark column lineage. What is the intended way to write custom code for getting column lineage? I am trying to implement CustomColumnLineageVisitor but when I try to do so I get:
io.openlineage.spark3.agent.lifecycle.plan.column.CustomColumnLineageVisitor is not public in io.openlineage.spark3.agent.lifecycle.plan.column; cannot be accessed from outside package

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-04-18 02:25:04
*Thread Reply:* Hi @Allison Suarez, CustomColumnLineageVisitor should definitely be public. I'll prepare a fix PR for that. We do have a test for custom column lineage visitors (CustomColumnLineageVisitorTestImpl), but they're in the same package. Thanks for bringing this up.

Paweł Leszczyński (pawel.leszczynski@getindata.com) - 2023-04-18 03:07:11
*Thread Reply:* This PR should resolve the problem:
https://github.com/OpenLineage/OpenLineage/pull/1788

Allison Suarez (asuarezmiranda@lyft.com) - 2023-04-18 13:34:43
*Thread Reply:* Thank you so much @Paweł Leszczyński 🙂

Allison Suarez (asuarezmiranda@lyft.com) - 2023-04-18 13:35:46
*Thread Reply:* How does the release process work for OL? Do we have to wait a certain amount of time to get this change in a new release?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Allison Suarez - (asuarezmiranda@lyft.com) -
-
2023-04-18 17:34:29
-
-

*Thread Reply:* @Maciej Obuchowski ^

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-19 01:49:33

*Thread Reply:* 0.22.0 was released two weeks ago, so the next scheduled release should be in the next two weeks. We can ask @Michael Robinson his opinion on releasing 0.22.1 before that.

Michael Robinson (michael.robinson@astronomer.io)
2023-04-19 09:08:58

*Thread Reply:* Hi Allison 👋,
Anyone can request a release in the #general channel. I encourage you to go this route. You’ll need three +1s (there’s more info about the process here: https://github.com/OpenLineage/OpenLineage/blob/main/GOVERNANCE.md), but I don’t know of any reasons why we can’t do a mid-cycle release. 🙂

🙏 Allison Suarez

Allison Suarez (asuarezmiranda@lyft.com)
2023-04-19 16:23:20

*Thread Reply:* seems like we got enough +1s

Michael Robinson (michael.robinson@astronomer.io)
2023-04-19 16:24:33

*Thread Reply:* We need three committers to give a +1. I’ll reach out again to see if I can recruit a third

🙌 Allison Suarez

Allison Suarez (asuarezmiranda@lyft.com)
2023-04-19 16:24:55

*Thread Reply:* oooh

Michael Robinson (michael.robinson@astronomer.io)
2023-04-19 16:32:47

*Thread Reply:* Yeah, sorry I forgot to mention that!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-20 05:02:46

*Thread Reply:* we have it now

Michael Robinson (michael.robinson@astronomer.io)
2023-04-19 09:52:02

@channel
This month’s TSC meeting is tomorrow, 4/20, at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1681167638153879

Allison Suarez (asuarezmiranda@lyft.com)
2023-04-19 13:40:31

I would like to get a 0.22.1 patch release so the fix for the issue described in this thread lands before the next scheduled release.

➕ Michael Robinson, Paweł Leszczyński, Rohit Menon, Maciej Obuchowski, Julien Le Dem, Jakub Dardziński

Michael Robinson (michael.robinson@astronomer.io)
2023-04-20 09:46:06

*Thread Reply:* The release is authorized and will be initiated within 2 business days (not including tomorrow).

Michael Robinson (michael.robinson@astronomer.io)
2023-04-19 15:19:38

Here are the details about next week’s OpenLineage Meetup at Astronomer’s NY offices: https://openlineage.io/blog/nyc-meetup. Hope to see you there if you can make it!

👍 Ernie Ostic

Sai (saivenkatesh161@gmail.com)
2023-04-20 07:38:55

Hi Team, I tried integrating OpenLineage with Spark on Databricks and followed the steps in the documentation. The installation looks good and the listener is enabled, but no events are getting passed to Marquez. I can see the message below in the log4j logs. Am I missing any configuration?

Running a few Spark commands in a Databricks notebook to create events:

23/04/20 11:10:34 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
23/04/20 11:10:34 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 08:57:45

*Thread Reply:* Hi Sai,

Perhaps you could try printing the OpenLineage events into the logs first. This can be achieved by setting the Spark config parameter spark.openlineage.transport.type to console.

This can help you determine whether the problem is in generating the OpenLineage events themselves or in emitting them to Marquez.
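For illustration, a minimal PySpark session configured for this kind of debugging might look like the sketch below (the jar version is an assumption; adjust to your environment):

from pyspark.sql import SparkSession

# Sketch: print OpenLineage events to the driver log instead of emitting them anywhere.
spark = (
    SparkSession.builder.appName("openlineage-console-debug")
    # Assumed coordinates; use whichever version you actually run:
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.22.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .getOrCreate()
)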

Sai (saivenkatesh161@gmail.com)
2023-04-20 09:18:53

*Thread Reply:* Hi @Paweł Leszczyński I passed this config as below, but could not see any changes in the logs. The events are getting generated sometimes like below:


23/04/20 10:00:15 INFO ConsoleTransport: {"eventType":"START","eventTime":"2023-04-20T10:00:15.085Z","run":{"runId":"ef4f46d1-d13a-420a-87c3-19fbf6ffa231","facets":{"spark.logicalPlan":{"producer":"https://github.com/OpenLineage/OpenLineage/tree/0.22.0/integration/spark","schemaURL":"https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.CreateTableAsSelect","num-children":2,"name":0,"partitioning":[],"query":1,"tableSpec":null,"writeOptions":null,"ignoreIfExists":false},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedTableName","num-children":0,"catalog":null,"ident":null},{"class":"org.apache.spark.sql.catalyst.plans.logical.Project","num-children":1,"projectList":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num_children":0,"name":"workorderid","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-cl

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 09:19:37

*Thread Reply:* Ok, great. This means the issue is related to Spark <-> Marquez connection

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 09:20:33

*Thread Reply:* The Spark config changed some time ago; here is the up-to-date documentation: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 09:21:10

*Thread Reply:* please note that spark.openlineage.transport.url has to be used, which is different from what you have in the attached screenshot

Sai (saivenkatesh161@gmail.com)
2023-04-20 09:22:40

*Thread Reply:* You mean instead of "spark.openlineage.host" I need to use "spark.openlineage.transport.url"?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 09:23:04

*Thread Reply:* yes, please give it a try

Sai (saivenkatesh161@gmail.com)
2023-04-20 09:23:40

*Thread Reply:* sure will give a try and let you know the outcome

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 09:23:48

*Thread Reply:* and set spark.openlineage.transport.type to http

Sai (saivenkatesh161@gmail.com)
2023-04-20 09:24:04

*Thread Reply:* okay

Sai (saivenkatesh161@gmail.com)
2023-04-20 09:26:42

*Thread Reply:* do these configs suffice or do I need to add anything else?

spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.consoleTransport true
spark.openlineage.version v1
spark.openlineage.transport.type http
spark.openlineage.transport.url http://<host>:5000/api/v1/namespaces/sparkintegrationpoc/

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 09:27:07

*Thread Reply:* spark.openlineage.consoleTransport true this one can be removed

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-20 09:27:33

*Thread Reply:* otherwise shall be OK

Sai (saivenkatesh161@gmail.com)
2023-04-20 10:01:30

*Thread Reply:* I added these configs and ran it, but still the same issue. Now I am not able to see the events in the log file either.

Sai (saivenkatesh161@gmail.com)
2023-04-20 10:04:27

*Thread Reply:* 23/04/20 13:51:22 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
23/04/20 13:51:22 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd

Does this need any changes on the config side?

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-04-20 13:02:23

If you are trying to get into the OpenLineage Technical Steering Committee meeting, you have to RSVP to the specific event at https://www.addevent.com/calendar/pP575215 to get the password (in the invitation to add to your calendar)

🙌 Michael Robinson

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-20 13:53:31

Here is a nice article I found online that briefly explains Spark catalogs, just for some context: https://www.waitingforcode.com/apache-spark-sql/pluggable-catalog-api/read
In reference to the V2SessionCatalog use case brought up in the meeting just now

🙌 Michael Robinson, Maciej Obuchowski, Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 06:49:43

*Thread Reply:* @Anirudh Shrinivason Thanks for linking this, as it contains a clear explanation of Spark catalogs. However, I am still unable to write a failing integration test that reproduces the scenario. Could you provide an example of Spark code that fails on V2SessionCatalog, and more details on how you are trying to read/write data?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-24 07:14:04

*Thread Reply:* Hi @Paweł Leszczyński I noticed this issue on one of our pipelines before actually. Unfortunately, I didn't note down which pipeline the issue was occurring in. I'll keep checking from my end to identify the Spark job that ran into this error. In the meantime, I'll also try to see in which cases deltaCatalog makes use of the V2SessionCatalog to understand this better. Thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-26 03:44:15

*Thread Reply:* Hi @Paweł Leszczyński
CREATE TABLE IF NOT EXISTS TABLE_NAME (
    SOME COLUMNS
) USING delta
PARTITIONED BY (col)
LOCATION 's3 location'
A Spark SQL statement like this actually triggers the V2SessionCatalog.

❤️ Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-26 03:44:48

*Thread Reply:* Thanks @Anirudh Shrinivason, will look into that.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-26 05:06:05

*Thread Reply:* which spark & delta versions are you using?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-27 02:35:50

*Thread Reply:* I am not 100% sure if this is what you described, but this was an error I was able to replicate and fix. Please look at the exception stacktrace and let me know if it is the same on your side.
https://github.com/OpenLineage/OpenLineage/pull/1798

Labels: documentation, integration/spark

:gratitude_thank_you: Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:36:20

*Thread Reply:* Hi

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:36:45

*Thread Reply:* Hmm actually I am noticing this error on my local

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:37:01

*Thread Reply:* But on the prod job, I am seeing no such error in the logs...

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:37:28

*Thread Reply:* Also, I was using spark 3.1.2

👀 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-27 02:37:39

*Thread Reply:* then perhaps it's sth different :face_palm: will try to replicate on spark 3.1.2

:gratitude_thank_you: Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-27 02:37:42

*Thread Reply:* Not too sure which delta version the prod job was using...

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 03:30:49

*Thread Reply:* I was running the following command on Spark 3.1.2:
spark.sql(
    "CREATE TABLE t_partitioned (a int, b int) USING delta "
        + "PARTITIONED BY (a) LOCATION '/tmp/delta/tbl'"
);
and I got an OpenLineage event emitted with a t_partitioned output dataset.

:gratitude_thank_you: Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 03:31:47

*Thread Reply:* Oh... hmm... that is strange. Let me check more from my end too

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 03:33:01

*Thread Reply:* for spark 3.1, we're using delta 1.0.0

👀 Anirudh Shrinivason

Cory Visi (cvisi@amazon.com)
2023-04-20 14:41:23

Hi team! I have two Spark jobs chained together to process incoming data files, and I'm using openlineage-spark-0.22.0 with Marquez to visualize.
I'm struggling to figure out the best way to use spark.openlineage.parentRunId and spark.openlineage.parentJobName. Should these values be unique for each Spark job? Should they be unique for each execution of the chain of both Spark jobs? Or should they be the same for all runs?
I'm setting them to be unique to the execution of the chain, and I'm getting strange results (jobs are not showing as completed, and some are not showing at all).

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 05:38:09

*Thread Reply:* Hi Cory, I think the definition of ParentRunFacet (https://openlineage.io/docs/spec/facets/run-facets/parent_run) contains the answer to that:
Commonly, scheduler systems like Apache Airflow will trigger processes on remote systems, such as on Apache Spark or Apache Beam jobs. Those systems might have their own OpenLineage integration and report their own job runs and dataset inputs/outputs. The ParentRunFacet allows those downstream jobs to report which jobs spawned them to preserve job hierarchy. To do that, the scheduler system should have a way to pass its own job and run id to the child job.
For example, when Airflow is used to run a Spark job, we want Spark events to contain some information on what triggered the Spark job, and the parameters you ask about are used to pass that information from the Airflow operator to the Spark job.
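As an illustration only (the job name, URL, and run id below are hypothetical, not from this thread), a scheduler would pass its own identity down to the Spark job roughly like this:

from pyspark.sql import SparkSession
import uuid

# Hypothetical sketch: in practice the parent run id comes from the scheduler
# (e.g. the Airflow task run); here we just generate one for illustration.
parent_run_id = str(uuid.uuid4())

spark = (
    SparkSession.builder.appName("child-spark-job")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://marquez-api:5000")
    # Keep these stable for the whole run, so START and COMPLETE match up:
    .config("spark.openlineage.parentJobName", "my_dag.my_task")
    .config("spark.openlineage.parentRunId", parent_run_id)
    .getOrCreate()
)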

Cory Visi (cvisi@amazon.com)
2023-04-26 17:28:39

*Thread Reply:* Thank you for pointing me at this documentation; I did not see it previously. In my setup, the calling system is AWS Step Functions, which has no integration with OpenLineage.

So I've essentially been passing non-existent parent job information to OpenLineage. It has been useful as a data point for searches and reporting, though.

Is there any harm in doing what I am doing? Is it causing the jobs that I see never completing?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-27 04:59:39

*Thread Reply:* I think parentRunId should be the same for the OpenLineage START and COMPLETE events. Is it like this in your case?

Cory Visi (cvisi@amazon.com)
2023-05-03 11:13:58

*Thread Reply:* that makes sense, and based on my configuration, i would think that it would be. however, given that i am seeing incomplete jobs in Marquez, i'm wondering if somehow the parentRunId is changing. I need to investigate

Michael Robinson (michael.robinson@astronomer.io)
2023-04-20 15:44:39

@channel
We released OpenLineage 0.23.0, including:
Additions:
• SQL: parser improvements to support: copy into, create stage, pivot #1742 @pawel-big-lebowski
• dbt: add support for snapshots #1787 @JDarDagran
Changes:
• Spark: change custom column lineage visitors #1788 @pawel-big-lebowski
Plus bug fixes, doc changes and more.
Thanks to all the contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.23.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.22.0...0.23.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🎉 Harel Shein, Maciej Obuchowski, Anirudh Shrinivason, Kengo Seki, Paweł Leszczyński, Perttu Salonen
👍 Cory Visi, Maciej Obuchowski, Anirudh Shrinivason, Kengo Seki

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-21 05:07:30

Just curious, how long before we can see 0.23.0 over here: https://mvnrepository.com/artifact/io.openlineage/openlineage-spark

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-21 09:06:06

*Thread Reply:* I think @Michael Robinson has to manually promote artifacts

Michael Robinson (michael.robinson@astronomer.io)
2023-04-21 09:08:06

*Thread Reply:* I promoted the artifacts, but there is a delay before they appear in Maven. A couple releases ago, the delay was about 24 hours long

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-21 09:26:09

*Thread Reply:* Ahh I see... Thanks!

Michael Robinson (michael.robinson@astronomer.io)
2023-04-21 10:10:38

*Thread Reply:* @Anirudh Shrinivason are you using search.maven.org by chance? Version 0.23.0 is not appearing there yet, but I do see it on central.sonatype.com.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-21 10:15:00

*Thread Reply:* Hmm I can see it now on search.maven.org actually. But I still cannot see it on https://mvnrepository.com/artifact/io.openlineage/openlineage-spark ...

Michael Robinson (michael.robinson@astronomer.io)
2023-04-21 10:19:38

*Thread Reply:* Understood. I believe you can download the 0.23.0 jars from central.sonatype.com. For Spark, try going here: https://central.sonatype.com/artifact/io.openlineage/openlineage-spark/0.23.0/versions

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-22 06:11:10

*Thread Reply:* Yup. I can see it on all maven repos now haha. I think it's just the delay.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-22 06:11:18

*Thread Reply:* ~24 hours ig

Michael Robinson (michael.robinson@astronomer.io)
2023-04-24 16:49:15

*Thread Reply:* 👍

John Doe (adarsh.pansari@tigeranalytics.com)
2023-04-21 08:49:54

Hello Everyone, I am facing an issue while trying to integrate OpenLineage with a Jupyter notebook. I am following the docs. My containers are running and I am getting the URL for the Jupyter notebook, but when I try with the token in the terminal, I get an invalid credentials error. Can someone please help resolve this? Am I doing something wrong?

John Doe (adarsh.pansari@tigeranalytics.com)
2023-04-21 09:28:18

*Thread Reply:* Good news, everyone! The login worked on the second attempt after starting the Docker containers. Although it's unclear why it failed the first time.

👍 Maciej Obuchowski, Anirudh Shrinivason, Michael Robinson, Paweł Leszczyński

Natalie Zeller (natalie.zeller@naturalint.com)
2023-04-23 23:52:34

Hi team,
I have a question regarding the customization of transport types in OpenLineage.
At my company, we are using OpenLineage to report lineage from our Spark jobs to OpenMetadata. We have created a custom OpenMetadataTransport to send lineage to the OpenMetadata APIs, conforming to the OpenMetadata format.
Currently, we are using a fork of OpenLineage, as we needed to make some changes in the core to identify the new TransportConfig.
We believe it would be more optimal for OpenLineage to support custom transport types, which would allow us to use the OpenLineage JAR alongside our own JAR containing the custom transport.
I noticed some comments in the code suggesting that customizations are possible. However, I couldn't make it work without modifying the TransportFactory and the TransportConfig interface, as the transport types are hardcoded. Am I missing something? 🤔
If custom transport types are not currently supported, we would be more than happy to contribute a PR that enables custom transports.
What are your thoughts on this?

❤️ Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 02:32:51

*Thread Reply:* Hi Natalie, it's wonderful to hear you're planning to contribute. Yes, you're right about TransportFactory. What other transport type did you have in mind? If it is something generic, then it is surely OK to include it within TransportFactory. If it is a custom feature, we could follow the ServiceLoader pattern that we're using to allow including custom plan visitors and dataset builders.

Natalie Zeller (natalie.zeller@naturalint.com)
2023-04-24 02:54:40

*Thread Reply:* Hi @Paweł Leszczyński -Yes, I was planning to change TransportFactory to support custom/generic transport types using ServiceLoader pattern. After this change is done, I will be able to use our custom OpenMetadataTransport without changing anything in OpenLineage core. For now I don't have other types in mind, but after we'll add the customization support anyone will be able to create their own transport type and report the lineage to different backends

👍 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-24 03:28:30

*Thread Reply:* Perhaps it's not strictly related to this particular use case, but you may also find our recent PoC on the Fluentd & OpenLineage integration interesting. It will bring some cool backend features, like copying an event and sending it to multiple backends, or sending it to backends supported by Fluentd output plugins, etc. https://github.com/OpenLineage/OpenLineage/pull/1757/files?short_path=4fc5534#diff-4fc55343748f353fa1def0e00c553caa735f9adcb0da18baad50a989c0f2e935

Natalie Zeller (natalie.zeller@naturalint.com)
2023-04-24 05:36:24

*Thread Reply:* Sounds interesting. Thanks, I will look into it

Michael Robinson (michael.robinson@astronomer.io)
2023-04-24 16:37:33

Are you planning to come to the first New York OpenLineage Meetup this Wednesday at Astronomer’s offices in the Flatiron District? Don’t forget to RSVP so we know how much food and drink to order!

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 03:20:57

Hi, I'm new to OpenLineage. I'm trying to connect a Snowflake database with Marquez using Airflow, and I'm getting an error in etl_openlineage while running the Airflow DAG in a local Ubuntu environment. I'm also unable to see anything in the Marquez UI even though etl_openlineage has completed as a success.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 08:07:36

*Thread Reply:* What's the extract_openlineage.py file? Looks like your code?

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 08:43:04

*Thread Reply:* import json
import os
from pendulum import datetime

from airflow import DAG
from airflow.decorators import task
from openlineage.client import OpenLineageClient
from snowflake.connector import connect

SNOWFLAKE_USER = os.getenv('SNOWFLAKE_USER')
SNOWFLAKE_PASSWORD = os.getenv('SNOWFLAKE_PASSWORD')
SNOWFLAKE_ACCOUNT = os.getenv('SNOWFLAKE_ACCOUNT')
SNOWFLAKE_WAREHOUSE = os.getenv('SNOWFLAKE_WAREHOUSE')  # referenced below; assumed to come from the environment like the others

@task
def send_ol_events():
    client = OpenLineageClient.from_environment()

    with connect(
        user=SNOWFLAKE_USER,
        password=SNOWFLAKE_PASSWORD,
        account=SNOWFLAKE_ACCOUNT,
        database='OPENLINEAGE',
        schema='PUBLIC',
    ) as conn:
        with conn.cursor() as cursor:
            ol_view = 'OPENLINEAGE_ACCESS_HISTORY'
            ol_event_time_tag = 'OL_LATEST_EVENT_TIME'

            var_query = f'''
                use warehouse {SNOWFLAKE_WAREHOUSE};
            '''
            cursor.execute(var_query)

            var_query = f'''
                set current_organization='{SNOWFLAKE_ACCOUNT}';
            '''
            cursor.execute(var_query)

            ol_query = f'''
                SELECT * FROM {ol_view}
                WHERE EVENT:eventTime > system$get_tag('{ol_event_time_tag}', '{ol_view}', 'table')
                ORDER BY EVENT:eventTime ASC;
            '''
            cursor.execute(ol_query)
            ol_events = [json.loads(ol_event[0]) for ol_event in cursor.fetchall()]

            for ol_event in ol_events:
                client.emit(ol_event)

            if len(ol_events) > 0:
                latest_event_time = ol_events[-1]['eventTime']
                cursor.execute(f'''
                    ALTER VIEW {ol_view} SET TAG {ol_event_time_tag} = '{latest_event_time}';
                ''')

with DAG(
    'etl_openlineage',
    start_date=datetime(2022, 4, 12),
    schedule_interval='@hourly',
    catchup=False,
    default_args={
        'owner': 'openlineage',
        'depends_on_past': False,
        'email_on_failure': False,
        'email_on_retry': False,
        'email': ['demo@openlineage.io'],
        'snowflake_conn_id': 'openlineage_snowflake'
    },
    description='Send OL events every minutes.',
    tags=["extract"],
) as dag:
    send_ol_events()

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 09:52:33

*Thread Reply:* OpenLineageClient expects RunEvent classes and you're sending it raw JSON. I think at this point your options are either sending them by constructing your own HTTP client, using something like requests, or using something like https://github.com/python-attrs/cattrs to structure the JSON into a RunEvent

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 10:05:57

*Thread Reply:* @Jakub Dardziński suggested that you can change client.emit(ol_event) to client.transport.emit(ol_event) and it should work
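In the DAG above, that is a one-line change; a hedged sketch (it relies on the configured transport serializing whatever object it is handed):

# Instead of client.emit(ol_event), which expects a RunEvent instance,
# hand the raw dict straight to the configured transport:
for ol_event in ol_events:
    client.transport.emit(ol_event)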

👍 Ross Turk, Sudhar Balaji

Ross Turk (ross@rossturk.com)
2023-04-25 12:24:08

*Thread Reply:* @Maciej Obuchowski I believe this is from https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup/blob/main/examples/airflow/dags/lineage/extract_openlineage.py

Ross Turk (ross@rossturk.com)
2023-04-25 12:25:26

*Thread Reply:* I believe this example no longer works - perhaps a new access history pull/push example could be created that is simpler and doesn’t use airflow.

👀 Maciej Obuchowski

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 08:34:02

*Thread Reply:* I think separating the actual data retrieval from the view and the Airflow DAG would make sense

Ross Turk (ross@rossturk.com)
2023-04-26 13:57:34

*Thread Reply:* Yeah - I also think that Airflow confuses the issue. You don’t need Airflow to get lineage from Snowflake Access History; the only reason Airflow is in the example is a) to simulate a pipeline that can be viewed in Marquez, and b) to establish a mechanism that regularly pulls and emits lineage…

but most people will already have A, and the simplest example doesn’t need to accomplish B.

Ross Turk (ross@rossturk.com)
2023-04-26 13:58:59

*Thread Reply:* just a few weeks ago 🙂 I was working on a script that you could run like SNOWFLAKE_USER=foo ./process_snowflake_lineage.py --from-date=xxxx-xx-xx --to-date=xxxx-xx-xx

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-27 11:13:58

*Thread Reply:* Hi @Ross Turk! Do you have a link to this script? Perhaps this script can fix the connection issue 🙂

Ross Turk (ross@rossturk.com)
2023-04-27 11:47:20

*Thread Reply:* No, it never became functional before I stopped to take on another task 😕

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 07:47:57

Hi,
Currently, in the .env file, we are using OPENLINEAGE_URL as http://marquez-api:5000 and got the error:
requests.exceptions.HTTPError: 422 Client Error: for url: http://marquez-api:5000/api/v1/lineage
We have tried using OPENLINEAGE_URL as http://localhost:5000 and are getting the error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/v1/lineage (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc71edb9590>: Failed to establish a new connection: [Errno 111] Connection refused'))
I'm not sure which value to use for OPENLINEAGE_URL, so please advise.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 09:54:07

*Thread Reply:* Looks like the first URL is the proper one, but there's something wrong with the entity - Marquez logs would help here.

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-25 09:57:36

*Thread Reply:* This is my log in Airflow, can you please provide more info on it?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-25 10:13:37

*Thread Reply:* Airflow log does not tell us why Marquez rejected the event. Marquez logs would be more helpful

Sudhar Balaji (sudharshan.dataaces@gmail.com)
2023-04-26 05:48:08

*Thread Reply:* We investigated the Marquez container logs and were unable to locate the error. Could you please specify which log file belongs to Marquez when connecting Airflow or Snowflake?

Is it correct that the marquez-web log points to http://api:5000/?
[HPM] Proxy created: /api/v1 -> http://api:5000/
App listening on port 3000!

👀 Maciej Obuchowski

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-26 11:26:36

*Thread Reply:* I have the same error at the moment but can provide some additional screenshots. The event data in Snowflake seems fine and the data is being retrieved correctly by the Airflow DAG. However, there seems to be a warning in the Marquez API logs. Hopefully we can troubleshoot this together!

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-26 11:33:35

*Thread Reply:* (screenshots attached)

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 13:06:30

*Thread Reply:* Possibly the Python part in between does some weird things, like double-JSONing the data? I can imagine it being wrapped in a second, unnecessary JSON object

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 13:08:18

*Thread Reply:* I guess the only way to check is to print one of those events - in the form they are sent from the Python part, not Snowflake - and see what they look like. For example, using ConsoleTransport or setting the DEBUG log level in Airflow
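A minimal sketch of the logging approach, assuming the Python client logs outgoing events at DEBUG level (as later pointed out in this thread, the logging happens at the client level):

import logging

# Surface the exact payload the OpenLineage Python client tries to send:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("openlineage.client").setLevel(logging.DEBUG)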

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-26 14:37:32

*Thread Reply:* Here is a log snippet from using DEBUG logging on the Snowflake Python connector:


[2023-04-26T17:16:55.166+0000] {cursor.py:593} DEBUG - binding: [set current_organization='[PRIVATE]';] with input=[None], processed=[{}]
[2023-04-26T17:16:55.166+0000] {cursor.py:800} INFO - query: [set current_organization='[PRIVATE]';]
[2023-04-26T17:16:55.166+0000] {connection.py:1363} DEBUG - sequence counter: 2
[2023-04-26T17:16:55.167+0000] {cursor.py:467} DEBUG - Request id: f7bca188-dda0-4fe6-8d5c-a92dc5f9c7ac
[2023-04-26T17:16:55.167+0000] {cursor.py:469} DEBUG - running query [set current_organization='[PRIVATE]';]
[2023-04-26T17:16:55.168+0000] {cursor.py:476} DEBUG - is_file_transfer: True
[2023-04-26T17:16:55.168+0000] {connection.py:1035} DEBUG - _cmd_query
[2023-04-26T17:16:55.168+0000] {connection.py:1062} DEBUG - sql=[set current_organization='[PRIVATE]';], sequence_id=[2], is_file_transfer=[False]
[2023-04-26T17:16:55.168+0000] {network.py:1162} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 1/1 active sessions
[2023-04-26T17:16:55.169+0000] {network.py:850} DEBUG - remaining request timeout: None, retry cnt: 1
[2023-04-26T17:16:55.169+0000] {network.py:828} DEBUG - Request guid: 4acea1c3-6a68-4691-9af4-22f184e0f660
[2023-04-26T17:16:55.169+0000] {network.py:1021} DEBUG - socket timeout: 60
[2023-04-26T17:16:55.259+0000] {connectionpool.py:465} DEBUG - [PRIVATE] "POST /queries/v1/query-request?requestId=f7bca188-dda0-4fe6-8d5c-a92dc5f9c7ac&request_guid=4acea1c3-6a68-4691-9af4-22f184e0f660 HTTP/1.1" 200 1118
[2023-04-26T17:16:55.261+0000] {network.py:1047} DEBUG - SUCCESS
[2023-04-26T17:16:55.261+0000] {network.py:1168} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 0/1 active sessions
[2023-04-26T17:16:55.261+0000] {network.py:729} DEBUG - ret[code] = None, after post request
[2023-04-26T17:16:55.261+0000] {network.py:751} DEBUG - Query id: 01abe3ac-0603-4df4-0042-c78307975eb2
[2023-04-26T17:16:55.262+0000] {cursor.py:807} DEBUG - sfqid: 01abe3ac-0603-4df4-0042-c78307975eb2
[2023-04-26T17:16:55.262+0000] {cursor.py:813} INFO - query execution done
[2023-04-26T17:16:55.262+0000] {cursor.py:827} DEBUG - SUCCESS
[2023-04-26T17:16:55.262+0000] {cursor.py:846} DEBUG - PUT OR GET: False
[2023-04-26T17:16:55.263+0000] {cursor.py:941} DEBUG - Query result format: json
[2023-04-26T17:16:55.263+0000] {result_batch.py:433} DEBUG - parsing for result batch id: 1
[2023-04-26T17:16:55.263+0000] {cursor.py:956} INFO - Number of results in first chunk: 1
[2023-04-26T17:16:55.263+0000] {cursor.py:735} DEBUG - executing SQL/command
[2023-04-26T17:16:55.263+0000] {cursor.py:593} DEBUG - binding: [SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...] with input=[None], processed=[{}]
[2023-04-26T17:16:55.264+0000] {cursor.py:800} INFO - query: [SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...]
[2023-04-26T17:16:55.264+0000] {connection.py:1363} DEBUG - sequence counter: 3
[2023-04-26T17:16:55.264+0000] {cursor.py:467} DEBUG - Request id: 21e2ab85-4995-4010-865d-df06cf5ee5b5
[2023-04-26T17:16:55.265+0000] {cursor.py:469} DEBUG - running query [SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...]
[2023-04-26T17:16:55.265+0000] {cursor.py:476} DEBUG - is_file_transfer: True
[2023-04-26T17:16:55.265+0000] {connection.py:1035} DEBUG - _cmd_query
[2023-04-26T17:16:55.265+0000] {connection.py:1062} DEBUG - sql=[SELECT * FROM OPENLINEAGE_ACCESS_HISTORY WHERE EVENT:eventTime > system$get_tag(...], sequence_id=[3], is_file_transfer=[False]
[2023-04-26T17:16:55.266+0000] {network.py:1162} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 1/1 active sessions
[2023-04-26T17:16:55.267+0000] {network.py:850} DEBUG - remaining request timeout: None, retry cnt: 1
[2023-04-26T17:16:55.268+0000] {network.py:828} DEBUG - Request guid: aba82952-a5c2-4c6b-9c70-a10545b8772c
[2023-04-26T17:16:55.268+0000] {network.py:1021} DEBUG - socket timeout: 60
[2023-04-26T17:17:21.844+0000] {connectionpool.py:465} DEBUG - [PRIVATE] "POST /queries/v1/query-request?requestId=21e2ab85-4995-4010-865d-df06cf5ee5b5&request_guid=aba82952-a5c2-4c6b-9c70-a10545b8772c HTTP/1.1" 200 None
[2023-04-26T17:17:21.879+0000] {network.py:1047} DEBUG - SUCCESS
[2023-04-26T17:17:21.881+0000] {network.py:1168} DEBUG - Session status for SessionPool '[PRIVATE]', SessionPool 0/1 active sessions
[2023-04-26T17:17:21.882+0000] {network.py:729} DEBUG - ret[code] = None, after post request
[2023-04-26T17:17:21.882+0000] {network.py:751} DEBUG - Query id: 01abe3ac-0603-4df4-0042-c78307975eb6
[2023-04-26T17:17:21.882+0000] {cursor.py:807} DEBUG - sfqid: 01abe3ac-0603-4df4-0042-c78307975eb6
[2023-04-26T17:17:21.882+0000] {cursor.py:813} INFO - query execution done
[2023-04-26T17:17:21.883+0000] {cursor.py:827} DEBUG - SUCCESS
[2023-04-26T17:17:21.883+0000] {cursor.py:846} DEBUG - PUT OR GET: False
[2023-04-26T17:17:21.883+0000] {cursor.py:941} DEBUG - Query result format: arrow
[2023-04-26T17:17:21.903+0000] {result_batch.py:102} DEBUG - chunk size=256
[2023-04-26T17:17:21.920+0000] {cursor.py:956} INFO - Number of results in first chunk: 112
[2023-04-26T17:17:21.949+0000] {arrow_iterator.cpython-37m-x86_64-linux-gnu.so:0} DEBUG - Batches read: 1
[2023-04-26T17:17:21.950+0000] {CArrowIterator.cpp:16} DEBUG - Arrow BatchSize: 1
[2023-04-26T17:17:21.950+0000] {CArrowChunkIterator.cpp:50} DEBUG - Arrow chunk info: batchCount 1, columnCount 1, use_numpy: 0
[2023-04-26T17:17:21.950+0000] {result_set.py:232} DEBUG - result batch 1 has id: data_0_0_1
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 2 has id: data_0_0_2
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 3 has id: data_0_0_3
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 4 has id: data_0_1_0
[2023-04-26T17:17:21.951+0000] {result_set.py:232} DEBUG - result batch 5 has id: data_0_1_1
[2023-04-26T17:17:21.952+0000] {result_set.py:232} DEBUG - result batch 6 has id: data_0_1_2
[2023-04-26T17:17:21.952+0000] {result_set.py:232} DEBUG - result batch 7 has id: data_0_1_3
[2023-04-26T17:17:21.952+0000] {result_set.py:232} DEBUG - result batch 8 has id: data_0_2_0
[2023-04-26T17:17:21.952+0000] {result_set.py:232} DEBUG - result batch 9 has id: data_0_2_1

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-04-26 14:45:26

*Thread Reply:* I don't see any Airflow standard logs here, but anyway I looked at it and debugging it would not work if you're bypassing OpenLineageClient.emit and going directly to transport - the logging is done on Client level https://github.com/OpenLineage/OpenLineage/blob/acc207d63e976db7c48384f04bc578409f08cc8a/client/python/openlineage/client/client.py#L73

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-04-27 11:16:20

*Thread Reply:* I'm sorry, do you have a code snippet on how to get these logs from https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup/blob/main/examples/airflow/dags/lineage/extract_openlineage.py? I still get the ValueError for OpenLineageClient.emit

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Tom van Eijk - (t.m.h.vaneijk@tilburguniversity.edu) -
-
2023-05-04 10:56:34
-
-

*Thread Reply:* Hey does anyone have an idea on this? I'm still stuck on this issue 😞

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-05-05 08:58:49
-
-

*Thread Reply:* I've found the root cause. It's because facets don't have _producer and _schemaURL set. I'll provide a fix soon

- - - -
- ♥️ Tom van Eijk, Michael Robinson -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-04-26 11:36:23
-
-

The first New York OpenLineage Meetup is happening today at 5:30 pm ET at Astronomer’s offices in the Flatiron District! https://openlineage.slack.com/archives/C01CK9T7HKR/p1681931978353159

Julien Le Dem (julien@apache.org)
2023-04-26 11:36:57

*Thread Reply:* I’ll be there! I’m looking forward to seeing you all.

Julien Le Dem (julien@apache.org)
2023-04-26 11:37:23

*Thread Reply:* We’ll talk about the evolution of the spec.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-04-27 02:55:00
-
-

delta_table = DeltaTable.forPath(spark, path)
delta_table.alias("source").merge(df.alias("update"), lookup_statement).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
If I write based on df operations like this, I notice that OL does not emit any event. May I know whether these or similar cases can be supported too? 🙇

👀 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 04:23:24

*Thread Reply:* I've created an integration test based on your example. The OpenLineage event gets sent; however, it does not contain the output dataset. I will look deeper into that.

:gratitude_thank_you: Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:55:43

*Thread Reply:* Hey, sorry do you mean input dataset is empty? Or output dataset?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:55:51

*Thread Reply:* I am seeing that input dataset is empty

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 08:56:05

*Thread Reply:* ooh, I see input datasets

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:56:11

*Thread Reply:* Hmm

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:56:12

*Thread Reply:* I see

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-04-28 08:57:07

*Thread Reply:* I created a test method in the SparkDeltaIntegrationTest class:
@Test
void testDeltaMergeInto() {
  Dataset<Row> dataset =
      spark
          .createDataFrame(
              ImmutableList.of(
                  RowFactory.create(1L, "bat"),
                  RowFactory.create(2L, "mouse"),
                  RowFactory.create(3L, "horse")),
              new StructType(
                  new StructField[] {
                    new StructField("a", LongType$.MODULE$, false, Metadata.empty()),
                    new StructField("b", StringType$.MODULE$, false, Metadata.empty())
                  }))
          .repartition(1);
  dataset.createOrReplaceTempView("temp");

  spark.sql("CREATE TABLE t1 USING delta LOCATION '/tmp/delta/t1' AS SELECT * FROM temp");
  spark.sql("CREATE TABLE t2 USING delta LOCATION '/tmp/delta/t2' AS SELECT * FROM temp");

  DeltaTable.forName("t1")
      .merge(spark.read().table("t2"), "t1.a = t2.a")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute();

  verifyEvents(mockServer, "pysparkDeltaMergeIntoCompleteEvent.json");
}

👍 Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:59:14

*Thread Reply:* Oh yeah my bad. I am seeing output dataset is empty.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-04-28 08:59:21

*Thread Reply:* Checks out with your observation

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-03 23:23:36

*Thread Reply:* Hi @Paweł Leszczyński just curious, has a fix for this been implemented alr?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 02:40:11

*Thread Reply:* Hi @Anirudh Shrinivason, I had some days ooo. I will look into this soon.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-04 07:37:52

*Thread Reply:* Ahh okie! Thanks so much! Hope you had a good rest!

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 07:38:38

*Thread Reply:* yeah. this was an amazing extended weekend 😉

🎉 Anirudh Shrinivason

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-05 02:09:10

*Thread Reply:* This should be it: https://github.com/OpenLineage/OpenLineage/pull/1823

Labels: integration/spark

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-05 02:43:24

*Thread Reply:* Hi @Anirudh Shrinivason, please let me know if there is still something to be done within #1747 [PROPOSAL] Support for V2SessionCatalog. I could not reproduce exactly what you described, but fixed some issue nearby.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-05 02:49:38

*Thread Reply:* Hmm yeah sure let me find out the exact cause of the issue. The pipeline that was causing the issue is now inactive haha. So I'm trying to backtrace from the limited logs I captured last time. Let me get back by next week thanks! 🙇

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-05 09:35:00

*Thread Reply:* Hi @Paweł Leszczyński I was trying to replicate the issue from my end, but couldn't do so. I think we can close the issue for now, and revisit later on if the issue resurfaces. Does that sound okay?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-05 09:40:33

*Thread Reply:* sounds cool. we can surely create a new issue later on.

👍 Anirudh Shrinivason

Harshini Devathi (harshini.devathi@tigeranalytics.com)
2023-05-09 23:34:04

*Thread Reply:* @Paweł Leszczyński - I was trying to implement these new changes in Databricks. I was wondering which Java file I should use for building the jar file? Could you please help me?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 00:46:34

*Thread Reply:* .

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 02:37:49

*Thread Reply:* Hi, I found that these merge operations have no input datasets / column lineage:
df.write.format(file_format).mode(mode).option("mergeSchema", merge_schema).option("overwriteSchema", overwrite_schema).save(path)

df.write.format(file_format).mode(mode).option("mergeSchema", merge_schema).option("overwriteSchema", overwrite_schema)\
    .partitionBy(*partitions).save(path)

df.write.format(file_format).mode(mode).option("mergeSchema", merge_schema).option("overwriteSchema", overwrite_schema)\
    .partitionBy(*partitions).option("replaceWhere", where_clause).save(path)
I also noticed the same issue when using the MERGE INTO command from Spark SQL.
Would it be possible to extend the support to these df operations too, please? Thanks!
CC: @Paweł Leszczyński

👀 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-09 02:41:24

*Thread Reply:* Hi @Anirudh Shrinivason, great to hear from you. Could you create an issue out of this? I am working at the moment on Spark 3.4. Once this is ready, I will look at the spark issues. And this one seems to be nicely reproducible. Thanks for that.

👍 Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 02:49:56

*Thread Reply:* Sure let me create an issue! Thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-09 02:55:21

*Thread Reply:* Created an issue here! https://github.com/OpenLineage/OpenLineage/issues/1919
Thanks! 🙇

Labels: proposal

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-15 10:39:50

*Thread Reply:* Hi @Paweł Leszczyński I just realised, https://github.com/OpenLineage/OpenLineage/pull/1823/files
This PR doesn't actually capture column lineage for the MergeIntoCommand? It looks like there is no column lineage field in the events JSON.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-17 04:21:24

*Thread Reply:* Hi @Paweł Leszczyński Is there a potential timeline in mind to support column lineage for the MergeIntoCommand? We're really excited for this feature and would be a huge help to overcome a current blocker. Thanks!

Michael Robinson (michael.robinson@astronomer.io)
2023-04-28 14:11:34

Thanks to everyone who came out to Wednesday night’s meetup in New York! In addition to great pizza from Grimaldi’s (thanks for the tip, @Harel Shein), we enjoyed a spirited discussion of:
• the state of observability tooling in the data space today
• the history and high-level architecture of the project courtesy of @Julien Le Dem
• exciting news of an OpenLineage Scanner being planned at MANTA courtesy of @Ernie Ostic
• updates on the project roadmap and some exciting proposals from @Julien Le Dem, @Harel Shein and @Willy Lulciuc
• an introduction to and demo of Marquez from project lead @Willy Lulciuc
• and more.
Be on the lookout for an announcement about the next meetup!

❤️ Harel Shein, Maciej Obuchowski, Peter Hicks, Jakub Dardziński, Atif Tahir

Michael Robinson (michael.robinson@astronomer.io)
2023-04-28 16:02:22

As discussed during the April TSC meeting, comments are sought from the community on a proposal to support RunEvent-less (AKA static) lineage metadata emission. This is currently a WIP. For details and to comment, please see:
• https://docs.google.com/document/d/1366bAPkk0OqKkNA4mFFt-41X0cFUQ6sOvhSWmh4Iydo/edit?usp=sharing
• https://docs.google.com/document/d/1gKJw3ITJHArTlE-Iinb4PLkm88moORR0xW7I7hKZIQA/edit?usp=sharing

Ernie Ostic (ernie.ostic@getmanta.com)
2023-04-30 21:35:47

Hi all. Probably I just need to study the spec further, but what is the significance of _producer vs producer in the context of where they are used? (same question also for _schemaURL vs schemaURL)? Thx!

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 12:02:13

*Thread Reply:* “producer” is an element of the run event itself - e.g. what produced the JSON packet you’re studying. There is only one of these per run event. You can think of it as a top-level property.

“_producer” (and “_schemaURL”) are elements of a facet. They are the 2 required elements for any customized facet (though I don’t agree they should be required, or at least I believe they should be able to be compatible with a blank value and a null value).

A packet sent to an API should only have one “producer” element, but can have many _producer elements in sub-objects (though, only one _producer per facet).
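For illustration (all values below are placeholders, not from this thread), the distinction looks roughly like this in an event payload, shown as a Python dict:

event = {
    "eventType": "START",
    "eventTime": "2023-05-01T12:00:00Z",
    # Top-level producer/schemaURL: describe what emitted this event. One per event.
    "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.23.0/integration/spark",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent",
    "run": {
        "runId": "ef4f46d1-d13a-420a-87c3-19fbf6ffa231",
        "facets": {
            "parent": {
                # Facet-level _producer/_schemaURL: required on every facet.
                "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.23.0/integration/spark",
                "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/ParentRunFacet.json",
                "job": {"namespace": "my-namespace", "name": "parent-job"},
                "run": {"runId": "b9c2d1a0-0000-0000-0000-000000000000"},
            }
        },
    },
    "job": {"namespace": "my-namespace", "name": "child-job"},
    "inputs": [],
    "outputs": [],
}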

Ernie Ostic (ernie.ostic@getmanta.com)
2023-05-01 12:06:52

*Thread Reply:* just curious --- is/was there any specific reason for the underscore prefix? If they are in a facet, they would already be qualified.......

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 13:13:28

*Thread Reply:* The facet “BaseFacet” that’s used for customization has 2 required elements: _producer and _schemaURL. So I don’t believe it’s related to qualification.

👍 Ernie Ostic

Michael Robinson (michael.robinson@astronomer.io)
2023-05-01 11:33:02

I’m opening a vote to release OpenLineage 0.24.0, including:
• a new OpenLineage extractor for dbt Cloud
• a new interface - TransportBuilder - for creating custom transport types without modifying core components of OpenLineage
• a fix to the LogicalPlanSerializer in the Spark integration to make it operational again
• a new configuration parameter in the Spark integration for making dataset paths less verbose
• a fix to the Flink integration CI
• and more.
Three +1s from committers will authorize an immediate release.

➕ Jakub Dardziński, Willy Lulciuc, Julien Le Dem
✅ Sheeri Cabral (Collibra)

Michael Robinson (michael.robinson@astronomer.io)
2023-05-02 19:43:12

*Thread Reply:* Thanks for voting. The release will commence within 2 days.

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 12:03:19

Does the Spark integration for OpenLineage also support ETL that uses the Apache Spark Structured Streaming framework?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-04 02:33:32

*Thread Reply:* Although it is not documented, we do have an integration test for that: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/test/resources/spark_scripts/spark_kafka.py

The test reads and writes data to Kafka and verifies that the input/output datasets are collected.
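A rough PySpark sketch of the pattern that test exercises (topic names, servers, and paths are placeholders; the OpenLineage listener is assumed to be configured on the session as shown earlier in this channel):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-lineage-demo").getOrCreate()

# Read a stream from one Kafka topic...
source = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "input-topic")
    .load()
)

# ...and write it out to another; OpenLineage should report both datasets.
query = (
    source.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "output-topic")
    .option("checkpointLocation", "/tmp/kafka-lineage-checkpoint")
    .start()
)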

✅ Sheeri Cabral (Collibra)

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 13:14:14

Also, does it work for PySpark jobs? (Forgive me if Spark job = PySpark; I don’t have a lot of depth on how Spark works.)

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-01 22:37:25

*Thread Reply:* From my experience, yeah it works for pyspark

🙌 Paweł Leszczyński, Sheeri Cabral (Collibra)

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-01 13:35:41

(and in a less generic question, would it work on top of this Spline agent/lineage harvester, or is it a replacement for it?)

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-05-01 22:39:18

*Thread Reply:* Also from my experience, I think we can only use one of them as we can only configure one spark listener... correct me if I'm wrong. But it seems like the latest releases of spline are already using openlineage to some capacity?

✅ Sheeri Cabral (Collibra)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-08 09:46:15
*Thread Reply:* In spark.extraListeners you can configure multiple listeners by comma separating them - I think you can use multiple ones with OpenLineage without obvious problems. I think we do pretty similar things to Spline though
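(A sketch of that configuration - the second listener class is a placeholder for whatever else, e.g. a Spline listener, is on the classpath:)

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("multi_listener_example")
         # Comma-separated list of listener classes; Spark registers each one.
         .config("spark.extraListeners",
                 "io.openlineage.spark.agent.OpenLineageSparkListener,"
                 "com.example.MyOtherListener")
         .getOrCreate())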

👍 Anirudh Shrinivason
✅ Sheeri Cabral (Collibra)

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-05-25 11:28:41
*Thread Reply:* (I never said thank you for this, so, thank you!)

Sai - (saivenkatesh161@gmail.com)
2023-05-02 04:03:40
Hi Team,


I have configured OpenLineage with Databricks and it is sending events to Marquez as expected. I have a notebook which joins 3 tables and writes the result data frame to an Azure ADLS location. Each time I run the notebook manually, it creates two start events and two complete events for one run, as shown in the screenshot. Is this expected, or am I missing something?

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-02 10:45:37
*Thread Reply:* Hello Sai, thanks for your question! A number of folks who could help with this are OOO, but someone will reply as soon as possible.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-04 02:44:46
*Thread Reply:* That is interesting @Sai. Are you able to reproduce this with a simple code snippet? Which OpenLineage version are you using?

Sai - (saivenkatesh161@gmail.com)
2023-05-05 01:16:20
*Thread Reply:* Yes @Paweł Leszczyński. Each join query I run on top of delta tables has two start and two complete events. We are using the below jar for OpenLineage.


openlineage-spark-0.22.0.jar

👀 Paweł Leszczyński

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-05 02:41:26
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/1828

Assignees: @pawel-big-lebowski

Sai - (saivenkatesh161@gmail.com)
2023-05-08 04:05:26
*Thread Reply:* Hi @Paweł Leszczyński any updates on this issue?


Also, OL is not giving column level lineage for group by operations on tables. Is this expected?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-08 04:07:04
*Thread Reply:* Hi @Sai, https://github.com/OpenLineage/OpenLineage/pull/1830 should fix the duplication issue

Labels: documentation, integration/spark
Assignees: @pawel-big-lebowski

Sai - (saivenkatesh161@gmail.com)
2023-05-08 04:08:06
*Thread Reply:* this would be part of the next release?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-08 04:08:30
*Thread Reply:* Regarding the column lineage & group-by issue, I think it's something on the Databricks side -> we do have an open issue for that: #1821

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-08 04:09:24
*Thread Reply:* once #1830 is reviewed and merged, it will be part of the next release

Sai - (saivenkatesh161@gmail.com)
2023-05-08 04:11:01
*Thread Reply:* sure.. thanks @Paweł Leszczyński

Sai - (saivenkatesh161@gmail.com)
2023-05-16 03:27:01
*Thread Reply:* @Paweł Leszczyński I have used the latest jar (0.25.0) and still this issue persists. I see two events for the same input/output lineage.

Thomas - (xsist10@gmail.com)
2023-05-03 03:55:44
Has anyone used OpenLineage for application lineage? I'm particularly interested in if/how you handled service boundaries like APIs and Kafka topics and what Dataset Naming (URI) you used.

Thomas - (xsist10@gmail.com)
2023-05-03 04:06:37
*Thread Reply:* For example, MySQL is stored as producer + host + port + database + table as something like <mysql://db.foo.com:6543/metrics.orders>
For an API (especially one following REST conventions), I was thinking something like method + host + port + path or GET <https://api.service.com:433/v1/users>

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-03 10:13:25
*Thread Reply:* Hi Thomas, thanks for asking about this — it sounds cool! I don’t know of others working on this kind of thing, but I’ve been developing a SQLAlchemy integration and have been experimenting with job naming — which I realize isn’t exactly what you’re working on. Hopefully others will chime in here, but in the meantime, would you be willing to create an issue about this? It seems worth discussing how we could expand the spec for this kind of use case.

Thomas - (xsist10@gmail.com)
2023-05-03 10:58:32
*Thread Reply:* I suspect this will definitely be a bigger discussion. Let me ponder on the problem a bit more and come back with something a bit more concrete.

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-03 10:59:21
*Thread Reply:* Looking forward to hearing more!

Thomas - (xsist10@gmail.com)
2023-05-03 11:05:47
*Thread Reply:* On a tangential note, does OpenLineage's column level lineage have support for (I see it can be extended but want to know if someone had to map this before):
• Properties as a path in a structure (like a JSON structure, Avro schema, protobuf, etc.) maybe using something like JSON Path or XPath notation.
• Fragments (when a column is a JSON blob, there is an entire sub-structure that needs to be described)
• Transformation description (how an input affects an output. Is it a direct copy of the value or is it part of a formula)

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-03 11:22:21
*Thread Reply:* I don’t know, but I’ll ping some folks who might.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-04 03:24:01
*Thread Reply:* Hi @Thomas. Column-lineage support currently does not include JSON fields. We have included in the specification fields like transformationDescription and transformationType to store a string representation of the transformation applied and its type, like IDENTITY|MASKED. However, those fields aren't filled within the Spark integration at the moment.
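(For reference, a sketch of where those fields sit in the column lineage facet - the dataset and field names below are made up:)

"columnLineage": {
  "fields": {
    "customer_id": {
      "inputFields": [
        {"namespace": "dbfs", "name": "/warehouse/orders", "field": "customer_id"}
      ],
      "transformationDescription": "direct copy of the input value",
      "transformationType": "IDENTITY"
    }
  }
}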

🙌 Thomas, Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-03 09:54:57
@channel
We released OpenLineage 0.24.0, including:
Additions:
• Support custom transport types #1795 @nataliezeller1
• Airflow: dbt Cloud integration #1418 @howardyoo
• Spark: support dataset name modification using regex #1796 @pawel-big-lebowski
Plus bug fixes and more.
Thanks to all the contributors!
For the bug fixes and details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.24.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.23.0...0.24.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🎉 Harel Shein, tati

GreetBot
2023-05-03 10:45:32
@GreetBot has joined the channel

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-04 11:25:23
@channel
This month’s TSC meeting is next Thursday, May 11th, at 10:00 am PT. The tentative agenda will be on the wiki. More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-05-05 12:11:37
Hello all, noticed that OpenLineage is not able to give column-level lineage if there is a groupBy operation on a Spark dataframe. Has anyone else faced this issue and have any fixes or workarounds? Apache Spark 3.0.1 and OpenLineage version 1 are being used. Also tried on Spark version 3.3.0.


Log4j error details follow:


23/05/05 18:09:11 ERROR ColumnLevelLineageUtils: Error when invoking static method 'buildColumnLineageDatasetFacet' for Spark3
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at io.openlineage.spark.agent.lifecycle.plan.column.ColumnLevelLineageUtils.buildColumnLineageDatasetFacet(ColumnLevelLineageUtils.java:35)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildOutputDatasets$21(OpenLineageRunEventBuilder.java:424)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildOutputDatasets(OpenLineageRunEventBuilder.java:437)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:296)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:70)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)
    at java.util.Optional.ifPresent(Optional.java:159)
    at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:91)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:82)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:102)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:39)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:39)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:118)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:102)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:107)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:107)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:98)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1639)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:98)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.resultId()Lorg/apache/spark/sql/catalyst/expressions/ExprId;
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.traverseExpression(ExpressionDependencyCollector.java:79)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.lambda$traverseExpression$4(ExpressionDependencyCollector.java:74)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at scala.collection.convert.Wrappers$IteratorWrapper.forEachRemaining(Wrappers.scala:31)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.traverseExpression(ExpressionDependencyCollector.java:74)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.lambda$null$2(ExpressionDependencyCollector.java:60)
    at java.util.LinkedList$LLSpliterator.forEachRemaining(LinkedList.java:1235)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.lambda$collect$3(ExpressionDependencyCollector.java:60)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:285)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:286)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:286)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:286)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ExpressionDependencyCollector.collect(ExpressionDependencyCollector.java:38)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ColumnLevelLineageUtils.collectInputsAndExpressionDependencies(ColumnLevelLineageUtils.java:70)
    at io.openlineage.spark3.agent.lifecycle.plan.column.ColumnLevelLineageUtils.buildColumnLineageDatasetFacet(ColumnLevelLineageUtils.java:40)
    ... 36 more

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-08 07:38:19
*Thread Reply:* Hi @Harshini Devathi, I think this is the same as issue: https://github.com/OpenLineage/OpenLineage/issues/1821

Assignees: @pawel-big-lebowski
Labels: integration/spark, integration/databricks

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-05-08 19:44:26
*Thread Reply:* Thank you @Paweł Leszczyński. So, is this an issue with Databricks? The issue thread says that it was able to work on AWS Glue. If so, is there some kind of solution to make it work on Databricks?

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-05-05 12:22:06
Hello all, is there a way to get lineage in Azure Synapse Analytics with OpenLineage?

Julien Le Dem - (julien@apache.org)
2023-05-09 20:17:38
*Thread Reply:* maybe @Will Johnson knows?

Sai - (saivenkatesh161@gmail.com)
2023-05-08 07:06:37
Hi Team,


I have a use case where we are connecting to an Azure SQL database from Databricks to extract, transform and load data to delta tables. I could see the lineage is getting built, but there is no column-level lineage, though it's a 1:1 mapping from source. Could you please check and update on this?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-09 10:06:02
*Thread Reply:* There are a few possible issues:
  1. The column-level lineage is not implemented for a particular part of the Spark LogicalPlan.
  2. Azure SQL or Databricks have their own implementations of some Spark class, which does not exactly match our extractor. We've seen that happen.
  3. You're using a SQL JDBC connection with SELECT * - in which case we can't do anything for now, since we don't know the input columns (see the sketch after this list).
  4. Possibly something else 🙂 @Paweł Leszczyński might have an idea.
To fully understand the issue, we'd have to see logs, the LogicalPlan of the Spark job, or the job code itself.
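(A minimal sketch of point 3 - the connection details are placeholders; listing the columns explicitly gives the integration a chance to know the input schema:)

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.example.com:1433;databaseName=mydb")
      .option("query", "select PurchaseOrderID, OrderQty, ProductID from dbo.purchaseorder")
      .option("user", "my_user")
      .option("password", "my_password")
      .load())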
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-10 02:35:32
*Thread Reply:* @Sai, providing a short code snippet that is able to reproduce this would be super helpful in examining that.

Sai - (saivenkatesh161@gmail.com)
2023-05-10 02:59:24
*Thread Reply:* sure Pawel. Will share the code I used in some time.

Sai - (saivenkatesh161@gmail.com)
2023-05-10 03:37:54
*Thread Reply:* Here is the code we use.

Sai - (saivenkatesh161@gmail.com)
2023-05-16 03:23:13
*Thread Reply:* Hi Team, Any updates on this?

Sai - (saivenkatesh161@gmail.com)
2023-05-16 03:23:37
*Thread Reply:* I tried putting a SQL query with column names in it; still, the lineage didn't show up.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-09 10:00:39
2023-05-09T13:37:48.526698281Z java.lang.ClassCastException: class org.apache.spark.scheduler.ShuffleMapStage cannot be cast to class java.lang.Boolean (org.apache.spark.scheduler.ShuffleMapStage is in unnamed module of loader 'app'; java.lang.Boolean is in module java.base of loader 'bootstrap')
2023-05-09T13:37:48.526703550Z at scala.runtime.BoxesRunTime.unboxToBoolean(BoxesRunTime.java:87)
2023-05-09T13:37:48.526707874Z at scala.collection.LinearSeqOptimized.forall(LinearSeqOptimized.scala:85)
2023-05-09T13:37:48.526712381Z at scala.collection.LinearSeqOptimized.forall$(LinearSeqOptimized.scala:82)
2023-05-09T13:37:48.526716848Z at scala.collection.immutable.List.forall(List.scala:91)
2023-05-09T13:37:48.526723183Z at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.registerJob(OpenLineageRunEventBuilder.java:181)
2023-05-09T13:37:48.526727604Z at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.setActiveJob(SparkSQLExecutionContext.java:152)
2023-05-09T13:37:48.526732292Z at java.base/java.util.Optional.ifPresent(Unknown Source)
2023-05-09T13:37:48.526736352Z at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$10(OpenLineageSparkListener.java:150)
2023-05-09T13:37:48.526740471Z at java.base/java.util.Optional.ifPresent(Unknown Source)
2023-05-09T13:37:48.526744887Z at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:147)
2023-05-09T13:37:48.526750258Z at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
2023-05-09T13:37:48.526753454Z at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
2023-05-09T13:37:48.526756235Z at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
2023-05-09T13:37:48.526759315Z at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
2023-05-09T13:37:48.526762133Z at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
2023-05-09T13:37:48.526764941Z at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
2023-05-09T13:37:48.526767739Z at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
2023-05-09T13:37:48.526776059Z at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
2023-05-09T13:37:48.526778937Z at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
2023-05-09T13:37:48.526781728Z at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
2023-05-09T13:37:48.526786986Z at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
2023-05-09T13:37:48.526789893Z at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
2023-05-09T13:37:48.526792722Z at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
2023-05-09T13:37:48.526795463Z at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Hi, noticing this error message from OL... anyone know why it's happening?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-09 10:02:25
*Thread Reply:* @Anirudh Shrinivason what's your OL and Spark version?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-09 10:03:29
*Thread Reply:* Some example job would also help, or logs/LogicalPlan 🙂

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-09 10:05:54
*Thread Reply:* OL version is 0.23.0 and spark version is 3.3.1

👍 Maciej Obuchowski

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-09 11:00:22
*Thread Reply:* Hmm, it seems like the error is intermittent actually. I ran the same job again, but did not notice any errors this time...

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-10 02:27:19
*Thread Reply:* This is interesting and it happens within a line:
job.finalStage().parents().forall(toScalaFn(stage -> stageMap.put(stage.id(), stage)));
The result of stageMap.put is Stage and for some reason which I don't understand it tries doing unboxToBoolean. We could rewrite that to:
job.finalStage().parents().forall(toScalaFn(stage -> {
    stageMap.put(stage.id(), stage);
    return true;
}));
but this is so weird that it is intermittent and I don't get why it is happening.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-11 02:22:25
*Thread Reply:* @Anirudh Shrinivason, please let us know if it is still a valid issue. If so, we can create an issue for that.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-11 03:11:13
*Thread Reply:* Hi @Paweł Leszczyński Sflr. Yeah, I think if we are able to fix this, it'll be better. If this is the dedicated fix, then I can create an issue and raise an MR.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-11 04:12:46
*Thread Reply:* Opened an issue and PR. Do help check if it's okay, thanks!

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-11 04:29:33
*Thread Reply:* please run ./gradlew spotlessApply with Java 8

✅ Anirudh Shrinivason

Pietro Brunetti - (pietrobrunetti89@gmail.com)
2023-05-10 05:49:00
Hi all,
I'm new to OpenLineage (and Marquez), so I'm trying to figure out if it could be the right option for a client use case in which:
• there is a legacy custom data catalog (Mongo backend + Java API backend for a frontend in Angular)
• as-is, component lineage relations are retrieved in a custom way from each component's APIs
• the customer would like to bring in a basic data lineage feature based on already-published metadata that represents custom workload types (batch, streaming, interactive ones) + data access patterns (no direct relation with the datasources right now, only an abstraction layer upon them)
I'd like to exploit Marquez directly as the metastore to publish metadata about datasources and workloads (the workload is the declaration + business logic code deployed into the customer platform) once the component is deployed (e.g. the service that exposes the specific access pattern, or the workload custom declaration), but I saw the OpenLineage spec is based on strict coupling between run, job and datasource; I mean I want to be able to publish one item at a time and then (maybe in a future release of the customer product) be able to exploit runtime lineage also.


Am I in the right place?
Thanks anyway :)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-05-10 07:36:33
*Thread Reply:* > I mean I want to be able to publish one item at a time and then (maybe in a future release of the customer product) be able to exploit runtime lineage also -This is not something that we support yet - there are definitely a lot of plans and preliminary work for that.

Pietro Brunetti - (pietrobrunetti89@gmail.com)
2023-05-10 07:57:44
*Thread Reply:* Thanks for the response. BTW, I already took a look at the current capabilities provided by OpenLineage, so my "hidden" question is how to achieve what the customer wants in order to be integrated in some way with OpenLineage + Marquez.
Should I choose between make or buy (between already-supported platforms) and then try to align "static" (aka declarative) lineage metadata with the OpenLineage conceptual model?

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-10 11:04:20
@channel
This month’s TSC meeting is tomorrow at 10am PT. All are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1683213923529529

John Lukenoff - (john@jlukenoff.com)
2023-05-11 12:59:42
Does anyone here have experience with vendors in this space like Atlan or Manta? I’m advocating pretty heavily for OpenLineage at my company and have a strong suspicion that the LoE of enabling an equivalent solution from a vendor is equal to or greater than that of OL/Marquez. Curious if anyone has first-hand experience with these tools they might be willing to share?

👋 Eric Veleker
👀 Pietro Brunetti

Ernie Ostic - (ernie.ostic@getmanta.com)
2023-05-11 13:58:28
*Thread Reply:* Hi John. Great question! [full disclosure, I am with Manta 🙂 ]. I'll let others answer as to their experience with ourselves or many other vendors that provide lineage, but want to mention that a variety of our customers are finding it beneficial to bring code based static lineage together with the event-based runtime lineage that OpenLineage provides. This gives them the best of both worlds, for analyzing the lineage of their existing systems, where rich parsers already exist (for everything from legacy ETL tools, reporting tools, rdbms, etc.), to newer or home-grown technologies where applying OpenLineage is a viable alternative.

👍 John Lukenoff

Brad Paskewitz - (bradford.paskewitz@fivetran.com)
2023-05-11 14:12:04
*Thread Reply:* @Ernie Ostic do you see a single front-runner in the static lineage space? The static/event-based situation you describe is exactly the product roadmap I'm seeing here at Fivetran and I'm wondering if there's an opportunity to drive consensus towards a best-practice solution. If I'm not mistaken weren't there plans to start supporting non-run-based events in OL as well?

👋 Eric Veleker

John Lukenoff - (john@jlukenoff.com)
2023-05-11 14:16:34
*Thread Reply:* I definitely like the idea of a 3rd party solution being complementary to OSS tools we can maintain ourselves while allowing us to offload maintenance effort where possible. Currently I have strong opinions on both sides of the build vs. buy aisle and this seems like the best of both worlds.

Harel Shein - (harel.shein@gmail.com)
2023-05-11 14:52:40
*Thread Reply:* @Brad Paskewitz that’s 100% our plan to extend the OL spec to support “run-less” events. We want to collect that static metadata for Datasets and Jobs outside of the context of a run through OpenLineage.
Happy to get your feedback here as well: https://github.com/OpenLineage/OpenLineage/pull/1839

:gratitude_thank_you: Brad Paskewitz

Eric Veleker - (eric@atlan.com)
2023-05-11 14:57:46
*Thread Reply:* Hi @John Lukenoff. Here at Atlan we've been working with the OpenLineage community for quite some time to unlock the use case you describe. These efforts are adjacent to our ongoing integration with Fivetran. Happy to connect and give you a demo of what we've built and dig into your use case specifics.

John Lukenoff - (john@jlukenoff.com)
2023-05-12 11:26:32
*Thread Reply:* Thanks all! These comments are really informative, it’s exciting to hear about vendors leaning into the project to let us continue to benefit from the tremendous progress being made by the community. Had a great discussion with Atlan yesterday and plan to connect with Manta next week to discuss our use-cases.

Ernie Ostic - (ernie.ostic@getmanta.com)
2023-05-12 12:34:32
*Thread Reply:* Reach out anytime, @John Lukenoff. Looking forward to engaging further with you on these topics!

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-05-12 11:15:10
Hello all, I would like to have a new release of OpenLineage, as the new code base seems to have some issues fixed. I need these fixes for my project.

➕ Michael Robinson, Maciej Obuchowski, Julien Le Dem, Jakub Dardziński, Anirudh Shrinivason, Harshini Devathi, Paweł Leszczyński, pankaj koti

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-12 11:19:02
*Thread Reply:* Thank you for requesting an OpenLineage release. As stated here, three +1s from committers will authorize an immediate release. Our policy is not to release on Fridays, so the earliest we could initiate would be Monday.

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-05-12 13:12:43
*Thread Reply:* A release on Monday is totally fine @Michael Robinson.

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-15 08:37:39
*Thread Reply:* The release will be initiated today. Thanks @Harshini Devathi

👍 Anirudh Shrinivason, Harshini Devathi

Harshini Devathi - (harshini.devathi@tigeranalytics.com)
2023-05-16 20:16:07
*Thread Reply:* Appreciate it @Michael Robinson and thanks to all the committers for the prompt response

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-15 12:09:24
@channel
We released OpenLineage 0.25.0, including:
Additions:
• Spark: merge into query support #1823 @pawel-big-lebowski
Fixes:
• Spark: fix JDBC query handling #1808 @nataliezeller1
• Spark: filter Delta adaptive plan events #1830 @pawel-big-lebowski
• Spark: fix Java class cast exception #1844 @Anirudh181001
• Flink: include missing fields of OpenLineage events #1840 @pawel-big-lebowski
Plus doc changes and more.
Thanks to all the contributors!
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.25.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.24.0...0.25.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Jakub Dardziński, Sai, pankaj koti, Paweł Leszczyński, Perttu Salonen, Maciej Obuchowski, Fraser Marlow, Ross Turk, Harshini Devathi, tati

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-16 14:03:01
@channel
If you’re planning on being in San Francisco at the end of June — perhaps for this year’s Data+AI Summit — please stop by Astronomer’s offices on California Street on 6/27 for the first SF OpenLineage Meetup. We’ll be discussing spec changes planned for OpenLineage v1.0.0, progress on Airflow AIP 53, and more. Plus, dinner will be provided! For more info and to sign up, check out the OL blog. Join us!

🙌 alexandre bergere, Anirudh Shrinivason, Harel Shein, Willy Lulciuc, Jarek Potiuk, Ross Turk, John Lukenoff, tati

Willy Lulciuc - (willy@datakin.com)
2023-05-16 14:13:16
*Thread Reply:* Can’t wait! 💯

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-17 00:09:23
Hi, I've been noticing this error that is intermittently popping up in some of the spark jobs:
AsyncEventQueue: Dropping event from queue appStatus. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
Increasing the spark.scheduler.listenerbus.eventqueue.size spark config did not help either.
Any ideas on how to mitigate this issue? Seeing this in spark 3.1.2 btw
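(If I recall correctly, on Spark 2.3+ the queue size property is spelled spark.scheduler.listenerbus.eventqueue.capacity; a sketch of raising it, with an arbitrary value:)

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # Default capacity is 10000 events; raising it only buys headroom,
         # it does not fix a listener that is persistently too slow.
         .config("spark.scheduler.listenerbus.eventqueue.capacity", "30000")
         .getOrCreate())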

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-17 01:58:28
*Thread Reply:* Hi @Anirudh Shrinivason, are you able to send the OL events to console? This would let us confirm if the issue is related to event generation, or to emitting it and waiting for the backend to respond.
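(One way to do that, assuming an integration version that supports transport configuration:)

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
         # Emit OpenLineage events to the driver logs instead of an HTTP backend.
         .config("spark.openlineage.transport.type", "console")
         .getOrCreate())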

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-17 01:59:03
*Thread Reply:* Ahh okay sure. Let me see if I can do that

Sai - (saivenkatesh161@gmail.com)
2023-05-17 01:52:15
Hi Team


We are seeing an issue with an OL-configured cluster where a delta table merge is failing with the below error. It is running fine when we run with other clusters where OL is not configured. I ran it multiple times assuming it's an intermittent issue with memory, but it keeps on failing with the same error. Attached the code for reference. We are using the latest release (0.25.0).


org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError


@Paweł Leszczyński @Michael Robinson

👀 Paweł Leszczyński

Sai - (saivenkatesh161@gmail.com)
2023-05-19 03:55:51
*Thread Reply:* Hi @Paweł Leszczyński


Thanks for fixing the issue; with the new release, merge is working. But I could not see any input and output datasets for this. Let me know if you need any further details to look into this.

},
"job": {
    "namespace": "openlineage_poc",
    "name": "spark_ol_integration_execute_merge_into_command_edge",
    "facets": {}
},
"inputs": [],
"outputs": [],
Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-19 04:00:01
*Thread Reply:* Oh man, it's just that vanilla Spark differs from the one available in the Databricks platform. Our integration tests do verify behaviour on vanilla Spark, which still leaves a possibility for inconsistency. Will need to get back to it at some time.

Sai - (saivenkatesh161@gmail.com)
2023-06-02 02:11:28
*Thread Reply:* Hi @Paweł Leszczyński


Did you get a chance to look into this issue?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 02:13:18
*Thread Reply:* Hi Sai, I am going back to Spark. I am working on support for Spark 3.4, which is going to add some event filtering on internal Delta operations that trigger the events unnecessarily

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 02:13:28
*Thread Reply:* this may be related to the issue you created

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 02:14:13
*Thread Reply:* I do have planned creating an integration test for Databricks, which will be helpful to tackle the issues you raised

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 02:14:27
*Thread Reply:* so yes, I am looking at Spark

Sai - (saivenkatesh161@gmail.com)
2023-06-02 02:20:06
*Thread Reply:* thanks much Pawel.. I am looking more into the merge part as first priority, as we use it frequently.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 02:21:01
*Thread Reply:* I know, this is important.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 02:21:14
*Thread Reply:* It just still needs some time.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-06-02 02:21:46
*Thread Reply:* thank you for your patience and being so proactive on those issues.

Sai - (saivenkatesh161@gmail.com)
2023-06-02 02:22:12
*Thread Reply:* no problem.. Please do keep us posted with updates..

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-17 10:47:27
Our recent OpenLineage release (0.25.0) proved there are many users that use OpenLineage on Databricks, which is incredible. I am super happy to know that, although we realised it as a side effect of a bug. Sorry for that.

I would like to opt for a new release which contains PR #1858 and should unblock Databricks users.

➕ Paweł Leszczyński, Maciej Obuchowski, Harshini Devathi, Jakub Dardziński, Sai, Anirudh Shrinivason, Anbarasi
👍 Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2023-05-18 10:26:48
*Thread Reply:* The release request has been approved and will be initiated shortly.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-17 22:49:41
Actually, I noticed a few other stack overflow errors on 0.25.0. Let me raise an issue. Could we cut a release once these bugs are fixed too, please?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-18 02:29:55
*Thread Reply:* Hi Anirudh, I saw your issue and I think it is the same one as solved within #1858. Are you able to reproduce it on a version built on top of main?

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-05-18 06:21:05
*Thread Reply:* Hi, I haven't managed to try with the main branch. But if it's the same error then all's good! If the error resurfaces then we can look into it.

Lovenish Goyal - (lovenishgoyal@gmail.com)
2023-05-18 02:21:13
Hi All,


We are in the POC phase of OpenLineage integration with our core dbt; can anyone help me with a document to start with?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-18 02:28:31
*Thread Reply:* I know this one: https://openlineage.io/docs/integrations/dbt
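(The short version from those docs, assuming a Marquez instance listening on http://localhost:5000 - the dbt-ol wrapper is run in place of dbt:)

pip install openlineage-dbt
OPENLINEAGE_URL=http://localhost:5000 dbt-ol run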

Lovenish Goyal - (lovenishgoyal@gmail.com)
2023-05-18 02:41:39
*Thread Reply:* Hi @Paweł Leszczyński, thanks for the reply. I tried the same but am facing the below issue:


requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url:


Looks like I need to start the service

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-05-18 02:44:09
*Thread Reply:* @Lovenish Goyal, exactly. You need to start Marquez.
More about it: https://marquezproject.ai/quickstart

Harel Shein - (harel.shein@gmail.com)
2023-05-18 10:27:52
*Thread Reply:* @Lovenish Goyal how are you running dbt core currently?

Lovenish Goyal - (lovenishgoyal@gmail.com)
2023-05-19 01:55:20
*Thread Reply:* Trying, but facing an issue while running Marquez @Jakub Dardziński

Lovenish Goyal - (lovenishgoyal@gmail.com)
2023-05-19 01:56:03
*Thread Reply:* @Harel Shein we have created a custom Docker image of dbt + Airflow and are running it on an EC2 instance

Harel Shein - (harel.shein@gmail.com)
2023-05-19 09:05:31
*Thread Reply:* for running dbt core on Airflow, we have a utility that helps develop dbt natively on Airflow. There's also built-in support for collecting lineage if you have the airflow-openlineage provider installed.
https://astronomer.github.io/astronomer-cosmos/#quickstart

Harel Shein - (harel.shein@gmail.com)
2023-05-19 09:06:30
*Thread Reply:* RE issues running Marquez, can you share what those are? I’m guessing that since you are running both of them in individual docker images, the airflow deployment might not be able to communicate with the Marquez endpoints?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-05-19 09:06:53
*Thread Reply:* @Harel Shein I've already helped with running Marquez 🙂

:first_place_medal: Harel Shein, Paweł Leszczyński, Michael Robinson

Anbarasi - (anbujothi@gmail.com)
2023-05-18 02:29:53
@Paweł Leszczyński We are facing the following issue with Azure Databricks. When we use aggregate functions in Databricks notebooks, OpenLineage is not able to provide column-level lineage. I understand it's an existing issue. Can you please let me know in which release this issue will be fixed? It is one of the most needed features for us to implement OpenLineage in our current project. Kindly let me know

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-18 02:34:35
*Thread Reply:* I am not sure if this is the same. If you see OL events collected with column-lineage missing, then it's a different one.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-18 02:41:11
*Thread Reply:* Please also be aware that it is extremely helpful to investigate the issues on your own before creating them.


Our integration traverses Spark's logical plans and extracts lineage events from plan nodes that it understands. Some plan nodes are not supported yet and, from my experience, when working on an issue, 80% of the time is spent on reproducing the scenario.


So, if you are able to produce a minimal amount of Spark code that reproduces an issue, this can be extremely helpful and significantly speed up resolution time.

Anbarasi - (anbujothi@gmail.com)
2023-05-18 03:52:30
*Thread Reply:* @Paweł Leszczyński Thanks for the prompt response.


Provided sample codes with and without using aggregate functions, and their respective lineage events, for reference.

  1. Please find the code without using aggregate functions:

     final_df = spark.sql("""
         select productid
         ,OrderQty as TotalOrderQty
         ,ReceivedQty as TotalReceivedQty
         ,StockedQty as TotalStockedQty
         ,RejectedQty as TotalRejectedQty
         from openlineage_poc.purchaseorder
         --group by productid
         order by productid""")

     final_df.write.mode("overwrite").saveAsTable("openlineage_poc.productordertest1")

Please find the OpenLineage events for the input/output datasets. We could find the column lineage in this.


"inputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "PurchaseOrderID", - "type": "integer" - }, - { - "name": "PurchaseOrderDetailID", - "type": "integer" - }, - { - "name": "DueDate", - "type": "timestamp" - }, - { - "name": "OrderQty", - "type": "short" - }, - { - "name": "ProductID", - "type": "integer" - }, - { - "name": "UnitPrice", - "type": "decimal(19,4)" - }, - { - "name": "LineTotal", - "type": "decimal(19,4)" - }, - { - "name": "ReceivedQty", - "type": "decimal(8,2)" - }, - { - "name": "RejectedQty", - "type": "decimal(8,2)" - }, - { - "name": "StockedQty", - "type": "decimal(9,2)" - }, - { - "name": "RevisionNumber", - "type": "integer" - }, - { - "name": "Status", - "type": "integer" - }, - { - "name": "EmployeeID", - "type": "integer" - }, - { - "name": "NationalIDNumber", - "type": "string" - }, - { - "name": "JobTitle", - "type": "string" - }, - { - "name": "Gender", - "type": "string" - }, - { - "name": "MaritalStatus", - "type": "string" - }, - { - "name": "VendorID", - "type": "integer" - }, - { - "name": "ShipMethodID", - "type": "integer" - }, - { - "name": "ShipMethodName", - "type": "string" - }, - { - "name": "ShipMethodrowguid", - "type": "string" - }, - { - "name": "OrderDate", - "type": "timestamp" - }, - { - "name": "ShipDate", - "type": "timestamp" - }, - { - "name": "SubTotal", - "type": "decimal(19,4)" - }, - { - "name": "TaxAmt", - "type": "decimal(19,4)" - }, - { - "name": "Freight", - "type": "decimal(19,4)" - }, - { - "name": "TotalDue", - "type": "decimal(19,4)" - } - ] - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc/gold", - "name": "openlineagepoc.purchaseorder", - "type": "TABLE" - } - ] - } - }, - "inputFacets": {} - } - ], - "outputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/productordertest1", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "productid", - "type": "integer" - }, - { - "name": "TotalOrderQty", - "type": "short" - }, - { - "name": "TotalReceivedQty", - "type": "decimal(8,2)" - }, - { - "name": "TotalStockedQty", - "type": "decimal(9,2)" - }, - { - "name": "TotalRejectedQty", - "type": "decimal(8,2)" - } - ] - }, - "storage": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - 
"schemaURL": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet", - "storageLayer": "unity", - "fileFormat": "parquet" - }, - "columnLineage": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet", - "fields": { - "productid": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "ProductID" - } - ] - }, - "TotalOrderQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "OrderQty" - } - ] - }, - "TotalReceivedQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "ReceivedQty" - } - ] - }, - "TotalStockedQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "StockedQty" - } - ] - }, - "TotalRejectedQty": { - "inputFields": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "field": "RejectedQty" - } - ] - } - } - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc", - "name": "openlineagepoc.productordertest1", - "type": "TABLE" - } - ] - }, - "lifecycleStateChange": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", - "lifecycleStateChange": "OVERWRITE" - } - }, - "outputFacets": {} - } - ]

Anbarasi - (anbujothi@gmail.com)
2023-05-18 03:55:04
*Thread Reply:* 2. Please find the code using aggregate function:

     final_df = spark.sql("""
         select productid
         ,sum(OrderQty) as TotalOrderQty
         ,sum(ReceivedQty) as TotalReceivedQty
         ,sum(StockedQty) as TotalStockedQty
         ,sum(RejectedQty) as TotalRejectedQty
         from openlineage_poc.purchaseorder
         group by productid
         order by productid""")

     final_df.write.mode("overwrite").saveAsTable("openlineage_poc.productordertest2")

Please find the OpenLineage events for the input/output datasets. We couldn't find the column lineage in the output section. Please find the sample:


"inputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/gold/purchaseorder", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "PurchaseOrderID", - "type": "integer" - }, - { - "name": "PurchaseOrderDetailID", - "type": "integer" - }, - { - "name": "DueDate", - "type": "timestamp" - }, - { - "name": "OrderQty", - "type": "short" - }, - { - "name": "ProductID", - "type": "integer" - }, - { - "name": "UnitPrice", - "type": "decimal(19,4)" - }, - { - "name": "LineTotal", - "type": "decimal(19,4)" - }, - { - "name": "ReceivedQty", - "type": "decimal(8,2)" - }, - { - "name": "RejectedQty", - "type": "decimal(8,2)" - }, - { - "name": "StockedQty", - "type": "decimal(9,2)" - }, - { - "name": "RevisionNumber", - "type": "integer" - }, - { - "name": "Status", - "type": "integer" - }, - { - "name": "EmployeeID", - "type": "integer" - }, - { - "name": "NationalIDNumber", - "type": "string" - }, - { - "name": "JobTitle", - "type": "string" - }, - { - "name": "Gender", - "type": "string" - }, - { - "name": "MaritalStatus", - "type": "string" - }, - { - "name": "VendorID", - "type": "integer" - }, - { - "name": "ShipMethodID", - "type": "integer" - }, - { - "name": "ShipMethodName", - "type": "string" - }, - { - "name": "ShipMethodrowguid", - "type": "string" - }, - { - "name": "OrderDate", - "type": "timestamp" - }, - { - "name": "ShipDate", - "type": "timestamp" - }, - { - "name": "SubTotal", - "type": "decimal(19,4)" - }, - { - "name": "TaxAmt", - "type": "decimal(19,4)" - }, - { - "name": "Freight", - "type": "decimal(19,4)" - }, - { - "name": "TotalDue", - "type": "decimal(19,4)" - } - ] - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc/gold", - "name": "openlineagepoc.purchaseorder", - "type": "TABLE" - } - ] - } - }, - "inputFacets": {} - } - ], - "outputs": [ - { - "namespace": "dbfs", - "name": "/mnt/dlzones/warehouse/openlineagepoc/productordertest2", - "facets": { - "dataSource": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name": "dbfs", - "uri": "dbfs" - }, - "schema": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields": [ - { - "name": "productid", - "type": "integer" - }, - { - "name": "TotalOrderQty", - "type": "long" - }, - { - "name": "TotalReceivedQty", - "type": "decimal(18,2)" - }, - { - "name": "TotalStockedQty", - "type": "decimal(19,2)" - }, - { - "name": "TotalRejectedQty", - "type": "decimal(18,2)" - } - ] - }, - "storage": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - 
"schemaURL": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet", - "storageLayer": "unity", - "fileFormat": "parquet" - }, - "symlinks": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers": [ - { - "namespace": "/mnt/dlzones/warehouse/openlineagepoc", - "name": "openlineagepoc.productordertest2", - "type": "TABLE" - } - ] - }, - "lifecycleStateChange": { - "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.25.0/integration/spark", - "schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", - "lifecycleStateChange": "OVERWRITE" - } - }, - "outputFacets": {} - } - ]

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-05-18 04:09:17
*Thread Reply:* amazing. https://github.com/OpenLineage/OpenLineage/issues/1861

-
- - - - - - - -
-
Assignees
- <a href="https://github.com/pawel-big-lebowski">@pawel-big-lebowski</a> -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Anbarasi - (anbujothi@gmail.com) -
-
2023-05-18 04:11:56
-
-

*Thread Reply:* Thanks for considering the request and looking into it

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson (michael.robinson@astronomer.io)
2023-05-18 13:12:35
@channel
We released OpenLineage 0.26.0, including:
Additions:
• Proxy: Fluentd proxy support (experimental) #1757 @pawel-big-lebowski
Changes:
• Python client: use Hatchling over setuptools to orchestrate Python env setup #1856 @gaborbernat
Fixes:
• Spark: fix logicalPlan serialization issue on Databricks #1858 @pawel-big-lebowski
Plus an additional fix, doc changes and more.
Thanks to all the contributors, including new contributor @gaborbernat!
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.26.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.25.0...0.26.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

❤️ Paweł Leszczyński, Maciej Obuchowski, Anirudh Shrinivason, Peter Hicks, pankaj koti

Bramha Aelem (bramhaaelem@gmail.com)
2023-05-18 14:42:49
Hi Team, can someone please address https://github.com/OpenLineage/OpenLineage/issues/1866.
(labels: proposal; comments: 1)

Julien Le Dem (julien@apache.org)
2023-05-18 20:13:09
*Thread Reply:* Hi @Bramha Aelem I replied in the ticket. Thank you for opening it.

Bramha Aelem (bramhaaelem@gmail.com)
2023-05-18 21:15:30
*Thread Reply:* Hi @Julien Le Dem - Thanks for the quick response. I replied in the ticket. Please let me know if you need any more details.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-19 02:13:57
*Thread Reply:* Hi @Bramha Aelem - asked for more details in the ticket.

Bramha Aelem (bramhaaelem@gmail.com)
2023-05-22 11:08:58
*Thread Reply:* Hi @Paweł Leszczyński - I replied with the necessary details in the ticket. Please let me know if you need any more details.

Bramha Aelem (bramhaaelem@gmail.com)
2023-05-25 15:22:42
*Thread Reply:* Hi @Paweł Leszczyński - any further updates on the issue?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-26 01:56:47
*Thread Reply:* hi @Bramha Aelem, I was out of office for a few days. Will get back into this soon. Thanks for the update.

Bramha Aelem (bramhaaelem@gmail.com)
2023-05-27 18:46:27
*Thread Reply:* Hi @Paweł Leszczyński - Thanks for your reply. Will wait for your response to proceed further on the issue.

Bramha Aelem (bramhaaelem@gmail.com)
2023-06-02 19:29:08
*Thread Reply:* Hi @Paweł Leszczyński - Hope you are doing well. Did you get a chance to look into the samples provided in the ticket? Kindly let me know your observations/recommendations.

Bramha Aelem (bramhaaelem@gmail.com)
2023-06-09 12:43:54
*Thread Reply:* Hi @Paweł Leszczyński - Hope you are doing well. Did you get a chance to look into the samples provided in the ticket? Kindly let me know your observations/recommendations.
👀 Paweł Leszczyński

Bramha Aelem (bramhaaelem@gmail.com)
2023-07-06 10:29:01
*Thread Reply:* Hi @Paweł Leszczyński - Good day. Did you get a chance to look into the query I posted? Can you please share any thoughts on my observation/query?

John Doe (adarsh.pansari@tigeranalytics.com)
2023-05-19 03:42:21
Hello Everyone, I was trying to integrate OpenLineage with Jupyter Notebooks. I followed the docs, but when I run the sample notebook I get an error:
23/05/19 07:39:08 ERROR EventEmitter: Could not emit lineage w/ exception
Can someone please help me understand why I am getting this error and how to resolve it?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-19 03:49:27
*Thread Reply:* Hello @John Doe, this mostly means there's something wrong with your transport config for emitting OpenLineage events.

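For context, the transport the Spark integration uses is configured entirely through Spark properties; a minimal sketch of the quickstart-style setup, assuming a Marquez (or other OpenLineage-compatible) endpoint at http://localhost:5000 and an arbitrary namespace name:

    from pyspark.sql import SparkSession

    # Sketch: OpenLineage Spark listener emitting events over HTTP.
    # If the endpoint is unreachable or misconfigured, EventEmitter logs
    # "Could not emit lineage" errors like the one above.
    spark = (SparkSession.builder.master("local")
        .appName("transport_config_check")
        .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
        .config("spark.jars.packages", "io.openlineage:openlineage_spark:0.26.0")
        .config("spark.openlineage.host", "http://localhost:5000")
        .config("spark.openlineage.namespace", "my_namespace")
        .getOrCreate())
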
Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-19 03:49:41
*Thread Reply:* what do you want to do with the events?

John Doe (adarsh.pansari@tigeranalytics.com)
2023-05-19 04:10:24
*Thread Reply:* Hi @Paweł Leszczyński, I am working on a PoC to understand the use cases of OL and how it builds lineages.
As for the transport config, I am using the code from the documentation to set up OL:
https://openlineage.io/docs/integrations/spark/quickstart_local
Apart from this I don't have anything else in my notebook.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-19 04:38:58
*Thread Reply:* ok, I am wondering if what you experience isn't similar to issue #1860. Could you try openlineage 0.23.0 to see if you get the same error?
https://github.com/OpenLineage/OpenLineage/issues/1860

John Doe (adarsh.pansari@tigeranalytics.com)
2023-05-19 10:05:59
*Thread Reply:* I tried with 0.23.0, still getting the same error

John Doe (adarsh.pansari@tigeranalytics.com)
2023-05-23 02:34:52
*Thread Reply:* @Paweł Leszczyński any other way I can try to set it up? The issue still persists

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-29 03:53:04
*Thread Reply:* hmyy, I've just redone the steps from https://openlineage.io/docs/integrations/spark/quickstart_local with 0.26.0 and could not reproduce the behaviour you encountered.

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-05-22 09:41:55
Hello Team!!! A part of my master thesis's case study was about data lineage in data mesh and how open-source initiatives such as OpenLineage and Marquez can realize this. Can you recommend me some material that can support the writing part of my thesis (more context: I tried to extract lineage events from Snowflake through Airflow and used Docker Compose on EC2 to connect Airflow and the Marquez webserver)? We will divide the thesis into a few academic papers to make the content more digestible and hopefully publish one of them soon!
👍 Ernie Ostic, Maciej Obuchowski, Ross Turk, Michael Robinson

Michael Robinson (michael.robinson@astronomer.io)
2023-05-22 16:34:00
*Thread Reply:* Tom, thanks for your question. This is really exciting! I assume you've already started checking out the docs, but there are many other resources on the website as well (on the blog and resources pages in particular). And don't skip the YouTube channel, where we've recently started to upload short, more digestible excerpts from the community meetings. Please keep us updated as you make progress!
👀 Tom van Eijk

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-05-22 16:48:06
*Thread Reply:* Hi Michael! Thank you so much for sending these resources! I've been working on this thesis for quite some time already and it's almost finished. I just needed some additional information to help in accurately describing some of the processes in OpenLineage and Marquez. Will send you the case study chapter later this week to get some feedback if possible. Will keep you posted on things such as publication! Perhaps it can make OpenLineage even more popular than it already is 😉

Michael Robinson (michael.robinson@astronomer.io)
2023-05-22 16:52:18
*Thread Reply:* Yes, please share it! Looking forward to checking it out. Super cool!

Ernie Ostic (ernie.ostic@getmanta.com)
2023-05-22 09:57:50
Hi Tom. Good luck. Sounds like a great case study. You might want to compare and contrast various kinds of lineage solutions, all of which complement each other as well as having their own pros and cons (code-based lineage via parsing, data similarity lineage, run-time lineage reporting, etc.), and then focus on open source and OpenLineage with Marquez in particular.
🙏 Tom van Eijk

Tom van Eijk (t.m.h.vaneijk@tilburguniversity.edu)
2023-05-22 10:04:44
*Thread Reply:* Thank you so much Ernie! That sounds like a very interesting direction to keep in mind during research!

Michael Robinson (michael.robinson@astronomer.io)
2023-05-22 16:37:44
@channel
For an easily digestible recap of recent events, communications and releases in the community, please sign up for our new monthly newsletter! Look for it in your inbox soon.

Bernat Gabor (gaborjbernat@gmail.com)
2023-05-22 23:32:16
Looking here https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.json#L64 it shows that the schemaURL must be set, but the examples in https://openlineage.io/getting-started#step-1-start-a-run do not contain it. Is this a bug or expected? 😄

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-23 07:24:09
*Thread Reply:* yeah, it's a bug

Bernat Gabor (gaborjbernat@gmail.com)
2023-05-23 12:00:48
*Thread Reply:* so it's optional then? 😄 or a bug in the example?

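For reference, the spec-required field is easy to include when emitting events by hand; a sketch of the getting-started example as Python, with schemaURL set explicitly (the spec-version segment in the URL is an assumption and may differ):

    import datetime
    import json
    import uuid

    import requests  # assumes the requests package is installed

    event = {
        "eventType": "START",
        "eventTime": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "my-namespace", "name": "my-job"},
        "inputs": [],
        "outputs": [],
        "producer": "https://example.com/my-producer",
        # Required by the spec, but missing from the getting-started examples:
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }
    requests.post("http://localhost:5000/api/v1/lineage",
                  data=json.dumps(event),
                  headers={"Content-Type": "application/json"})
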
Bernat Gabor (gaborjbernat@gmail.com)
2023-05-23 12:02:09
I noticed that DataQualityAssertionsDatasetFacet inherits from InputDatasetFacet (https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DataQualityAssertionsDatasetFacet.json), though I think it should inherit from DatasetFacet like all the others 🤔

Michael Robinson (michael.robinson@astronomer.io)
2023-05-23 14:20:09
@channel
Two years ago last Saturday, we released the first version of OpenLineage, a test release of the Python client. So it seemed like an appropriate time to share our first annual ecosystem survey, which is both a milestone in the project's growth and an important effort to set our course. This survey has been designed to help us learn more about who is using OpenLineage, what your lineage needs are, and what new tools you hope the project will support. Thank you in advance for taking the time to share your opinions and vision for the project! (Please note: the survey might seem longer than it actually is due to the large number of optional questions. Not all questions apply to all use cases.)
🙌 Harel Shein, Maciej Obuchowski, Atif Tahir, Peter Hicks, Tamara Fingerlin, Paweł Leszczyński

Sharanya Santhanam (santhanamsharanya@gmail.com)
2023-05-23 18:59:46
OpenLineage Spark integration: our Spark workloads on Spark 2.4 are correctly setting .config("spark.sql.catalogImplementation", "hive"), however SQL queries for CREATE/INSERT INTO don't recognize the datasets as "Hive". As per https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/supported-commands.md, USING HIVE is needed for appropriate parsing. Why is that the case? Why can't the HQL form of CREATE/INSERT be supported?

Sharanya Santhanam (santhanamsharanya@gmail.com)
2023-05-23 19:01:43
*Thread Reply:* @Michael Collado wondering if you could shed some light here

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-24 05:39:01
*Thread Reply:* can you show the logical plan of your Spark job? I think USING HIVE is not the most important part; what matters is whether the job's LogicalPlan parses to CreateHiveTableAsSelectCommand or InsertIntoHiveTable

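One quick way to see which command a statement parses to is Spark's extended explain output; a sketch (table names are placeholders):

    # Sketch: print the parsed/analyzed/optimized/physical plans for a statement
    # without executing it. The integration keys off the logical-plan node class,
    # e.g. InsertIntoHiveTable vs. InsertIntoHadoopFsRelationCommand.
    plan = spark.sql(
        "EXPLAIN EXTENDED INSERT INTO default.some_table SELECT * FROM default.other_table"
    )
    plan.show(truncate=False)
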
Sharanya Santhanam (santhanamsharanya@gmail.com)
2023-05-24 19:37:02
*Thread Reply:* It parses into InsertIntoHadoopFsRelationCommand. Example:
== Optimized Logical Plan ==
InsertIntoHadoopFsRelationCommand s3a://uchmsdev03/default/sharanyaOutputTable, false, [id#89], Parquet, [serialization.format=1, mergeSchema=false, partitionOverwriteMode=dynamic], Append, CatalogTable(
Database: default
Table: sharanyaoutputtable
Owner: 2700940971
Created Time: Thu Jun 09 11:13:35 PDT 2022
Last Access: UNKNOWN
Created By: Spark 3.2.0
Type: EXTERNAL
Provider: hive
Table Properties: [transient_lastDdlTime=1654798415]
Location: s3a://uchmsdev03/default/sharanyaOutputTable
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1]
Partition Provider: Catalog
Partition Columns: [`id`]
Schema: root
 |-- displayName: string (nullable = true)
 |-- serialnum: string (nullable = true)
 |-- osversion: string (nullable = true)
 |-- productfamily: string (nullable = true)
 |-- productmodel: string (nullable = true)
 |-- id: string (nullable = true)
), org.apache.spark.sql.execution.datasources.CatalogFileIndex@5fe23214, [displayName, serialnum, osversion, productfamily, productmodel, id]
+- Union false, false
 :- Relation default.tablea[displayName#84,serialnum#85,osversion#86,productfamily#87,productmodel#88,id#89] parquet
 +- Relation default.tableb[displayName#90,serialnum#91,osversion#92,productfamily#93,productmodel#94,id#95] parquet

Sharanya Santhanam (santhanamsharanya@gmail.com)
2023-05-24 19:39:54
*Thread Reply:* using Spark 3.2, and this is the query:
spark.sql(s"INSERT INTO default.sharanyaOutput select * from (SELECT * from default.tableA union all " +
  s"select * from default.tableB)")

Rakesh Jain (rakeshj@us.ibm.com)
2023-05-24 01:09:58
Is there any example of how sourceCodeLocation / git info can be used from a Spark job? What do we need to set to be able to see that as part of the metadata?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-24 05:37:06
*Thread Reply:* I think we can't really get it from the Spark context, as Spark jobs are submitted in compiled jar form, instead of plain text like, for example, Airflow DAGs.

Rakesh Jain (rakeshj@us.ibm.com)
2023-05-25 02:15:35
*Thread Reply:* How about a Jupyter Notebook based Spark job?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-25 08:44:18
*Thread Reply:* I don't think it changes much - but maybe @Paweł Leszczyński knows more

Michael Robinson (michael.robinson@astronomer.io)
2023-05-25 11:24:21
@channel
Deprecation notice: support for Airflow 2.1 will end in about two weeks, when it will be removed from testing. The exact date will be announced as we get closer to it — this is just a heads up. After that date, use 2.1 at your own risk! (Note: the next release, 0.27.0, will still support 2.1.)
👍 Maciej Obuchowski

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-25 11:27:39
For the OpenLineageSparkListener, is there a way to configure it to send packets locally, e.g. save to a file, instead of pushing to a URL destination?

alexandre bergere (alexandre.pro.bergere@gmail.com)
2023-05-25 12:00:04
*Thread Reply:* We developed a FileTransport class in order to save our metrics locally in a JSON file, if you are interested

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-25 12:00:37
*Thread Reply:* Does it also save the OpenLineage information, e.g. inputs/outputs?

alexandre bergere (alexandre.pro.bergere@gmail.com)
2023-05-25 12:02:07
*Thread Reply:* yes, it saves all the JSON information, inputs/outputs included

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-05-25 12:03:03
*Thread Reply:* Yes! then I am very interested. Is there guidance on how to use the FileTransport class?

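The class itself isn't shown in the thread, but the idea is small; a minimal sketch against the Python client's Transport interface (import paths and method names are assumptions and may differ between client versions):

    from openlineage.client.serde import Serde
    from openlineage.client.transport import Transport

    class FileTransport(Transport):
        """Sketch: append every emitted event as one JSON line in a local file."""

        def __init__(self, log_path="openlineage_events.jsonl"):
            self.log_path = log_path

        def emit(self, event):
            # Serde.to_json serializes the whole event, inputs/outputs included.
            with open(self.log_path, "a") as f:
                f.write(Serde.to_json(event) + "\n")
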
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-25 13:06:22
*Thread Reply:* @alexandre bergere it would be a pretty useful contribution if you can submit it 🙂
🙌 alexandre bergere

alexandre bergere (alexandre.pro.bergere@gmail.com)
2023-05-25 13:08:28
*Thread Reply:* We are using it in a transformed OpenLineage library we developed! I'm going to make a PR in order to share it with you :)
👍 Julien Le Dem, Anirudh Shrinivason

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-05-25 13:56:48
*Thread Reply:* would be great to have. I had it in mind to implement as an enabler for Databricks integration tests. Great to hear that!

alexandre bergere (alexandre.pro.bergere@gmail.com)
2023-05-29 08:19:46
*Thread Reply:* PR sent: https://github.com/OpenLineage/OpenLineage/pull/1891 🙂
@Maciej Obuchowski could you tell me how to update the documentation once it's approved, please?
(labels: client/python; comments: 1)

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-29 08:36:21
*Thread Reply:* @alexandre bergere we have a separate repo for the website + docs: https://github.com/OpenLineage/docs
🙏 alexandre bergere

Bramha Aelem (bramhaaelem@gmail.com)
2023-05-25 16:40:26
Hi Team - When we run a Databricks job, a lot of dbfs path namespaces get created. Can someone please let us know how to overwrite the symlink namespaces and link them with the Spark app name or OpenLineage namespace in the Marquez UI?

Harshini Devathi (harshini.devathi@tigeranalytics.com)
2023-05-26 09:09:09
Hello,
I am looking to connect the common data model in the Postgres Marquez database and the Azure Purview (which uses Apache Atlas APIs) lineage endpoint. Does anyone have any how-to on this or can point me to some useful links?
Thanks in advance.

Michael Robinson (michael.robinson@astronomer.io)
2023-05-26 13:08:56
*Thread Reply:* I wonder if this blog post might help? https://openlineage.io/blog/openlineage-microsoft-purview

Michael Robinson (michael.robinson@astronomer.io)
2023-05-26 16:13:38
*Thread Reply:* This might not fully match your use case either, but it might help: https://learn.microsoft.com/en-us/samples/microsoft/purview-adb-lineage-solution-accelerator/azure-databricks-to-purview-lineage-connector/

Harshini Devathi (harshini.devathi@tigeranalytics.com)
2023-06-01 23:23:49
*Thread Reply:* Thanks @Michael Robinson

Bernat Gabor (gaborjbernat@gmail.com)
2023-05-26 12:44:09
Are there any constraints on facets? For example, is it reasonable to expect that a single job will have a single parent? The schema hints at this by making the parent a single entry; but then one can send different parents for the START and COMPLETE events? 🤔

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-05-29 05:04:32
*Thread Reply:* I think, for now, such a thing is not defined other than by the implementation of consumers.

Bernat Gabor (gaborjbernat@gmail.com)
2023-05-30 10:32:09
*Thread Reply:* Any reason for that?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-01 10:25:33
*Thread Reply:* The idea is that for a particular run, facets can be attached to any event type.

This has advantages. For example, a job that modifies a dataset it's also reading from can get the particular version of the dataset it's reading from and attach it on START; that would not work if you tried to do it on COMPLETE, as the dataset would have changed by then.

Similarly, if the job is creating a dataset, we could not get additional metadata on it at the start, so we can attach that information only on COMPLETE.

There are also cases where we want facets to be cumulative. The reason for this is streaming jobs. For example, with Apache Flink, we could emit metadata on each checkpoint (or every N checkpoints) that shows how the job is progressing.

Generally consumers should be agnostic to that, but we don't want to overspecify what consumers should do - people might want to use OL data in different ways, or even ignore some data we're sending.

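A sketch of what "facets can be attached to any event type" looks like from a producer's point of view, using the Python client (the endpoint, namespace, and producer values are placeholders):

    import datetime
    from uuid import uuid4

    from openlineage.client import OpenLineageClient
    from openlineage.client.run import Job, Run, RunEvent, RunState

    client = OpenLineageClient(url="http://localhost:5000")  # assumed Marquez endpoint
    run_id = str(uuid4())
    job = Job(namespace="demo", name="modify_table")

    def now():
        return datetime.datetime.now(datetime.timezone.utc).isoformat()

    # START can carry facets knowable before execution (e.g. input dataset version);
    # COMPLETE can carry facets only knowable afterwards (e.g. output metadata).
    # Both events share one runId, so consumers can combine them cumulatively.
    client.emit(RunEvent(RunState.START, now(), Run(runId=run_id), job, "demo-producer"))
    client.emit(RunEvent(RunState.COMPLETE, now(), Run(runId=run_id), job, "demo-producer"))
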
Bernat Gabor (gaborjbernat@gmail.com)
2023-05-30 17:49:54
Any reason why the lifecycle state change facet is not just on the output, but is also allowed on the inputs? https://openlineage.io/docs/spec/facets/dataset-facets/lifecycle_state_change I can't see how it would be interpreted for an input 🤔

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-01 10:18:48
*Thread Reply:* I think it should be output-only, yes.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-01 10:19:14
*Thread Reply:* @Paweł Leszczyński what do you think?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-02 08:35:13
*Thread Reply:* yes, should be output only I think

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-05 13:39:07
*Thread Reply:* should we move it over then? 😄

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-05 13:39:31
*Thread Reply:* under Output Dataset Facets, that is

Michael Robinson (michael.robinson@astronomer.io)
2023-06-01 12:30:00
@channel
The first issue of OpenLineage News is now available. To get it directly in your inbox when it's published, become a subscriber.
🚀 Willy Lulciuc, Jakub Dardziński, Maciej Obuchowski, Bernat Gabor, Harel Shein, Laurent Paris, Tamara Fingerlin, Perttu Salonen
🔥 Willy Lulciuc, Natalie Zeller, Ernie Ostic, Laurent Paris
💯 Willy Lulciuc

Michael Robinson (michael.robinson@astronomer.io)
2023-06-01 14:23:17
*Thread Reply:* Correction: Julien and Willy's talk at Data+AI Summit will take place on June 28

Michael Robinson (michael.robinson@astronomer.io)
2023-06-01 13:50:23
Hello all, I'm opening a vote to release 0.27.0, featuring:
• Spark: fixed column lineage from Databricks in the case of aggregate queries
• Python client: configurable job-name filtering
• Airflow: fixed urllib.parse.urlparse in case of [] values
Three +1s from committers will authorize an immediate release.
➕ Jakub Dardziński, Maciej Obuchowski, Willy Lulciuc, Paweł Leszczyński

Michael Robinson (michael.robinson@astronomer.io)
2023-06-02 10:30:39
*Thread Reply:* Thanks, all. The release is authorized and will be initiated on Monday in accordance with our policy here.

Michael Robinson (michael.robinson@astronomer.io)
2023-06-02 13:13:18
@channel
This month's TSC meeting is next Thursday, June 8th, at 10:00 am PT. On the tentative agenda: announcements, meetup updates, recent releases, static lineage progress, and open discussion. More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.
🙌 Sheeri Cabral (Collibra), Maciej Obuchowski, Harel Shein, alexandre bergere, Paweł Leszczyński, Willy Lulciuc

Michael Robinson (michael.robinson@astronomer.io)
2023-06-05 12:34:29
@channel
We released OpenLineage 0.27.1, including:
Additions:
• Python client: add emission filtering mechanism and exact, regex filters #1878 @mobuchowski
Fixes:
• Spark: fix column lineage for aggregate queries on Databricks #1867 @pawel-big-lebowski
• Airflow: fix unquoted [ and ] in Snowflake URIs #1883 @JDarDagran
Plus a CI fix and a proposal.
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.27.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.26.0...0.27.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🙌 Sheeri Cabral (Collibra)

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-05 13:01:06
Looking for a reviewer on https://github.com/OpenLineage/OpenLineage/pull/1892 🙂
(labels: documentation, spec)
🙌 Sheeri Cabral (Collibra), Paweł Leszczyński, Michael Robinson

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-06-05 15:47:08
*Thread Reply:* @Bernat Gabor thanks for the PR!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-06 08:17:47
Hey, I request release 0.27.2 to fix a potential breaking change in the Python client in 0.27.1: https://github.com/OpenLineage/OpenLineage/pull/1908
➕ Jakub Dardziński, Paweł Leszczyński, Michael Robinson, Willy Lulciuc

Michael Robinson (michael.robinson@astronomer.io)
2023-06-06 10:58:23
*Thread Reply:* Thanks @Maciej Obuchowski. The release is authorized and will be initiated as soon as possible.

Michael Robinson (michael.robinson@astronomer.io)
2023-06-06 12:33:55
@channel
We released OpenLineage 0.27.2, including:
Fixes:
• Python client: deprecate client.from_environment, do not skip loading config #1908 @Maciej Obuchowski
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.27.2
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.27.1...0.27.2
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍 Maciej Obuchowski

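The replacement pattern for the deprecated call is to let the constructor resolve configuration itself; a sketch (assuming OPENLINEAGE_URL is set in the environment, per the Python client docs):

    from openlineage.client import OpenLineageClient

    # Deprecated by #1908:
    # client = OpenLineageClient.from_environment()

    # Preferred: the default constructor picks up the transport from config
    # (e.g. an openlineage.yml file or the OPENLINEAGE_URL environment variable).
    client = OpenLineageClient()
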
Bernat Gabor (gaborjbernat@gmail.com)
2023-06-06 14:22:18
Found a major bug in the Python client - https://github.com/OpenLineage/OpenLineage/pull/1917, if someone can review
(labels: client/python, common)

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-06 14:54:47
And also https://github.com/OpenLineage/OpenLineage/pull/1913 🙂 that fixes the type information not being packaged
(labels: integration/airflow, integration/great-expectations, client/python, common, integration/dagster, extractor)

Michael Robinson (michael.robinson@astronomer.io)
2023-06-07 09:48:58
@channel
This month's TSC meeting is tomorrow, and all are welcome! https://openlineage.slack.com/archives/C01CK9T7HKR/p1685725998982879

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 11:11:31
Hi team,
I wanted lineage for my data at the table and column level. I am using a Jupyter notebook and Spark code.

spark = (SparkSession.builder.master('local')
    .appName('sample_spark')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.jars.packages', 'io.openlineage:openlineage_spark:0.12.0')
    .config('spark.openlineage.host', 'http://marquez-api:5000')
    .config('spark.openlineage.namespace', 'spark_integration')
    .getOrCreate())

I used this and then opened localhost:3000 for Marquez.
I can see my job there, but when I click on the job, where it's supposed to show lineage, it's just an empty screen.

John Lukenoff (john@jlukenoff.com)
2023-06-08 12:39:20
*Thread Reply:* Do you get any output in your devtools? I just ran into this yesterday and it looks like it's related to this issue: https://github.com/MarquezProject/marquez/issues/2410
(labels: bug; comments: 2)

John Lukenoff (john@jlukenoff.com)
2023-06-08 12:40:01
*Thread Reply:* Seems like more of a Marquez client-side issue than something with OL

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 12:43:02
*Thread Reply:* ohh, but if I try using the console output, it throws ClientProtocolError

John Lukenoff (john@jlukenoff.com)
2023-06-08 12:43:41
*Thread Reply:* Sorry, I mean in the dev console of your web browser

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 12:44:43
*Thread Reply:* this is the dev console in the browser

John Lukenoff (john@jlukenoff.com)
2023-06-08 12:47:59
*Thread Reply:* Seems like it's coming from this line. Are there any job facets defined when you fetch from the API directly? That seems like kind of an old version of OL, so maybe the schema is incompatible with the version Marquez is expecting

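Checking what Marquez actually stored is a quick way to answer that question; a sketch against the Marquez REST API (the job name is a placeholder; the namespace matches the config above and the port assumes the default API):

    import requests  # assumes the requests package is installed

    resp = requests.get(
        "http://localhost:5000/api/v1/namespaces/spark_integration/jobs/sample_spark"
    )
    # Inspect the facets Marquez received for the most recent run, if any.
    print(resp.json().get("latestRun"))
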
Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 12:51:21
*Thread Reply:*
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
    .appName('sample_spark')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.jars.packages', 'io.openlineage:openlineage_spark:0.12.0')
    .config('spark.openlineage.host', 'http://marquez-api:5000')
    .config('spark.openlineage.namespace', 'spark_integration')
    .getOrCreate())

spark.sparkContext.setLogLevel("INFO")

spark.createDataFrame([
    {'a': 1, 'b': 2},
    {'a': 3, 'b': 4}
]).write.mode("overwrite").saveAsTable("temp_table8")

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 12:51:49
*Thread Reply:* This is my only code, I haven't done anything apart from this

John Lukenoff (john@jlukenoff.com)
2023-06-08 12:52:30
*Thread Reply:* I would try a more recent version of OL. Looks like you're using 0.12.0 and I think the project is on 0.27.x currently

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 12:55:07
*Thread Reply:* so I should change io.openlineage:openlineage_spark:0.12.0 to io.openlineage:openlineage_spark:0.27.1?
👍 John Lukenoff, Julien Le Dem

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:10:03
*Thread Reply:* it executed well, but I'm unable to see it in Marquez

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:18:16
*Thread Reply:* Marquez didn't get updated

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:20:44
*Thread Reply:* I am actually doing a POC on OpenLineage to find table- and column-level lineage for my team at Amazon.
If this goes through, the team could use OpenLineage to track data lineage on a larger scale.

John Lukenoff (john@jlukenoff.com)
2023-06-08 13:24:49
*Thread Reply:* Maybe Marquez is still pulling the data from the previous run using the old OL version. Do you still get the same error in the browser console? Do you get the same result if you rebuild and start with a clean Marquez db?

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:25:10
*Thread Reply:* yes, I did that as well

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:25:49
*Thread Reply:* the error was present only once you clicked on any of the jobs in Marquez;
since my job isn't showing up, I can't check for the error itself

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:26:29
*Thread Reply:* docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=marquez-api -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquezproject/marquez-web:0.19.1
used this to rebuild marquez

John Lukenoff (john@jlukenoff.com)
2023-06-08 13:26:54
*Thread Reply:* That's odd. Sorry, that's probably the most I can help; I'm kinda new to OL/Marquez as well 😅

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:27:41
*Thread Reply:* no problem, can you refer me to someone who would know, so that I can ask them?

John Lukenoff (john@jlukenoff.com)
2023-06-08 13:29:25
*Thread Reply:* Actually, looking at it now, I think you're using a slightly outdated version of marquez-web too. I would update that tag to at least 0.33.0; that's what I'm using

John Lukenoff (john@jlukenoff.com)
2023-06-08 13:30:10
*Thread Reply:* Other than that I would ask in the Marquez Slack channel or raise an issue on GitHub in that project. Seems like more of an issue with Marquez, since at least some data is rendering in the UI initially

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:32:58
*Thread Reply:* nope, that version also didn't help

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:33:19
*Thread Reply:* can you share their Slack link?

John Lukenoff (john@jlukenoff.com)
2023-06-08 13:34:52
*Thread Reply:* http://bit.ly/MarquezSlack

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-08 13:35:08
*Thread Reply:* that link is no longer active

Julien Le Dem (julien@apache.org)
2023-06-09 18:44:25
*Thread Reply:* Hello @Rachana Gandhi, could you point to the doc where you found the example .config('spark.jars.packages', 'io.openlineage:openlineage_spark:0.12.0')? We should update it to have the latest version instead.

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-09 18:54:49
*Thread Reply:* https://openlineage.io/docs/integrations/spark/quickstart_local/

Rachana Gandhi (rachana.gandhi410@gmail.com)
2023-06-09 18:59:17
*Thread Reply:* https://openlineage.io/docs/guides/spark
Also, the docker compose here has an earlier version of Marquez.

Harshit Soni (harshit.soni@angelbroking.com)
2023-07-13 17:00:54
*Thread Reply:* Facing the same issue with my initial POC. Did we get any solution for this?

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-08 14:36:38
Approve a new release 🙂
➕ Michael Robinson, Willy Lulciuc, Maciej Obuchowski, Jakub Dardziński

Michael Robinson (michael.robinson@astronomer.io)
2023-06-08 14:43:55
*Thread Reply:* Requesting a release? 3 +1s from committers will authorize. More info here: https://github.com/OpenLineage/OpenLineage/blob/main/GOVERNANCE.md

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-08 14:44:14
*Thread Reply:* Yeah, that one 😊

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-08 14:44:44
*Thread Reply:* Because the python client is broken as is today without a new release
👍 Michael Robinson

Michael Robinson (michael.robinson@astronomer.io)
2023-06-08 18:45:04
*Thread Reply:* Thanks, all. The release is authorized and will be initiated by EOB next Tuesday, but in all likelihood well before then.

Bernat Gabor (gaborjbernat@gmail.com)
2023-06-08 19:06:34
*Thread Reply:* cool

Michael Robinson (michael.robinson@astronomer.io)
2023-06-12 13:15:26
@channel
We released OpenLineage 0.28.0, including:
Added
• dbt: add Databricks compatibility #1829 @Ines70
Fixed
• Fix type-checked marker and packaging #1913 @gaborbernat
• Python client: add schemaURL to run event #1917 @gaborbernat
For the details, see:
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.28.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.27.2...0.28.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🚀 Maciej Obuchowski, Willy Lulciuc, Francis McGregor-Macdonald
👍 Ines DAHOUMANE -COWORKING PARIS-

Michael Robinson (michael.robinson@astronomer.io)
2023-06-12 14:35:56
@channel
Meetup announcement: there's another meetup happening soon! This one will be an evening event on 6/22 in New York at Collibra's HQ. For details and to sign up, please join the meetup group: https://www.meetup.com/data-lineage-meetup/events/294065396/. Thanks to @Sheeri Cabral (Collibra) for cohosting and providing a space.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-12 23:27:16
Hi, just curious, does OpenLineage have a log4j integration?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-13 04:44:28
*Thread Reply:* Do you mean to just log events to a logging backend?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-13 04:54:30
*Thread Reply:* Hmm, more like have a separate logging config for sending all the logs to a backend

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-13 04:54:38
*Thread Reply:* Not the events itself

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-13 05:01:10
*Thread Reply:* @Anirudh Shrinivason with the Spark integration?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-13 05:01:59
*Thread Reply:* It uses slf4j, so you should be able to set up your log4j logger

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-13 05:10:55
*Thread Reply:* Yeah, with the Spark integration. Ahh I see. Okay sure, thanks!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-21 23:21:14
*Thread Reply:* ~Hi @Maciej Obuchowski, may I know what class path I should be using for setting up log4j if I want to set it up for OL-related logs? Is there some guide or runbook for setting up log4j with OL? Thanks!~
Nvm lol found it! 🙂

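Since the integration logs through slf4j, its verbosity can be controlled with ordinary Spark log configuration; a sketch for a log4j 1.x-style conf/log4j.properties (the logger name assumes the integration's io.openlineage.spark package):

    # Raise OpenLineage Spark integration logging to DEBUG
    log4j.logger.io.openlineage.spark=DEBUG
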
Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-13 12:19:01
Hello all, we are just starting to use Marquez as part of our POC. We are following the getting started guide at https://openlineage.io/getting-started/ to set up the environment on an AWS EC2 instance. When we run ./docker/up.sh, it does not bring up the marquez-web container. Also, we are not able to access the Admin UI at ports 5000 and 5001.

Docker version: 24.0.2
Docker Compose version: 2.18.1
OS: Ubuntu 20.04

Can someone please let me know what I am missing?
Note: I had to modify the docker-compose command in up.sh as per Docker Compose V2.

Also, we are seeing the following log when our load balancer checks for health:

WARN [2023-06-13 15:35:31,040] marquez.logging.LoggingMdcFilter: status: 404
172.30.1.206 - - [13/Jun/2023:15:35:42 +0000] "GET / HTTP/1.1" 200 535 "-" "ELB-HealthChecker/2.0" 1
172.30.1.206 - - [13/Jun/2023:15:35:42 +0000] "GET / HTTP/1.1" 404 43 "-" "ELB-HealthChecker/2.0" 2
WARN [2023-06-13 15:35:42,866] marquez.logging.LoggingMdcFilter: status: 404

Kavitha (kkandaswamy@cardinalcommerce.com)
2023-06-14 10:42:41
*Thread Reply:* Hello, is anyone who has recently installed the latest version of marquez/open-lineage-spark using the docker image available to help Vamshi and me, or provide any pointers? Thank you

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-15 03:38:38
*Thread Reply:* if you're working on a Mac, you can have an issue related to port 5000. The instructions here https://github.com/MarquezProject/marquez#quickstart provide a workaround for that: ./docker/up.sh --api-port 9000

Kavitha (kkandaswamy@cardinalcommerce.com)
2023-06-15 08:43:33
*Thread Reply:* @Paweł Leszczyński, thank you. We were using Ubuntu on an EC2 instance, and each time we run into different errors and are never able to access the application page, web server, or admin interface. We have run out of ideas of what else to try differently to get this setup up and running

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-22 14:47:00
*Thread Reply:* @Michael Robinson Can you please help us here?

Michael Robinson (michael.robinson@astronomer.io)
2023-06-22 14:58:57
*Thread Reply:* @Vamshi krishna I'm sorry you're still blocked. Thanks for the information about your system. Would you please share some of the errors you are getting? More details would help us reproduce and diagnose.

Kavitha (kkandaswamy@cardinalcommerce.com)
2023-06-22 16:35:00
*Thread Reply:* @Michael Robinson, thank you. Vamshi and I will share the errors that we are running into shortly

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 09:48:16
*Thread Reply:* We are following the https://openlineage.io/getting-started/ guide and trying to set up Marquez on an Ubuntu EC2 instance. The versions of Docker, Docker Compose, and Ubuntu follow: [screenshot]

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 09:49:51
*Thread Reply:* @Michael Robinson When we follow the documentation without changing anything and run sudo ./docker/up.sh, we see the following errors: [screenshot]

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:00:38
*Thread Reply:* So I edited up.sh, modified the docker compose command by removing the --log-level flag, ran sudo ./docker/up.sh, and found the following errors: [screenshot]

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:02:29
*Thread Reply:* Then I copied .env.example to .env, since Compose needs a .env file: [screenshot]

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:05:04
*Thread Reply:* I got this error: [screenshot]

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:09:24
*Thread Reply:* Since I am getting timeouts, I thought it might be an issue with the proxy. So I followed this doc: https://stackoverflow.com/questions/58841014/set-proxy-on-docker and added my outbound proxy and tried again

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:23:46
*Thread Reply:* @Michael Robinson Then it kind of worked, but I'm seeing the following errors: [screenshot]

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:24:31
*Thread Reply:* [screenshot]

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:25:29
*Thread Reply:* @Michael Robinson @Paweł Leszczyński Can you please look at the steps above and let us know what we are missing/doing wrong? I appreciate your help and time.

Michael Robinson (michael.robinson@astronomer.io)
2023-06-23 10:45:39
*Thread Reply:* The latest errors look to me like they're being caused by Postgres and might reflect a port conflict. Are you using the default port for the API (5000)? You might try using a different port. More info about this is in the Marquez readme: https://github.com/MarquezProject/marquez/blob/0.35.0/README.md.

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:46:55
*Thread Reply:* Yes, we are using the default ports:
API_PORT=5000
API_ADMIN_PORT=5001
WEB_PORT=3000
TAG=0.35.0

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:47:40
*Thread Reply:* We see these Postgres permission issues only occasionally. Other times we only see the db and api containers up, but not the web

Michael Robinson (michael.robinson@astronomer.io)
2023-06-23 10:52:38
*Thread Reply:* I would try running ./docker/up.sh --api-port 9000 (see Paweł's message above for more context.)
👍 Vamshi krishna

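A quick way to confirm a port conflict before rerunning up.sh; a sketch using only the Python standard library:

    import socket

    def port_in_use(port, host="127.0.0.1"):
        """Report whether something is already listening on host:port."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            return s.connect_ex((host, port)) == 0

    print(port_in_use(5000))  # True means the default Marquez API port is taken
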
Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:54:18
*Thread Reply:* Still no luck. Seeing the same errors.

2023-06-23 14:53:23.971 GMT [1] LOG: could not open configuration file "/etc/postgresql/postgresql.conf": Permission denied
marquez-db | 2023-06-23 14:53:23.971 GMT [1] FATAL: configuration file "/etc/postgresql/postgresql.conf" contains errors

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 10:54:43
*Thread Reply:* ERROR [2023-06-23 14:53:42,269] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.
marquez-api | ! java.net.UnknownHostException: postgres
marquez-api | ! at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:567)
marquez-api | ! at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
marquez-api | ! at java.base/java.net.Socket.connect(Socket.java:633)
marquez-api | ! at org.postgresql.core.PGStream.createSocket(PGStream.java:243)
marquez-api | ! at org.postgresql.core.PGStream.<init>(PGStream.java:98)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:132)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
marquez-api | ! ... 26 common frames omitted
marquez-api | ! Causing: org.postgresql.util.PSQLException: The connection attempt failed.
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:354)
marquez-api | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
marquez-api | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:253)
marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:434)
marquez-api | ! at org.postgresql.Driver.connect(Driver.java:291)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)
marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)
marquez-api | ! at org.flywaydb.core.Flyway.info(Flyway.java:190)
marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:73)
marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:27)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:105)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:48)
marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)
marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)
marquez-api | ! at io.dropwizard.Application.run(Application.java:94)
marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:60)
marquez-api | INFO [2023-06-23 14:53:42,274] marquez.MarquezApp: Stopping app...

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-23 11:06:32
*Thread Reply:* Why do you run docker up with sudo? Some of your screenshots suggest Docker is not able to access the Docker registry. The last error, java.net.UnknownHostException: postgres, may just be a result of the container being down. Could you verify whether all the containers are up and running, and if not, what the error is? Are you able to test this docker up on your laptop or in another environment?

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 11:08:34
*Thread Reply:* Docker commands require sudo and cannot run with another user.
The Postgres container is not coming up. It is failing with the following errors:

2023-06-23 14:53:23.971 GMT [1] LOG: could not open configuration file "/etc/postgresql/postgresql.conf": Permission denied
marquez-db | 2023-06-23 14:53:23.971 GMT [1] FATAL: configuration file "/etc/postgresql/postgresql.conf" contains errors

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-23 11:10:19
*Thread Reply:* and what does docker ps -a say about the postgres container? why did it fail?

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 11:11:36
*Thread Reply:* [screenshot]

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-23 11:25:17
*Thread Reply:* hmyy, no changes have been made to postgresql.conf on our side since August 2022. Did you apply any changes, or do you have a clean clone of the repo?

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 11:29:46
*Thread Reply:* No, we didn't make any changes

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-23 11:32:21
*Thread Reply:* you did write earlier: Note: I had to modify docker-compose command in up.sh as per docker compose V2.

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 11:34:54
*Thread Reply:* Yes, all I did was modify this line: docker-compose --log-level ERROR $compose_files up $ARGS to
docker compose $compose_files up $ARGS, since Docker Compose V2 doesn't support the --log-level flag

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 11:37:03
*Thread Reply:* Let me pull an older version and try

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 12:09:43
*Thread Reply:* Still no luck, same exact errors. Tried on a different Ubuntu instance and am still seeing the same errors with Postgres

Vamshi krishna (vnallamothu@cardinalcommerce.com)
2023-06-23 15:06:32
*Thread Reply:* @Jeremy W

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) -
-
2023-06-15 10:40:47
-
-

Hi all, a general doubt. Would the column lineage associated with a job be present in both the start events and the complete events? Or could there be cases where the column lineage, and any output information is only present in one of the events, but not the other?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-15 10:49:42

*Thread Reply:* > Or could there be cases where the column lineage, and any output information is only present in one of the events, but not the other? -Yes. Generally events regarding single run are cumulative

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-15 11:07:03

*Thread Reply:* Ahh I see... Is it fair to assume that if I see column lineage in a start event, it's the full column lineage? Or could it be possible that half the lineage is in the start event, and half the lineage is in the complete event?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-15 22:50:51

*Thread Reply:* Hi @Maciej Obuchowski just pinging in case you'd missed the above message. 🙇

👀 Paweł Leszczyński

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-16 04:48:57

*Thread Reply:* Actually, in this case this definitely should not happen. @Paweł Leszczyński am I right?

:gratitude_thank_you: Anirudh Shrinivason

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-16 04:50:16

*Thread Reply:* @Maciej Obuchowski yes, you're right

:gratitude_thank_you: Anirudh Shrinivason

nivethika R (nivethikar8@gmail.com)
2023-06-15 11:14:33

Hi All.. Is JDBC supported for OpenLineage and Marquez for column lineage? I did some POC using tables in a Postgres DB and I am able to see all events, but for columnLineage I am getting NULL. Not sure what I am missing.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-16 02:14:19

*Thread Reply:* ~No, we do have an open issue for that: https://github.com/OpenLineage/OpenLineage/issues/1758~

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-16 05:02:26

*Thread Reply:* @nivethika R, I am sorry for the misleading response; we've merged a PR for that: https://github.com/OpenLineage/OpenLineage/pull/1636. It does not support `select *`, but besides that it should be operational.


Could you please try a query from our integration tests to verify if this is working for you or not: https://github.com/OpenLineage/OpenLineage/pull/1636/files#diff-137aa17091138b69681510e13e3b7d66aa9c9c7c81fe8fe13f09f0de76448dd5R46 ?
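
For reference, a minimal PySpark sketch (connection details are placeholders, and the Postgres JDBC driver is assumed to be on the classpath) of the kind of JDBC read the integration test exercises; note the explicit column list, since `select *` isn't supported:
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("jdbc_lineage_check")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .getOrCreate())

df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/postgres")  # placeholder connection
    .option("query", "select id, name from some_table")          # explicit columns, not select *
    .option("user", "postgres")
    .option("password", "postgres")
    .load())
# Writing the result out should produce events carrying the columnLineage facet.
df.write.mode("overwrite").saveAsTable("jdbc_copy")
```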

Nagendra Kolisetty (nkolisetty@geico.com)
2023-06-16 12:12:00

Hi There,

Nagendra Kolisetty (nkolisetty@geico.com)
2023-06-16 12:12:43

We are trying to install the image on a private AKS cluster and we ended up with the below error:

```
kubectl : pod marquez/pgsql-postgresql-client terminated (StartError)
At line:1 char:1
+ kubectl run pgsql-postgresql-client --rm --tty -i --restart='Never' `
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (pod marquez/pgs...ed (StartError):String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

failed to create containerd task: failed to create shim task: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "PGPASSWORD=macondo": executable file not found in $PATH: unknown
```

Nagendra Kolisetty (nkolisetty@geico.com)
2023-06-16 12:13:13

We followed the below article to install Marquez in AKS (Azure). By the way, we pulled the images from Docker and pushed them to our ACR. We tried installing the postgresql via ACR and it failed with the error:


https://github.com/MarquezProject/marquez/blob/main/docs/running-on-aws.md

Michael Robinson (michael.robinson@astronomer.io)
2023-06-21 11:07:04

*Thread Reply:* Hi Nagendra, sorry you’re running into this error. We’re looking into it!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-18 09:53:19

Hi, found this error in a couple of the Spark jobs: https://github.com/OpenLineage/OpenLineage/issues/1930. Would appreciate your help to patch it, thanks!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-19 09:37:20

*Thread Reply:* Hey @Anirudh Shrinivason, me and Paweł are at Berlin Buzzwords right now. Will definitely look at it later

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-19 10:47:06

*Thread Reply:* Oh nice! Thanks!

ayush mittal (ayushmittal057@gmail.com)
2023-06-20 03:14:02

Hi Team, we are not able to generate lineage for aggregate functions while joining two tables. Below is the query:
```
df2 = spark.sql("select th.ProductID as Pid, pd.Name as N, sum(th.quantity) as TotalQuantity, sum(th.ActualCost) as TotalCost from silveradventureworks.transactionhistory as th join productdescription_dim as pd on th.ProductID = pd.ProductID group by th.ProductID, pd.Name")
```

Rahul (rahul812ry@gmail.com)
2023-06-20 03:47:50

*Thread Reply:* This is the event generated for above query.

[attachment]
ayush mittal (ayushmittal057@gmail.com)
2023-06-20 03:18:22

and one more issue: we are not able to generate the OpenLineage events on top of a view being created by joining multiple tables. I have attached log events for your reference.

[attachment]
ayush mittal (ayushmittal057@gmail.com)
2023-06-20 03:31:11

this is the event for the view for which no lineage is being generated

[attachment]
John Lukenoff (john@jlukenoff.com)
2023-06-20 13:59:00

Has anyone here successfully implemented the Amundsen OpenLineage extractor? I’m a little confused on the best way to output my lineage events to ndjson files in a scalable way as the docs seem to suggest. Currently I’m pushing all my lineage events to Marquez via REST API. I suppose I could change my transports to Kinesis and write the events to s3 but that comes with the cost of having to build some new way of getting the events to Marquez.


In any case, this seems like a problem someone must have solved before?


Edit: looking at the source code for this Amundsen extractor, it seems like it should be pretty straightforward to just implement our own extractor that can pull these records from the Marquez backend. Will give that a shot and see about getting that merged into Amundsen later.

👍 Maciej Obuchowski

Michael Robinson (michael.robinson@astronomer.io)
2023-06-20 17:34:08

*Thread Reply:* Hi John, glad to hear you figured out a path forward on this! Please let us know what you learn 🙂

Michael Robinson (michael.robinson@astronomer.io)
2023-06-20 14:21:03

Our New York meetup with Collibra is happening in just two days! https://openlineage.slack.com/archives/C01CK9T7HKR/p1686594956030059

👍 Maciej Obuchowski

Harshini Devathi (harshini.devathi@tigeranalytics.com)
2023-06-20 14:31:56

Hello all, do you know if we have the possibility of persisting column order while creating lineage, as it may be available in the table or dataset from which it originates? Or is there some way in which we can get the column order (an id or something)?


For example, if a dataset has columns xyz, abc, fgh, dec, I would like to know which column shows first in the dataset in the common data model. Please let me know.

Michael Robinson (michael.robinson@astronomer.io)
2023-06-20 17:33:36

*Thread Reply:* Hi Harshini, I’ve alerted our resident Spark and column-lineage expert about this. Hope to have an answer for you soon.

Harshini Devathi (harshini.devathi@tigeranalytics.com)
2023-06-20 19:39:46

*Thread Reply:* Thank you Michael, looking forward to it

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-21 02:58:41

*Thread Reply:* Hello @Harshini Devathi. An interesting topic which I have never thought about. The ordering of the fields we get for Spark apps comes from the Spark logical plans we extract information from, and we do not apply any sorting on them. So, if a Spark plan contains columns a, b, c, we trust that's the order of columns for a dataset and don't want to check it on our own.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-21 02:59:45

*Thread Reply:* btw. please let us know how you obtain your lineage: within a Spark app or from some SQLs scheduled by Airflow?

Harshini Devathi (harshini.devathi@tigeranalytics.com)
2023-06-23 14:40:31

*Thread Reply:* Hello @Paweł Leszczyński, thank you for the response. We do not need you to check the ordering specifically, but I assume that the Spark logical plan maintains the column order based on the input datasets. Can we retain that order by adding a column id or some sequence number which helps to represent the lineage in the same order?


We are capturing the lineage using the Spark OpenLineage connector, by posting custom lineage to Marquez through API calls, and we are also in the process of leveraging the SQL connector feature using Airflow.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-26 04:35:43

*Thread Reply:* Hi @Harshini Devathi, are you asking about the schema facet within a dataset? This should have an order from Spark logical plans. Or are you asking about the columnLineage facet? Or Marquez API responses? It's not clear to me why you need it. Each column is identified by a dataset (dataset namespace + dataset name) and field name. You can, on your side, generate a column id based on that and order columns based on the id, but still I think I am missing some arguments behind doing so.

Michael Robinson (michael.robinson@astronomer.io)
2023-06-21 17:41:48

Attention all Bay-area data friends and Data+AI Summit attendees: our first San Francisco meetup is next Tuesday! https://www.meetup.com/meetup-group-bnfqymxe/events/293448130/

🙌 alexandre bergere

Michael Robinson (michael.robinson@astronomer.io)
2023-06-23 16:41:29

Last night in New York we held a meetup with Collibra at their lovely HQ in the Financial District! Many thanks to @Sheeri Cabral (Collibra) for inviting us.
Over a bunch of tasty snacks (thanks for the idea @Harel Shein), we discussed:
• the history and evolution of the spec, and trends in adoption
• progress on the OpenLineage Provider in Airflow (AIP 53)
• progress on “static” AKA design lineage support (expected soon in OpenLineage 1.0.0)
• progress in the LFAI program
• a proposal to add “jobless run” support for auditing use cases and similar edge cases
• an idea to throw a hackathon for creating validation tests and example payloads (would you be interested in participating? let us know!)
• and more.
Many thanks to:
• @Julien Le Dem for making the trip
• Sheeri & Collibra for hosting
• everyone for coming, including second-timer @Ernie Ostic and new member @Shirley Lu
It was great meeting/catching up with everyone. Hope to see you and more new faces at the next one!

[attachment]
🎉 Harel Shein, Peter Hanssens, Ernie Ostic, Paweł Leszczyński, Maciej Obuchowski, Shirley Lu

Michael Robinson (michael.robinson@astronomer.io)
2023-06-26 10:59:08

Our first San Francisco meetup is tomorrow at 5:30 PM at Astronomer’s offices in the Financial District. https://openlineage.slack.com/archives/C01CK9T7HKR/p1687383708927189

🚀 alexandre bergere

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-27 03:43:10

I can’t seem to get OL logging working with Spark. Any guidance please?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-27 03:45:31

*Thread Reply:* Is it because the logLevel is set to WARN or ERROR?

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-27 12:07:12

*Thread Reply:* No, I set it to INFO; maybe I need to add some jars?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-27 12:30:02

*Thread Reply:* Hmm have you set the relevant spark configs?

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-27 12:32:50

*Thread Reply:* yep, I have http working. But not the console:
```
spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.transport.type=console
```

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-27 12:35:27

*Thread Reply:* Oh wait http works but not console...

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-27 12:37:02

*Thread Reply:* If you want to see the console events which are emitted, then you need to set the logLevel to DEBUG
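
A sketch combining the configs from this thread; the session settings mirror what was posted above, with the log level raised so ConsoleTransport output is actually visible in the driver logs:
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("ol_console_debug")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .getOrCreate())

# Per the advice above: console events only show up at DEBUG.
spark.sparkContext.setLogLevel("DEBUG")
```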

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-27 12:37:44

*Thread Reply:* tried that too, still nothing

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-27 12:38:54

*Thread Reply:* Is the openlineage jar installed and added to the config?

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-27 12:39:09

*Thread Reply:* yep, that’s why http works

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-27 12:39:26

*Thread Reply:* the only thing I see in the logs is this:
```
23/06/27 07:39:11 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobEnd
```

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-27 12:40:59

*Thread Reply:* Hmm if an event is still emitted for this case, but logs not showing up then I'm not sure... Maybe someone with more knowledge on this can help

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-27 12:42:37

*Thread Reply:* sure, thanks for trying @Anirudh Shrinivason

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-06-28 05:23:36

*Thread Reply:* What job are you trying this on? If there's this message, then logging is working afaik

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-28 12:16:52

*Thread Reply:* Hi @Maciej Obuchowski, actually I also noticed a similar issue... For some Spark pipelines, the log level is set to DEBUG, but I'm not seeing any events being logged. I am however receiving these events in the backend. Has any of the logging been removed from some places?

Rakesh Jain (rakeshj@us.ibm.com)
2023-06-28 20:57:45

*Thread Reply:* yep, exactly same thing here also @Maciej Obuchowski, I can get the events on http, but changing to console gets me nothing from ConsoleTransport.

John Lukenoff (john@jlukenoff.com)
2023-06-27 20:45:15

@here A bunch of us are downstairs in the lobby at 8 California but no one is down here to let us up. Anyone here to help?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-29 03:36:36

Hi guys, I noticed a few of the jobs getting OOMed while running with OpenLineage. Even increasing the number of executors and doubling the memory does not seem to fix it. This is observed especially when using the graphx libs. Is this a known issue? Just curious as to what the cause might be... The same jobs run fine once OpenLineage is disabled. Are there some rogue threads from the listener, or connections we aren't closing properly?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-29 05:57:59

*Thread Reply:* Hi @Anirudh Shrinivason, could you disable serializing spark.logicalPlan to see if the behaviour is the same?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-29 05:58:28

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark -> spark.openlineage.facets.disabled -> [spark_unknown;spark.logicalPlan]
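
A sketch of that config applied in a PySpark session (other settings omitted), to rule out OOMs caused by serializing huge logical plans:
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("ol_no_logicalplan")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    # Disable serialization of the logical plan (and the spark_unknown facet).
    .config("spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]")
    .getOrCreate())
```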

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-06-29 05:59:55

*Thread Reply:* We do serialize logicalPlan because this is useful in many cases, but sometimes can lead to serializing things that shouldn't be serialized

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-06-29 15:49:35

*Thread Reply:* Ahh I see. Yeah okay let me try that

Michael Robinson (michael.robinson@astronomer.io)
2023-06-30 08:01:34

Hello all, I’m opening a vote to release OpenLineage 0.29.0, including:
• support for Spark 3.4
• support for Flink 1.17.1
• a fix in the Flink integration to enable dataset schema extraction for a KafkaSource when GenericRecord is used
• removal of the unused Golang proxy client (made redundant by the fluentd proxy)
• security vulnerability fixes, doc changes, test improvements, and more.
Three +1s from committers will authorize an immediate release.

➕ Jakub Dardziński, Paweł Leszczyński, Maciej Obuchowski

Michael Robinson (michael.robinson@astronomer.io)
2023-06-30 08:05:53

*Thread Reply:* Thanks, all. The release is authorized.

Michael Robinson (michael.robinson@astronomer.io)
2023-06-30 13:27:35

@channel
We released OpenLineage 0.29.2, including:
Added
• Flink: support Flink version 1.17.1 #1947 @pawel-big-lebowski
• Spark: support Spark version 3.4 #1790 @pawel-big-lebowski
Removed
• Proxy: remove unused Golang client approach #1926 @mobuchowski
• Req: bump minimum supported Python version to 3.8 #1950 @mobuchowski
  ◦ Note: this removes support for Python 3.7, which is at EOL.
Plus test improvements, docs changes, bug fixes and more.
Thanks to all the contributors!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.29.2
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.28.0...0.29.2
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

👍 Shirley Lu, Maciej Obuchowski, Paweł Leszczyński, Tamara Fingerlin

Michael Robinson (michael.robinson@astronomer.io)
2023-06-30 17:23:04

@channel
The latest issue of OpenLineage News is now available, featuring a recap of recent events, releases, and more. To get it directly in your inbox each month, sign up here: https://openlineage.us14.list-manage.com/track/click?u=fe7ef7a8dbb32933f30a10466&id=e598962936&e=ef0563a7f8

👍 Maciej Obuchowski, Paweł Leszczyński, Tristan GUEZENNEC -CROIX-, Tamara Fingerlin, Jeremy W, Anirudh Shrinivason, Julien Le Dem, Sheeri Cabral (Collibra), alexandre bergere

Michael Robinson (michael.robinson@astronomer.io)
2023-07-06 13:36:44

@channel
This month’s TSC meeting is next Thursday, 7/13, at a special time: 8 am PT.
All are welcome!
On the tentative agenda:
• announcements
• updates
• recent releases
• a new DataGalaxy integration
• open discussion

✅ Sheeri Cabral (Collibra), Maciej Obuchowski, alexandre bergere, Paweł Leszczyński, Willy Lulciuc, Anirudh Shrinivason, Shirley Lu

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-07-07 10:35:08

Wow, I just got finished watching @Julien Le Dem and @Willy Lulciuc’s presentation of OpenLineage at databricks and it’s really fantastic! There isn’t a better 30 minutes of content on theory + practice than this, IMO. https://www.databricks.com/dataaisummit/session/cross-platform-data-lineage-openlineage/ (you can watch for free by making an account. I’m not affiliated with databricks…)

❤️ Willy Lulciuc, Harel Shein, Yuanli Wang, Ross Turk, Michael Robinson, Jakub Dardziński, Conor Beverland, Maciej Obuchowski, Jarek Potiuk, Julien Le Dem, Chris Folkes, Anirudh Shrinivason, Shirley Lu

Willy Lulciuc (willy@datakin.com)
2023-07-07 10:37:49

*Thread Reply:* thanks for watching and sharing! the recording is also on youtube 😉 https://www.youtube.com/watch?v=rO3BPqUtWrI

Sheeri Cabral (Collibra) (sheeri.cabral@collibra.com)
2023-07-07 10:38:01

*Thread Reply:* ❤️

Jarek Potiuk (jarek@potiuk.com)
2023-07-08 13:35:10

*Thread Reply:* Very much agree. I’ve even forwarded to a few people here and there, those who I think should learn about it.

❤️ Sheeri Cabral (Collibra)

Julien Le Dem (julien@apache.org)
2023-07-08 13:47:17

*Thread Reply:* You’re both too kind :) Thank you for your support and for being part of the community.

❤️ Sheeri Cabral (Collibra), Jarek Potiuk

Michael Robinson (michael.robinson@astronomer.io)
2023-07-07 15:44:33

@channel -If you registered for TSC meetings through AddEvent, first of all, thank you! Second of all, I’ve had to create a new event series there to enable the editing of individual events. When you have a moment, would you please register for next week’s meeting? Apologies for the inconvenience.

👍 Kiran Hiremath, Willy Lulciuc, Shirley Lu

Juan Manuel Cappi (juancappi@gmail.com)
2023-07-10 12:29:31

Hi community, we are interested in capturing time-travel usage for Iceberg Spark SQL in column lineage. For instance, `INSERT INTO schema.table select * from schema.another_table version as of 'some_version'`. Column lineage is currently missing the version, if used, which is actually quite relevant. I’ve gone through the open issues and didn’t see anything similar. Does it look like a valid use case scenario? We started going through the OL, Iceberg and Spark code trying to capture/expose it, but so far we haven’t been able to. If anyone can give a hint/idea/pointer, we are willing to give it a try and contribute back with the code.

👀 Rakesh Jain, Nitin Ramchandani

Julien Le Dem (julien@apache.org)
2023-07-11 05:46:36

*Thread Reply:* I think yes, this is a great use case. @Paweł Leszczyński is more familiar with the Spark integration code than I.
I think in this case we would add the datasetVersion facet with the underlying Iceberg version: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/DatasetVersionDatasetFacet.json
We extract this information in a few places: https://github.com/search?q=repo%3AOpenLineage%2FOpenLineage%20%20DatasetVersionDatasetFacet&type=code

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-11 05:57:17

*Thread Reply:* Yes, we do have datasetVersion, which captures the Iceberg or Delta version for output and input datasets. Input versions are collected on START while output versions are collected on COMPLETE, in case a job reads and writes to the same dataset. So, even though the column-lineage facet is missing the version, it should be available within events related to a particular run.


If it is not, then perhaps the issue here is the lack of support for the `as of` syntax. As far as I remember, we always get the current version of a dataset, and this may be the missing part here.
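
For reference, a rough sketch of the shape of an input dataset carrying the version, written as a plain Python dict. The facet key and the datasetVersion property follow the DatasetVersionDatasetFacet spec linked above; the URLs and values here are illustrative:
```
input_dataset = {
    "namespace": "s3a://warehouse",
    "name": "schema.table",
    "facets": {
        "version": {
            # Illustrative producer/schema URLs, not authoritative values.
            "_producer": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasetVersionDatasetFacet.json",
            "datasetVersion": "7056736771450556218",  # e.g. an Iceberg snapshot id
        }
    },
}
```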

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-11 05:58:49

*Thread Reply:* link to a method that gets dataset version for iceberg: https://github.com/OpenLineage/OpenLineage/blob/0.29.2/integration/spark/spark3/sr[…]lineage/spark3/agent/lifecycle/plan/catalog/IcebergHandler.java

Juan Manuel Cappi (juancappi@gmail.com)
2023-07-11 10:57:26

*Thread Reply:* Thank you @Julien Le Dem and @Paweł Leszczyński. Based on what I’ve seen so far, indeed it seems that only the current snapshot is tracked when IcebergHandler.getDatasetVersion() is called.
Initially I was expecting to be able to obtain the snapshotId from the SparkTable which comes within getDatasetVersion(), but now I realize that OL is using an older version of the Iceberg runtime (0.12.1) which does not support time travel (introduced in 0.14.1).
The evidence is:
• Iceberg documentation for release 0.14.1: https://iceberg.apache.org/docs/0.14.0/spark-queries/#sql
• Iceberg release notes: https://iceberg.apache.org/releases/#0140-release
• Comparing the source code, I see the SparkTable from 0.14.1 onward does have a snapshotId instance variable, while previous versions don’t:
https://github.com/apache/iceberg/blob/0.14.x/spark/v3.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L82
https://github.com/apache/iceberg/blob/0.12.x/spark3/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L78


I don’t see anyone complaining about the old version of the Iceberg runtime being used, and there is no open issue to upgrade, so I’ll open the issue. Please let me know if that seems reasonable as the immediate next step to take.

Juan Manuel Cappi (juancappi@gmail.com)
2023-07-11 15:48:53

*Thread Reply:* Created issues: #1969 and #1970

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-12 07:15:14

*Thread Reply:* Thanks @Juan Manuel Cappi. The openlineage-spark jar contains modules like spark3, spark32, spark33 and spark34, which are going to be merged soon (we have a ready PR for that). spark34 will be compiled against the latest Iceberg version. Once this is done, #1969 can be closed. For #1970, one would need to implement a datasetBuilder within the spark34 module that visits the node within Spark's logical plan responsible for `as of` and creates the dataset for the OpenLineage event another way than getting the latest snapshot version.

Juan Manuel Cappi (juancappi@gmail.com)
2023-07-13 12:51:19

*Thread Reply:* @Paweł Leszczyński I’ve seen PR #1971 and the new spark34 project with the latest iceberg-spark dependency version, but the other versions (spark33, spark32, etc.) have not been upgraded in that PR. Since the change is small and does not break any tests, I’ve created PR #1976 to fix #1969. That alone is unlocking some time travel lineage (i.e. the dataset identifier now becomes schema.table.version or schema.table.snapshot_id). Hope it makes sense.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-14 04:37:55

*Thread Reply:* Hi @Juan Manuel Cappi, you're right. After discussing with you I realized we support some version of Iceberg (for Spark 3.3 it's still 0.14.0), but it is not the latest Iceberg version matching the Spark version.


There's a tricky part here. Although we want our code to succeed with the latest Spark, we don't want it to fail in a nasty way (class not found exception) when a user is working with an old Iceberg version. There are places in our code where we check "are Iceberg classes on the classpath?" We need to extend this to "are Iceberg classes on the classpath, and is the Iceberg version above 0.14 or not?" For sure this is the case for the merge into commands I am working on at the moment. Let's see if the other integration tests are affected in your PR.

Amod Bhalerao (amod.bhalerao@gmail.com)
2023-07-11 08:09:57

Hi Team, I've seen that Kafka lineage is not coming through properly for Spark streaming. Are we working on this?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-11 08:28:59

*Thread Reply:* what do you mean by that? there is a pyspark & kafka integration test that verifies event being sent when reading or writing to kafka topic: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]a/io/openlineage/spark/agent/SparkContainerIntegrationTest.java

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-11 09:28:56

*Thread Reply:* We do have an old issue, https://github.com/OpenLineage/OpenLineage/issues/372, to support more Spark plans that are stream related. But if you had an example of streaming that is not working for you, that would be really helpful.

Amod Bhalerao (amod.bhalerao@gmail.com)
2023-07-26 08:03:30

*Thread Reply:* I have a pipeline which reads from a topic and sends data to 3 Hive tables and one Postgres table. It's not emitting any lineage for this pipeline.

Amod Bhalerao (amod.bhalerao@gmail.com)
2023-07-26 08:06:51

*Thread Reply:* just one task is getting created

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-12 05:55:19

Hi guys, I notice that with the below spark configs (package and config names restored from the garbled export; the underscores were stripped by the renderer):
```
from pyspark.sql import SparkSession
import os

os.environ["TEST_VAR"] = "1"

spark = (SparkSession.builder.master('local')
    .appName('sample_spark')
    .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.29.2,io.delta:delta-core_2.12:1.0.1')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.openlineage.transport.type', 'console')
    .config('spark.sql.catalog.spark_catalog', "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("hive.metastore.schema.verification", False)
    .config("spark.sql.warehouse.dir", "/tmp/")
    .config("hive.metastore.warehouse.dir", "/tmp/")
    .config("javax.jdo.option.ConnectionURL", "jdbc:derby:;databaseName=/tmp/metastore_db;create=true")
    .config("spark.openlineage.facets.custom_environment_variables", "[TEST_VAR;]")
    .config("spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]")
    .config("spark.hadoop.fs.permissions.umask-mode", "000")
    .enableHiveSupport()
    .getOrCreate())
```
The custom environment variables facet is not kicking in. However, when all the delta related spark configs are removed, it is working fine. Is this a known issue? Are there any workarounds for it? Thanks!

👀 Paweł Leszczyński

Juan Manuel Cappi (juancappi@gmail.com)
2023-07-12 06:14:41

*Thread Reply:* Hi @Anirudh Shrinivason, I’m not familiar with Delta, but enabling debugging helped me a lot to understand what’s going on when things fail silently. Just add at the end: `spark.sparkContext.setLogLevel("DEBUG")`

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-12 06:20:47

*Thread Reply:* Yeah I checked on debug

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-12 06:20:50

*Thread Reply:* There are no errors

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-12 06:21:10

*Thread Reply:* Just that there is no environment-properties in the event that is being emitted

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-12 07:31:01

*Thread Reply:* Hi @Anirudh Shrinivason, what Spark version is that? I see your delta version is pretty old. Anyway, the observation is weird; I don't know how delta interferes with the environment facet builder. These are such disjoint features. Are you sure you create a new session (there is getOrCreate)?

Glen M (glen_m@apple.com)
2023-07-12 19:29:06

*Thread Reply:* @Paweł Leszczyński its because of this line : https://github.com/OpenLineage/OpenLineage/blob/0.29.2/integration/spark/app/src/m[…]nlineage/spark/agent/lifecycle/InternalEventHandlerFactory.java

Glen M (glen_m@apple.com)
2023-07-12 19:32:44

*Thread Reply:* Assuming this is https://learn.microsoft.com/en-us/azure/databricks/delta/ ... delta ... which is Azure Databricks. @Anirudh Shrinivason

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-12 22:58:13

*Thread Reply:* Hmm I wasn't using databricks

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-12 22:59:12

*Thread Reply:* @Paweł Leszczyński I'm using spark 3.1 btw

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-13 08:05:49

*Thread Reply:* @Anirudh Shrinivason This should resolve the issue https://github.com/OpenLineage/OpenLineage/pull/1973

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-13 08:06:11

*Thread Reply:* PR description contains info on how come the observed behaviour was possible

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-13 08:07:47

*Thread Reply:* As always, thank you @Anirudh Shrinivason for providing clear information on how to reproduce the issue 🚀 :medal: 👍

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-13 09:52:29

*Thread Reply:* Ohh that is really great! Thanks so much for the help! 🙂

Michael Robinson (michael.robinson@astronomer.io)
2023-07-12 13:50:51

@channel -A friendly reminder: this month’s TSC meeting — open to all — is tomorrow at 8 am PT. https://openlineage.slack.com/archives/C01CK9T7HKR/p1688665004736219

👍 Dongjin Seo

thebruuu (bruno.c@inwind.it)
2023-07-12 14:54:29

Hi Team, how are you? Is there any chance to use Airflow to run queries against an Access file? Sorry to bother with a question that is not directly related to OpenLineage... but I am kind of stuck.

Harel Shein (harel.shein@gmail.com)
2023-07-12 15:22:52

*Thread Reply:* what do you mean by Access file?

thebruuu (bruno.c@inwind.it)
2023-07-12 16:09:03

*Thread Reply:* ... an accdb file, a Microsoft Access file. I am in a reverse engineering project facing a spaghetti-style development and would have loved to use Airflow and OpenLineage as a magic wand to help me in this damn work.

Harel Shein (harel.shein@gmail.com)
2023-07-12 21:44:21

*Thread Reply:* oof.. I’d look into https://airflow.apache.org/docs/apache-airflow-providers-odbc/4.0.0/ but I really have no clue..

thebruuu (bruno.c@inwind.it)
2023-07-13 09:47:02

*Thread Reply:* Thank you Harel -I started from that too ... but it became foggy after the initial step

Aaman Lamba (aamanlamba@gmail.com)
2023-07-12 16:30:41

Hi folks, having an issue ingesting the seed metadata when starting the docker container. The output shows "seed-marquez-with-metadata exited with code 0" but no information is visible in Marquez. What can be the issue?

✅ Aaman Lamba

Michael Robinson (michael.robinson@astronomer.io)
2023-07-12 16:55:00

*Thread Reply:* Did you check the namespace menu in the top right for a food_delivery namespace?

Michael Robinson (michael.robinson@astronomer.io)
2023-07-12 16:55:12

*Thread Reply:* (Hi Aaman!)

Aaman Lamba (aamanlamba@gmail.com)
2023-07-12 16:55:45

*Thread Reply:* Hi! Thank you that helped!

Aaman Lamba (aamanlamba@gmail.com)
2023-07-12 16:55:55

*Thread Reply:* I think that should be added to the quickstart guide

🙌 Michael Robinson

Michael Robinson (michael.robinson@astronomer.io)
2023-07-12 16:56:23

*Thread Reply:* Great idea, thank you

Julien Le Dem (julien@apache.org)
2023-07-13 12:09:29

As discussed in the Monthly meeting, I have opened a PR to propose adding deletion to facets for static lineage metadata: https://github.com/OpenLineage/OpenLineage/pull/1975

Steven (xli@zjuici.com)
2023-07-13 23:21:29

Hi, I'm using the OL Python client:
```
client.emit(
    DatasetEvent(
        eventTime=datetime.now().isoformat(),
        producer=producer,
        schemaURL="https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/DatasetEvent",
        dataset=Dataset(namespace=namespace, name="input-file"),
    )
)
```
I want to send a dataset event once files have been uploaded. But I received a 422 from api/v1/lineage, saying that run and job must not be null. I don't have a job or run yet. How can I solve this?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-14 04:09:15

*Thread Reply:* Hi @Steven, I assume you send your OpenLineage events to Marquez. The 422 http code is a response from the backend, and Marquez is still waiting for the PR https://github.com/MarquezProject/marquez/pull/2495 to be merged and released. This PR makes Marquez understand DatasetEvents. They won't be saved in the Marquez database (this is to be implemented in the future), but at least one will not experience an error response code.


To sum up: what you do is correct. You are using a feature that is allowed on the client side but still not implemented on the backend.
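
One possible stopgap until that Marquez PR lands, sketched with the Python client: wrap the dataset in an ordinary RunEvent under a synthetic job, which Marquez already understands. The job name, namespace, producer URI and Marquez URL below are placeholders:
```
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # assumption: local Marquez

event = RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="file-uploads", name="register-input-file"),  # synthetic job
    producer="https://example.com/my-producer",                     # placeholder
    inputs=[Dataset(namespace="my-namespace", name="input-file")],
    outputs=[],
)
client.emit(event)
```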

✅ 🥳 Steven

Steven (xli@zjuici.com)
2023-07-14 04:10:30

*Thread Reply:* Thanks!!

Harshit Soni (harshit.soni@angelbroking.com)
2023-07-14 08:36:23

@here Hi Team, I am trying to run a spark application with OpenLineage.
Spark: 3.3.3
OpenLineage: 0.29.2
I am getting the below error; can you please help me figure out what I could be doing wrong.

```
spark = (SparkSession
    .builder
    .config('spark.port.maxRetries', 100)
    .appName(app_name)
    .config("spark.openlineage.url", "http://localhost/api/v1/namespaces/spark_integration/")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .getOrCreate())
```

```
23/07/14 18:04:01 ERROR Utils: uncaught error in thread spark-listener-group-shared, stopping SparkContext
java.lang.UnsatisfiedLinkError: /private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib: dlopen(/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib, 0x0001): tried: '/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib' (no such file), '/private/var/folders/z6/pl8p30z11v50zf6pv51p259m0000gp/T/native-lib4983292552717270883/libopenlineage_sql_java.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
```

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-18 02:35:18

*Thread Reply:* Hi @Harshit Soni, where are you deploying your Spark? Locally or not? Is it on a Mac? Calling @Maciej Obuchowski to help with the libopenlineage_sql_java architecture compilation issue.

Harshit Soni (harshit.soni@angelbroking.com)
2023-07-18 02:38:03

*Thread Reply:* Currently, was testing on local.

Harshit Soni (harshit.soni@angelbroking.com)
2023-07-18 02:39:43

*Thread Reply:* We have created a centralised utility for all data ingestion needs and want to see how lineage is created for same using Openlineage.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-07-18 05:16:55

*Thread Reply:* 👀

Michael Robinson (michael.robinson@astronomer.io)
2023-07-14 13:00:29

@channel -If you missed this month’s TSC meeting, the recording is now available on our YouTube channel: https://youtu.be/2vD6-Uwr7ZE. -A clip of Alexandre Bergere’s DataGalaxy integration demo is also available: https://youtu.be/l_HbEtpXphY.

👍 Kiran Hiremath, alexandre bergere, Harel Shein, Paweł Leszczyński

Robin Fehr (robin.fehr@acosom.com)
2023-07-16 17:39:26

Hey guys, trying to get a grip on the ecosystem regarding Flink lineage 🙂 As far as my research has revealed, the OpenLineage project is the only one that supports Flink lineage with an out-of-the-box library that can be integrated in jobs; at least as far as I've seen, for other toolings such as DataHub we'd have to write custom hooks that implement their API. As for my question: is my current assumption correct that an integration of, for example, DataHub/OpenMetadata with the OpenLineage project would also require support from DataHub/OpenMetadata itself so that they can work with the OpenLineage spec? Or would it somewhat work to write a mapper in between to support their spec? (More of an architectural decision, I assume, but I'd be interested in knowing what the OpenLineage approach is regarding that.)

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-07-17 08:13:49

*Thread Reply:* > or would it somewhat work to write a mapper in between to support their spec? -I think yeah - maybe https://github.com/Natural-Intelligence/openLineage-openMetadata-transporter would work out of the box if I understand correctly?

Harel Shein (harel.shein@gmail.com)
2023-07-17 08:38:59

*Thread Reply:* Tagging @Natalie Zeller in case you want to collaborate

Natalie Zeller (natalie.zeller@naturalint.com)
2023-07-17 08:47:34

*Thread Reply:* Hi, we've implemented a transporter that transmits lineage from OpenLineage to OpenMetadata; you can find the GitHub project here. I've also published a blog post that explains this integration and how to use it. I'll be happy to help if you have any questions.

🙌 Robin Fehr

Robin Fehr (robin.fehr@acosom.com)
2023-07-17 09:49:30

*Thread Reply:* very cool! thanks a lot for responding so quickly

Michael Robinson (michael.robinson@astronomer.io)
2023-07-17 18:23:53

🚀 We recently hit the 1000-member mark on here! Thank you for joining the movement to establish an open standard for data lineage across the data ecosystem! Tell your friends 🙂! -💯💯💯💯💯💯💯💯💯💯 -https://bit.ly/lineageslack

🎉 Juan Manuel Cappi, Harel Shein, Paweł Leszczyński, Maciej Obuchowski, Willy Lulciuc, Viraj Parekh
💯 Harel Shein, Anirudh Shrinivason, Paweł Leszczyński, Maciej Obuchowski, Willy Lulciuc, Robin Fehr, Viraj Parekh, Ernie Ostic
👏 thebruuu

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-18 04:58:14

Btw, just curious: what exactly does the runId correspond to in the OL Spark integration? Is it possible to obtain the Spark application id from the event too?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-18 05:10:31

*Thread Reply:* runId is a UUID assigned per Spark action (a compute trigger within a Spark job), so a single Spark script can result in multiple runs.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-18 05:13:17

*Thread Reply:* adding an extra facet with applicationId looks like a good idea to me: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkContext.html#applicationId:String
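
Until such a facet exists, one possible workaround (an assumption, not built-in behavior): surface a correlation tag of your own through the custom environment variables facet, and log the applicationId next to it so runs can be matched out of band:
```
import os
from pyspark.sql import SparkSession

# Hypothetical tag; must be set before the session (and listener) start.
os.environ["MY_APP_TAG"] = "nightly-ingest-2023-07-18"

spark = (SparkSession.builder
    .appName("app_id_facet_demo")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .config("spark.openlineage.facets.custom_environment_variables", "[MY_APP_TAG;]")
    .getOrCreate())

# The value Paweł suggests exposing as a facet; known only after session creation.
print(spark.sparkContext.applicationId)
```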

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-18 23:06:01

*Thread Reply:* Got it thanks!

Madhav Kakumani (madhav.kakumani@6point6.co.uk)
2023-07-18 09:47:47

Hi, I have a use case to integrate queries run in a Jupyter notebook using pandas with OpenLineage, to get the lineage in Marquez. Did anyone implement this before? Please let me know. Thanks

🤩 thebruuu

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-20 06:48:54

*Thread Reply:* I think we don't have pandas support so far. So, if one uses pandas to read local files on disk, then perhaps there is little OpenLineage (OL) can do. There is an old pandas issue in our backlog (over 2 years old) -> https://github.com/OpenLineage/OpenLineage/issues/108


Surely one can use the Python OL client to create events manually and send them to Marquez, which may be less convenient (https://github.com/OpenLineage/OpenLineage/tree/main/client/python).


Anyway, we would like to know what your use case is; this would be super helpful in understanding why an OL & pandas integration may be useful.
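
For anyone trying the manual route in a notebook, a minimal sketch with the Python client around a pandas step; paths, names, and the Marquez URL are illustrative:
```
from datetime import datetime, timezone
from uuid import uuid4

import pandas as pd
from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # assumption: local Marquez
producer = "https://example.com/notebooks"               # placeholder producer URI
run = Run(runId=str(uuid4()))
job = Job(namespace="notebooks", name="clean_orders")
inp = Dataset(namespace="file", name="/data/orders.csv")
out = Dataset(namespace="file", name="/data/orders_clean.csv")

def emit(state, **kw):
    client.emit(RunEvent(eventType=state, eventTime=datetime.now(timezone.utc).isoformat(),
                         run=run, job=job, producer=producer, **kw))

emit(RunState.START, inputs=[inp])
df = pd.read_csv("/data/orders.csv").dropna()   # the pandas work being tracked
df.to_csv("/data/orders_clean.csv", index=False)
emit(RunState.COMPLETE, inputs=[inp], outputs=[out])
```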

Madhav Kakumani (madhav.kakumani@6point6.co.uk)
2023-07-20 06:52:32

*Thread Reply:* Thanks Pawel for responding

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-19 02:57:57

Hi guys, when can we expect the next Openlineage release? Excited for MergeIntoCommand column lineage feature!

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-19 03:40:20

*Thread Reply:* Hi @Anirudh Shrinivason, I am still working on that. It's kind of complex because I want to refactor column level lineage so that it can work with multiple Spark versions and multiple delta jars, as the merge into implementation for delta differs across delta releases. I thought it was ready, but this needs some extra work to be done in the next days. I am excited about that too!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-19 03:54:37

*Thread Reply:* Ahh I see... Got it! Is there a tentative timeline for when we can expect this? So sorry haha don't mean to rush you. Just curious to know thats all! 🙂

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-19 22:06:10

*Thread Reply:* Can we author a release sometime soon? Would like to use the CustomEnvironmentFacetBuilder for delta catalog!

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-20 05:28:43

*Thread Reply:* we're pretty close, I think; merge into delta is under review, and waiting for it would be nice. Anyway, we're 3 weeks past the last release.

Michael Robinson (michael.robinson@astronomer.io)
2023-07-20 06:50:56

*Thread Reply:* @Anirudh Shrinivason releases are available basically on-demand using our process in GOVERNANCE.md. I recommend watching #1958 and then making a request in #general once it’s been merged. But, as Paweł suggested, we have a scheduled release coming soon anyway. Thanks for your interest in the fix!

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-20 11:01:14

*Thread Reply:* Ahh I see. Got it. Thanks! @Michael Robinson @Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-21 03:12:22

*Thread Reply:* @Anirudh Shrinivason it's merged -> https://github.com/OpenLineage/OpenLineage/pull/1958

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-07-21 04:19:15

*Thread Reply:* Awesome thanks so much! @Paweł Leszczyński

Juan Manuel Cappi (juancappi@gmail.com)
2023-07-19 06:59:31

Hi there, related to my question a few days ago about usage of time travel in Iceberg: currently only the alias used (i.e. tag, branch) is captured as part of the dataset identifier for lineage. If the tag is removed, or even worse, if it’s removed and re-created with the same name pointing to a different snapshot_id, the lineage will be capturing an inaccurate history. So, ideally, we’d like to capture the actual snapshot_id behind the named reference as part of the lineage. Anyone else thinking this is a reasonable scenario? => more in 🧵

👀 Paweł Leszczyński, Dongjin Seo

Juan Manuel Cappi (juancappi@gmail.com)
2023-07-19 07:14:54

*Thread Reply:* One hacky approach would be to update the current dataset identifier to include the snapshot_id, so, for schema.table.tag we would have something like schema.table.tag-snapshot_id. The benefit is that it’s explicit and it doesn’t require a change in the OL schemas. The obvious downside (though not that serious in my opinion) is that impacts readability. Not sure though if there are other non-obvious side-effects.


Another alternative would be to add a dedicated property. For instance, in the job > latestRun schema, the input/output dataset version objects could look like this:
```
"inputDatasetVersions": [
  {
    "datasetVersionId": {
      "namespace": "s3a://warehouse",
      "name": "schema.table.tag",
      "snapshot_id": "7056736771450556218",
      "version": "1c634e18-e357-347b-b758-4337ac352d6d"
    },
    "facets": {}
  }
]
```
And column lineage could look like:
```
"columnLineage": [
  {
    "name": "some_field",
    "inputFields": [
      {
        "namespace": "s3a://warehouse",
        "dataset": "schema.table.tag",
        "snapshot_id": "7056736771450556218",
        "field": "some_field",
        ...
      },
      ...
    ],
    ...
  }
]
```

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-07-19 08:33:43

*Thread Reply:* @Paweł Leszczyński what do you think?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-07-19 08:38:16

*Thread Reply:* 1. How does snapshotId differ from version? Could one make the OL version property be a string concat of iceberg-snapshot-id.iceberg-version?
2. I don't think it's necessary (or I don't understand why) to add snapshot-id within columnLineage. Each entry within inputFields of columnLineage is already available within inputs of the OL event related to this run.
Juan Manuel Cappi (juancappi@gmail.com)
2023-07-19 18:43:31

*Thread Reply:* Yes, I think I follow the idea. The problem with that is the version is tied to the dataset name, i.e. my_namespace.table_A.tag_v1, which stays the same for the source dataset, which is the one being used with time travel.
Suppose the following sequence:
step 1 =>
table_A.tag_v1 has snapshot id 123-abc
run job: table_A.tag_v1 -> job x -> table_B
the inputDatasetVersions > datasetVersionId > version for table_B points to an object which represents table_A.tag_v1 with snapshot id 123-abc correctly captured within facets > version > datasetVersion


step 2 => -delete tag_v1, insert some data, create tag_v1 again -now table_A.tag_v1 has snapshot id 456-def -run job again: table_A.tag_v1 -> job x -> table_B -the inputDatasetVersions > datasetVersionId > version for table_B points to the same object which represents table_A.tag_v1 only now snapshot id has been replaced by 456-def within facets > version > datasetVersion which means I don’t have way to know which was the snapshot id used in the step 1


The “hack” I mentioned above, though, seems to solve the issue, since a new dataset is captured for each combination, so no information is overwritten/lost; i.e., the datasets referenced in inputDatasetVersions are now named:
table_A.tag_v1-123-abc
table_A.tag_v1-456-def


As a side effect, the column lineage also gets “fixed”. Without the “hack”, the lineage for the step 1 and step 2 job runs both referenced table_A.tag_v1 as the source of the input field, even though the snapshot id differed in each run. With the hack, one run references table_A.tag_v1-123-abc and the other table_A.tag_v1-456-def.


Hope it makes sense. If it helps, I can put together a few json files with the examples I’ve been using to experiment

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-07-20 06:35:22

*Thread Reply:* So, my understanding of the problem is that the iceberg version is not unique. So, if you have version 3, revert to version 2, and then write something again, one ends up again with version 3.


I would not like to mess with dataset names, because on backend sides like Marquez, dataset names being the same in different jobs and runs allows creating the lineage graph. If dataset names are different, then there is no way to build a lineage graph across multiple jobs.


Adding snapshot_id to datasetVersion is one option to go. My concern here is that this is so Iceberg-specific, while we're aiming to have a general solution to dataset versioning.


Some other options are: send a concat of version+snapshotId as the version, or send only snapshot_id as the version. The second ain't that bad, as the snapshotId is actually something we're aiming to get as a version, isn't it?
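
(For reference, a minimal sketch of that last option with the openlineage-python client; the endpoint, namespace and job names below are illustrative, not from this thread:)
```
# Hedged sketch: report the Iceberg snapshot id as the dataset version via the
# standard datasetVersion facet, using the openlineage-python client (~1.x API).
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.facet import DatasetVersionDatasetFacet
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")  # illustrative endpoint

snapshot_id = "7056736771450556218"  # the Iceberg snapshot id discussed above
source = Dataset(
    namespace="s3a://warehouse",
    name="schema.table.tag",
    facets={"version": DatasetVersionDatasetFacet(datasetVersion=snapshot_id)},
)

client.emit(RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="my_namespace", name="job_x"),  # illustrative names
    producer="https://example.com/my-producer",       # illustrative producer
    inputs=[source],
    outputs=[],
))
```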

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-07-21 04:21:26

Hi guys, I’d like to open a vote to release the next OpenLineage version! We'd really like to use the fixed CustomEnvironmentFacetBuilder for delta catalogs, and column lineage for Merge Into command in the spark integration! Thanks! 🙂

➕ Jakub Dardziński, Willy Lulciuc, Michael Robinson, Maciej Obuchowski, Anirudh Shrinivason

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-21 13:09:39

*Thread Reply:* Thanks, all. The release is authorized and will be initiated within two business days per our policy here.

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-25 13:44:47

*Thread Reply:* @Anirudh Shrinivason and others waiting on this release: the release process isn’t working as expected due to security improvements recently made to the website, ironically enough, which is the source for the spec. But we’re working on a fix and hope to complete the release soon.

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-25 15:19:49

*Thread Reply:* @Anirudh Shrinivason the release (0.30.1) is out now. Thanks for your patience 🙂

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-07-25 23:21:14

*Thread Reply:* Hi @Michael Robinson Thanks a lot!

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-26 08:52:24

*Thread Reply:* 👍

Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 06:38:16

Hi, I am running a job in Marquez with 180 rows of metadata but it is running for more than an hour. Is there a way to check the log on Marquez? Below is the screenshot of the job:

[screenshot]

Willy Lulciuc - (willy@datakin.com)
2023-07-21 08:10:58

*Thread Reply:* > I am running a job in Marquez with 180 rows of metadata
Do you mean that you have +100 rows of metadata in the jobs table for Marquez? Or that the job never finishes?

Willy Lulciuc - (willy@datakin.com)
2023-07-21 08:11:47

*Thread Reply:* Also, yes, we have an event viewer that allows you to query the raw OL events

[screenshot]

Willy Lulciuc - (willy@datakin.com)
2023-07-21 08:12:19

*Thread Reply:* If you post a sample of your events, it’d be helpful to troubleshoot your issue

Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 08:53:25

*Thread Reply:* [screenshot]
Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 08:53:31

*Thread Reply:* Sure, Willy, thanks for your response. The job is still running. This is the code I am running from a Jupyter notebook using the Python client:

Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 08:54:33

*Thread Reply:* as you can see my input and output datasets are just 1 row

Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 08:55:02

*Thread Reply:* I included column lineage, but the job keeps running, so I don't know if it is working

Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 06:38:49

Please ignore 'UPDATED AT' timestamp

Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 07:56:48

@Paweł Leszczyński there is a lot of interest in our organisation in implementing OpenLineage in several projects, and we might take the Spark route, so on that note a small question: does OpenLineage work by extracting data from the Catalyst optimiser's physical/logical plans etc.?

👍 Paweł Leszczyński
❤️ Willy Lulciuc, Paweł Leszczyński, Maciej Obuchowski

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-07-21 08:20:33

*Thread Reply:* the Spark integration is based on extracting lineage from optimized plans

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-07-21 08:25:35

*Thread Reply:* https://youtu.be/rO3BPqUtWrI?t=1326 I recommend the whole presentation, but in case you're just interested in the Spark integration, there are a few minutes that explain how this is achieved (the link points to 22:06 in the video)

[YouTube: Databricks (https://www.youtube.com/@Databricks)]
Madhav Kakumani - (madhav.kakumani@6point6.co.uk)
2023-07-21 08:43:47

*Thread Reply:* Thanks Pawel for sharing. I will take a look. Have a nice weekend.

Jens Pfau - (jenspfau@google.com)
2023-07-21 08:22:51

Hello everyone!

👋 Jakub Dardziński, Maciej Obuchowski, Willy Lulciuc, Michael Robinson, Harel Shein, Ross Turk, Robin Fehr, Julien Le Dem

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-21 09:57:51

*Thread Reply:* Welcome, @Jens Pfau!

😀 Jens Pfau

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-23 08:36:38

hello everyone! I am trying to follow your guide
https://openlineage.io/docs/integrations/spark/quickstart_local
and when I execute
```
spark.createDataFrame([
    {'a': 1, 'b': 2},
    {'a': 3, 'b': 4}
]).write.mode("overwrite").saveAsTable("temp1")
```


I am not getting the expected result

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-23 08:37:55

```
23/07/23 12:35:20 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[== Parsed Logical Plan ==
'CreateTable `temp1`, Overwrite
+- LogicalRDD [a#6L, b#7L], false

== Analyzed Logical Plan ==

CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Optimized Logical Plan ==
CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Physical Plan ==
Execute CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- *(1) Scan ExistingRDD[a#6L,b#7L]
] with input dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:20 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[== Parsed Logical Plan ==
'CreateTable temp1, Overwrite
+- LogicalRDD [a#6L, b#7L], false

== Analyzed Logical Plan ==

CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Optimized Logical Plan ==
CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Physical Plan ==
Execute CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- *(1) Scan ExistingRDD[a#6L,b#7L]
] with output dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:20 INFO CreateDataSourceTableAsSelectCommandVisitor: Matched io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor<org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand,io.openlineage.client.OpenLineage$OutputDataset> to logical plan CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false
23/07/23 12:35:20 INFO CreateDataSourceTableAsSelectCommandVisitor: Matched io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor<org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand,io.openlineage.client.OpenLineage$OutputDataset> to logical plan CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false
23/07/23 12:35:20 ERROR EventEmitter: Could not emit lineage w/ exception
io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:105)
    at io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:34)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:71)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:77)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:99)
    at java.base/java.util.Optional.ifPresent(Optional.java:183)
    at io.openlineage.spark.agent.OpenLineageSparkListener.sparkSQLExecStart(OpenLineageSparkListener.java:99)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onOtherEvent(OpenLineageSparkListener.java:90)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Caused by: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.spark.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:100)
    ... 21 more
Caused by: io.openlineage.spark.shaded.org.apache.http.ProtocolException: Target host is not specified
    at io.openlineage.spark.shaded.org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:71)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:125)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    ... 24 more
23/07/23 12:35:20 INFO ParquetFileFormat: Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
23/07/23 12:35:20 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
23/07/23 12:35:20 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
23/07/23 12:35:20 INFO SQLHadoopMapReduceCommitProtocol: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
23/07/23 12:35:20 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
23/07/23 12:35:20 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
23/07/23 12:35:20 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
23/07/23 12:35:20 INFO CodeGenerator: Code generated in 120.989125 ms
23/07/23 12:35:21 INFO SparkContext: Starting job: saveAsTable at NativeMethodAccessorImpl.java:0
23/07/23 12:35:21 INFO DAGScheduler: Got job 0 (saveAsTable at NativeMethodAccessorImpl.java:0) with 1 output partitions
23/07/23 12:35:21 INFO DAGScheduler: Final stage: ResultStage 0 (saveAsTable at NativeMethodAccessorImpl.java:0)
23/07/23 12:35:21 INFO DAGScheduler: Parents of final stage: List()
23/07/23 12:35:21 INFO DAGScheduler: Missing parents: List()
23/07/23 12:35:21 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[== Parsed Logical Plan ==
'CreateTable temp1, Overwrite
+- LogicalRDD [a#6L, b#7L], false

== Analyzed Logical Plan ==

CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Optimized Logical Plan ==
CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Physical Plan ==
Execute CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- *(1) Scan ExistingRDD[a#6L,b#7L]
] with input dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:21 INFO OpenLineageRunEventBuilder: Visiting query plan Optional[== Parsed Logical Plan ==
'CreateTable temp1, Overwrite
+- LogicalRDD [a#6L, b#7L], false

== Analyzed Logical Plan ==

CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Optimized Logical Plan ==
CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false

== Physical Plan ==
Execute CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- *(1) Scan ExistingRDD[a#6L,b#7L]
] with output dataset builders [<function1>, <function1>, <function1>, <function1>, <function1>, <function1>, <function1>]
23/07/23 12:35:21 INFO CreateDataSourceTableAsSelectCommandVisitor: Matched io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor<org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand,io.openlineage.client.OpenLineage$OutputDataset> to logical plan CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false
23/07/23 12:35:21 INFO CreateDataSourceTableAsSelectCommandVisitor: Matched io.openlineage.spark.agent.lifecycle.plan.CreateDataSourceTableAsSelectCommandVisitor<org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand,io.openlineage.client.OpenLineage$OutputDataset> to logical plan CreateDataSourceTableAsSelectCommand temp1, Overwrite, [a, b]
+- LogicalRDD [a#6L, b#7L], false
23/07/23 12:35:21 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[10] at saveAsTable at NativeMethodAccessorImpl.java:0), which has no missing parents
23/07/23 12:35:21 ERROR EventEmitter: Could not emit lineage w/ exception
io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:105)
    at io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:34)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:71)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:174)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$onJobStart$9(OpenLineageSparkListener.java:153)
    at java.base/java.util.Optional.ifPresent(Optional.java:183)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onJobStart(OpenLineageSparkListener.java:149)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Caused by: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.spark.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at io.openlineage.spark.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:100)
    ... 20 more
Caused by: io.openlineage.spark.shaded.org.apache.http.ProtocolException: Target host is not specified
    at io.openlineage.spark.shaded.org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(
```

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-23 08:38:46

```
23/07/23 12:35:20 ERROR EventEmitter: Could not emit lineage w/ exception
io.openlineage.client.OpenLineageClientException: io.openlineage.spark.shaded.org.apache.http.client.ClientProtocolException
    at io.openlineage.client.transports.HttpTransport.emit(HttpTransport.java:105)
    at io.openlineage.client.OpenLineageClient.emit(OpenLineageClient.java:34)
    at io.openlineage.spark.agent.EventEmitter.emit(EventEmitter.java:71)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:77)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:99)
```

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-23 13:31:53

*Thread Reply:* That looks like your URL provided to OpenLineage is missing http:// or https:// in the front

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-23 14:54:55

*Thread Reply:* sorry, how can I resolve this? Do I need to add this? I just followed the guide step by step. You don't mention anywhere to add anything. You provide something that

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-23 14:55:05

*Thread Reply:* really does not work out of the box

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-23 14:55:13

*Thread Reply:* and this is supposed to be a demo

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-07-23 17:07:49

*Thread Reply:* bumping e.g. to io.openlineage:openlineage-spark:0.29.2 seems to be fixing the issue


not sure why it stopped working for 0.12.0 but we’ll take a look and fix accordingly

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-24 04:51:34

*Thread Reply:* ...probably by bumping the version on this page 🙂

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-24 05:00:28

*Thread Reply:* thank you both for coming back to me, I bumped to 0.29 and I think that it now runs. Is this the expected output?
```
23/07/24 08:43:55 INFO ConsoleTransport: {"eventTime":"2023-07-24T08:43:55.941Z","producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark>","schemaURL":"<https://openlineage.io/spec/2-0-0/OpenLineage.json#/$defs/RunEvent>","eventType":"COMPLETE","run":{"runId":"186c06c0-e79c-43cf-8bb7-08e1ab4c86a5","facets":{"spark.logicalPlan":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark>","_schemaURL":"<https://openlineage.io/spec/2-0-0/OpenLineage.json#/$defs/RunFacet>","plan":[{"class":"org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand","num-children":1,"table":{"product-class":"org.apache.spark.sql.catalyst.catalog.CatalogTable","identifier":{"product-class":"org.apache.spark.sql.catalyst.TableIdentifier","table":"temp2","database":"default"},"tableType":{"product-class":"org.apache.spark.sql.catalyst.catalog.CatalogTableType","name":"MANAGED"},"storage":{"product-class":"org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat","compressed":false,"properties":null},"schema":{"type":"struct","fields":[]},"provider":"parquet","partitionColumnNames":[],"owner":"","createTime":1690188235517,"lastAccessTime":-1,"createVersion":"","properties":null,"unsupportedFeatures":[],"tracksPartitionsInCatalog":false,"schemaPreservesCase":true,"ignoredProperties":null},"mode":null,"query":0,"outputColumnNames":"[a, b]"},{"class":"org.apache.spark.sql.execution.LogicalRDD","num-children":0,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"a","dataType":"long","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":12,"jvmId":"173725f4-02c4-4174-9d18-3a61aa311d62"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"b","dataType":"long","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":13,"jvmId":"173725f4-02c4-4174-9d18-3a61aa311d62"},"qualifier":[]}]],"rdd":null,"outputPartitioning":{"product-class":"org.apache.spark.sql.catalyst.plans.physical.UnknownPartitioning","numPartitions":0},"outputOrdering":[],"isStreaming":false,"session":null}]},"spark_version":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark>","_schemaURL":"<https://openlineage.io/spec/2-0-0/OpenLineage.json#/$defs/RunFacet>","spark-version":"3.1.2","openlineage-spark-version":"0.29.2"}}},"job":{"namespace":"default","name":"sample_spark.execute_create_data_source_table_as_select_command","facets":{}},"inputs":[],"outputs":[{"namespace":"file","name":"/home/jovyan/spark-warehouse/temp2","facets":{"dataSource":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark>","_schemaURL":"<https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet>","name":"file","uri":"file"},"schema":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark>","_schemaURL":"<https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet>","fields":[{"name":"a","type":"long"},{"name":"b","type":"long"}]},"symlinks":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark>","_schemaURL":"<https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet>","identifiers":[{"namespace":"/home/jovyan/spark-warehouse","name":"default.temp2","type":"TABLE"}]},"lifecycleStateChange":{"_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.29.2/integration/spark>","_schemaURL":"<https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet>","lifecycleStateChange":"CREATE"}},"outputFacets":{}}]}
```
Also I then proceeded to run
```
docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=marquez-api -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquezproject/marquez-web:0.19.1
```
but the page is empty

[screenshot]

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-24 11:11:08

*Thread Reply:* You'd need to set up spark.openlineage.transport.url to send OpenLineage events to Marquez

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-24 11:12:28

*Thread Reply:* where and how can I do this?

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-24 11:13:04

*Thread Reply:* do I need to edit the conf?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-24 11:37:09

*Thread Reply:* yes, in the spark conf

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-24 11:37:48

*Thread Reply:* what should this url be?

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-24 11:37:51

*Thread Reply:* http://localhost:3000/ ?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-24 11:43:30

*Thread Reply:* That depends on how you ran Marquez, but looking at your screenshot the UI is at 3000, so I guess the API would be at 5000

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-24 11:43:46

*Thread Reply:* as that's default in Marquez docker-compose

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-24 11:44:14

*Thread Reply:* I cannot see the spark conf

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-24 11:44:23

*Thread Reply:* is it in there or do I need to create it?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-24 16:42:53

*Thread Reply:* Is something like
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
         .appName('sample_spark')
         .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
         .config('spark.jars.packages', 'io.openlineage:openlineage-spark:0.29.2')
         .config('spark.openlineage.transport.url', 'http://marquez:5000')
         .config('spark.openlineage.transport.type', 'http')
         .getOrCreate())
```
not working?

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-25 05:08:08

*Thread Reply:* OK, when I use the snippet you provided and then execute
```
docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=marquez-api -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquezproject/marquez-web:0.19.1
```
I can now see this

[screenshot]

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-25 05:08:52

*Thread Reply:* but when I click on the job I then get this

[screenshot]

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-25 05:09:05

*Thread Reply:* so I cannot see any details of the job

Sarwat Fatima - (sarwatfatimam@gmail.com)
2023-09-05 05:54:50

*Thread Reply:* @George Polychronopoulos Hi, I am facing the same issue. After adding the spark conf and using the docker run command, Marquez is still showing empty. Do I need to change something in the run command?

[screenshot]

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 05:55:15

*Thread Reply:* yes, I will tell you

Sarwat Fatima - (sarwatfatimam@gmail.com)
2023-09-05 07:36:41

*Thread Reply:* For the docker command that I used, I updated the marquez-web version to 0.40.0 and I also updated the MARQUEZ_HOST, which I am not sure if I had to or not. The UI is running but not showing anything:
```
docker run --network spark_default -p 3000:3000 -e MARQUEZ_HOST=localhost -e MARQUEZ_PORT=5000 --link marquez-api:marquez-api marquez/marquez-web:0.40.0
```

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:36:52

*Thread Reply:* it's because you are running this command, right

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:36:55

*Thread Reply:* yes, that's it

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:36:58

*Thread Reply:* you need 0.40

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:37:03

*Thread Reply:* and there is a lot of stuff

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:37:07

*Thread Reply:* you need to change

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:37:10

*Thread Reply:* in the Docker

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:37:24

*Thread Reply:* so the spark

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:37:25

*Thread Reply:* version

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:37:27

*Thread Reply:* the python

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:38:05

*Thread Reply:* ```
version: "3.10"
services:
  notebook:
    image: jupyter/pyspark-notebook:spark-3.4.1
    ports:
      - "8888:8888"
    volumes:
      - ./docker/notebooks:/home/jovyan/notebooks
      - ./build:/home/jovyan/openlineage
    links:
      - "api:marquez"
    depends_on:
      - api

  # Marquez as an OpenLineage client
  api:
    image: marquezproject/marquez
    container_name: marquez-api
    ports:
      - "5000:5000"
      - "5001:5001"
    volumes:
      - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh
    links:
      - "db:postgres"
    depends_on:
      - db
    entrypoint: [ "./wait-for-it.sh", "db:5432", "--", "./entrypoint.sh" ]

  db:
    image: postgres:12.1
    container_name: marquez-db
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
      - MARQUEZ_DB=marquez
      - MARQUEZ_USER=marquez
      - MARQUEZ_PASSWORD=marquez
    volumes:
      - ./docker/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh
    # Enables SQL statement logging (see: https://www.postgresql.org/docs/12/runtime-config-logging.html#GUC-LOG-STATEMENT)
    # command: ["postgres", "-c", "log_statement=all"]
```

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:38:10

*Thread Reply:* this is how mine looks

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:38:20

*Thread Reply:* it is all tested and the latest version

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:38:31

*Thread Reply:* postgres does not work beyond 12

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:38:56

*Thread Reply:* if you run this docker-compose up

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:38:58

*Thread Reply:* the notebooks

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:39:02

*Thread Reply:* are 10x faster

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:39:06

*Thread Reply:* and give no errors

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:39:14

*Thread Reply:* also you need to update other stuff

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:39:18

*Thread Reply:* such as

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:39:26

*Thread Reply:* don't run what is in the docs

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:39:34

*Thread Reply:* but run what is in GitHub

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:40:22
*Thread Reply:* run in your notebooks what is in here

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:40:32

*Thread Reply:* ```
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
         .appName('sample_spark')
         .config('spark.jars.packages', 'io.openlineage:openlineage-spark:1.1.0')
         .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
         .config('spark.openlineage.transport.url', 'http://{openlineage.client.host}/api/v1/namespaces/spark_integration/')
         .getOrCreate())
```

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:40:38

*Thread Reply:* they don't update the documentation

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-09-05 07:40:44

*Thread Reply:* it took me 4 weeks to get here

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk)
2023-07-23 08:39:13

is this a known error? does anyone know how to debug this?

Steven - (xli@zjuici.com)
2023-07-23 23:57:43

Hi,
Using Marquez, I tried to get the dataset version through two APIs.
First:
http://host/api/v1/namespaces/{namespace}/datasets/{dataset}
It will include a currentVersion in the response.
Then:
http://host/api/v1/namespaces/{namespace}/datasets/{dataset}/versions/{currentVersion}
But the version used here refers to the "version" column in the dataset_versions table, not the primary key "uuid", which leads to a 404 not found.
I checked the other APIs, but it seemed that there is no other way to get the version through "currentVersion".
Any help?
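
(For clarity, the two calls above as a minimal Python sketch; the host, namespace and dataset names are illustrative:)
```
# Reproduces the 404 described above: fetch a dataset, read its currentVersion,
# then ask the versions endpoint for it.
import requests

HOST = "http://localhost:5000"  # assumption: a local Marquez API
NS, DS = "my_namespace", "my_dataset"

dataset = requests.get(f"{HOST}/api/v1/namespaces/{NS}/datasets/{DS}").json()
current_version = dataset["currentVersion"]

resp = requests.get(
    f"{HOST}/api/v1/namespaces/{NS}/datasets/{DS}/versions/{current_version}"
)
print(resp.status_code)  # 404 in the scenario described above
```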

👀 Maciej Obuchowski, Willy Lulciuc

Steven - (xli@zjuici.com)
2023-07-24 00:14:43

*Thread Reply:* Like I want to change the facets of a specific dataset.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-24 16:45:18

*Thread Reply:* @Willy Lulciuc do you have any idea? 🙂

Steven - (xli@zjuici.com)
2023-07-25 05:02:47

*Thread Reply:* I solved this by adding a new job which outputs to the same dataset. This resulted in a newer dataset version.

Willy Lulciuc - (willy@datakin.com)
2023-07-25 06:20:58

*Thread Reply:* @Steven great to hear that you solved the issue! but there are some minor logical inconsistencies that we’d like to address with versioning (for both datasets and jobs) in Marquez. The tl;dr is the version column wasn’t meant to be used externally, but internally within Marquez. The issue is “minor” as it’s more of a pointer thing. We’ll be addressing it soon. For some background, you can look at:
• https://github.com/MarquezProject/marquez/issues/2071
• https://github.com/MarquezProject/marquez/pull/2153

Steven - (xli@zjuici.com)
2023-07-25 05:06:48

Hi,
Are there any keys to set in marquez.yaml to skip db initialization and use an existing db? I am deploying the marquez client on a k8s cluster, which uses a cloud postgres. Every time I restart the marquez deployment I have to drop all those tables, otherwise it will raise a "table already exists" ERROR

Willy Lulciuc - (willy@datakin.com)
2023-07-25 06:43:32

*Thread Reply:* @Steven ahh, very good point. It's technically not an "error" in the true sense, but annoying nonetheless. I think you're referencing the init container in the Marquez helm chart? https://github.com/MarquezProject/marquez/blob/main/chart/templates/marquez/deployment.yaml#L37

Willy Lulciuc - (willy@datakin.com)
2023-07-25 06:45:24

*Thread Reply:* hmm, actually what raises the error you're referencing? the Marquez http server?

Willy Lulciuc - (willy@datakin.com)
2023-07-25 06:49:08

*Thread Reply:* > Every time I restart the marquez deployment I have to drop all those tables otherwise it will raise table already exists ERROR
This shouldn't be an error. I'm trying to understand the scenario in which this error is thrown (any info is helpful). We use flyway to manage our db schema, but you may have gotten into an odd state somehow

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-07-25 12:52:51

For Databricks notebooks, does the Spark listener work without any notebook changes? (I see that Azure Databricks -> Purview needs no changes, but I'm not sure if that applies everywhere... e.g. if I have an existing Databricks notebook and I add a Spark listener, can I get column-level lineage? or do I need to change my notebook to use openlineage libraries, like I do with an arbitrary Python script?)

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-07-31 03:35:58

*Thread Reply:* Nope, one should modify the cluster as per the doc <https://openlineage.io/docs/integrations/spark/quickstart_databricks>, but no changes in the notebook are required.
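
(For context, the cluster-side change amounts to Spark configuration roughly like the following; this is a sketch, and the exact keys and endpoint depend on your deployment, so follow the linked quickstart:)
```
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.transport.type http
spark.openlineage.transport.url http://<your-collector-or-marquez>:5000
spark.openlineage.namespace my_namespace
```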

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-08-02 10:59:00

*Thread Reply:* Right, great, that’s exactly what I was hoping 😄

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-25 15:24:17

@channel
We released OpenLineage 0.30.1, including:
Added
• Flink: support Iceberg sinks #1960 @pawel-big-lebowski
• Spark: column-level lineage for merge into on delta tables #1958 @pawel-big-lebowski
• Spark: column-level lineage for merge into on Iceberg tables #1971 @pawel-big-lebowski
• Spark: add support for Iceberg REST catalog #1963 @juancappi
• Airflow: add possibility to force direct-execution based on environment variable #1934 @mobuchowski
• SQL: add support for Apple Silicon to openlineage-sql-java #1981 @davidjgoss
• Spec: add facet deletion #1975 @julienledem
• Client: add a file transport #1891 @alexandre bergere
Changed
• Airflow: do not run plugin if OpenLineage provider is installed #1999 @JDarDagran
• Python: rename config to config_class #1998 @mobuchowski
Plus test improvements, docs changes, bug fixes and more.
Thanks to all the contributors, including new contributors @davidjgoss, @alexandre bergere and @Juan Manuel Cappi!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/0.30.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.29.2...0.30.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

👏 Julian Rossi, Bernat Gabor, Anirudh Shrinivason, Maciej Obuchowski, Jens Pfau, Sheeri Cabral (Collibra)
👍 Athitya Kumar, Sheeri Cabral (Collibra)

Codrut Stoicescu - (codrut.stoicescu@gmail.com)
2023-07-27 11:53:09

Hello everyone! I’m part of a team trying to integrate OpenLineage and Marquez with multiple tools in our ecosystem. Integration with Spark and Iceberg was fairly easy with the listener you guys developed. We are now trying to integrate with Ray and we are having some trouble there. I was wondering if anybody has tried any work in that direction, so we can chat and exchange ideas. Thank you!

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-27 14:47:18

*Thread Reply:* This is the first I’ve heard of someone trying to do this, but others have tried getting lineage from pandas. There isn’t support for this currently, but this thread contains a link to an issue that might be helpful: https://openlineage.slack.com/archives/C01CK9T7HKR/p1689850134978429?thread_ts=1689688067.729469&cid=C01CK9T7HKR.

Codrut Stoicescu - (codrut.stoicescu@gmail.com)
2023-07-28 02:10:14

*Thread Reply:* Thank you for your response. We have implemented the "manual way" of emitting events with the Python OL client. We are now looking for a more automated way, so that updates to the scripts that run in Ray are minimal to none

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-07-28 13:03:43

*Thread Reply:* If you're actively using Ray, then you know way more about it than me, or probably any other OL contributor 🙂
I don't know how it works or is deployed, but I would recommend checking if there's a robust way of being notified in the runtime about processing occurring there.

Codrut Stoicescu - (codrut.stoicescu@gmail.com)
2023-07-31 12:17:07

*Thread Reply:* Thank you for the tip. That's the kind of detail I'm looking for but couldn't find yet

Tereza Trojanová - (tereza.trojanova@revolt.bi)
2023-07-28 09:20:34

Hi, does anyone have experience integrating OpenLineage and Marquez with Keboola? I am new to OpenLineage and struggling with the KBC component configuration.

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-28 10:53:35

*Thread Reply:* @Martin Fiser can you share any resources or pointers that might be helpful?

Martin Fiser - (fisa@keboola.com)
2023-08-21 19:17:17

*Thread Reply:* Hi, apologies - the vacation period has hit me. However, here are the resources:

API endpoint:
https://app.swaggerhub.com/apis-docs/keboola/job-queue-api/1.3.4#/Jobs/getJobOpenApiLineage (job-queue-api | 1.3.4 | keboola | SwaggerHub)
Dedicated component to push data into OpenLineage (Marquez instance):
https://components.keboola.com/components/keboola.wr-openlineage (OpenLineage data destination | Keboola Developer Portal)

🙌 Michael Robinson

Damien Hawes - (damien.hawes@booking.com)
2023-07-31 12:32:22

Hi folks. I'm looking to find the complete spec in openapi format. For example, if I want to find the complete spec of 1.0.5, where would I find that? I've looked here: https://openlineage.io/apidocs/openapi/ however when I download the spec, things are missing, specifically the facets. This makes it difficult to generate clients / backend interfaces from the (limited) openapi spec.

Silvia Pina - (silviampina@gmail.com)
2023-08-01 05:14:58

*Thread Reply:* +1, I could also really use this!

Silvia Pina - (silviampina@gmail.com)
2023-08-01 05:27:34

*Thread Reply:* Found a way: you download it as json in the above link ("Download OpenAPI specification"), then if you copy-paste it to editor.swagger.io it asks if you want to convert to yaml :)

Damien Hawes - (damien.hawes@booking.com)
2023-08-01 10:25:49

*Thread Reply:* Whilst that works, it isn't complete. The issue is that the "facets" are not resolved. Exploring the website repository (https://github.com/OpenLineage/website/tree/main/static/spec) shows that facets aren't published alongside the spec beyond 1.0.1, which means it's hard to know which revisions of the facets belong to which version of the spec.

Silvia Pina - (silviampina@gmail.com)
2023-08-01 10:26:54

*Thread Reply:* Good point! Would be good if we could clarify how to get the full spec, in that case

Damien Hawes - (damien.hawes@booking.com)
2023-08-01 10:30:57

*Thread Reply:* Granted. If the spec follows backwards compatible evolution rules, then this shouldn't be a problem, i.e., new fields must be optional, you can not remove existing fields, you can not modify existing fields, etc.

🙌 Silvia Pina

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-01 12:15:22

*Thread Reply:* We don't have facets with a newer version than 1.1.0

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-01 12:15:56

*Thread Reply:* @Damien Hawes we've moved to merge docs and website repos here: https://github.com/OpenLineage/docs

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-01 12:18:23

*Thread Reply:* > Would be good if we could clarify how to get the full spec, in that case
Is using https://github.com/OpenLineage/OpenLineage/tree/main/spec not enough? We have separate files with facet definitions to be able to evolve them separately from the main spec

Damien Hawes - (damien.hawes@booking.com)
2023-08-02 04:53:03

*Thread Reply:* @Maciej Obuchowski - thanks for your input. I understand the desire to want to evolve the facets independently from the main spec, yet I keep running into a mental wall.


If I say, 'My application is compatible with OpenLineage 1.0.5' - what does that mean exactly? Does it mean that I am at least compatible with the base definition of RunEvent and its nested components, but not facets?


That's what I'm finding difficult to wrap my head around. Right now, I can not define (for my own sake and the sake of my org) what 'OpenLineage 1.0.5' means.


When I read the Marquez source code, I see that they state they implement 1.0.5, but again, it isn't clear what that completely entails.


I hope I am making sense.

👍 Silvia Pina

Damien Hawes - (damien.hawes@booking.com)
2023-08-02 04:56:36

*Thread Reply:* If I approach this from a conventional software engineering standpoint: I provide a library to my consumers. The library has a version associated with it, and that version encompasses all the objects located within that particular library. If I release a new version of my library, it implies that some form of evolution has happened. Whether it is a bug fix, a documentation change, or evolving the API of my objects, it means something has changed and the new version is there to indicate that.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-02 04:56:53

*Thread Reply:* Yes - it means you can read and understand the base spec. Facets are completely optional - reading them might provide you additional information, but you as an event consumer need to define what you do with them. Basically, the needs can be very different between consumers; the spec should not define the behavior of a consumer.
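
(A minimal illustration of that consumer stance, in plain Python with made-up values: rely on core fields, and use facets only if present:)
```
# Base-spec consumer sketch: unknown facets are simply ignored.
import json

raw = '''{"eventType": "COMPLETE", "eventTime": "2023-08-02T00:00:00Z",
"run": {"runId": "00000000-0000-0000-0000-000000000000"},
"job": {"namespace": "ns", "name": "job"},
"inputs": [{"namespace": "ns", "name": "in_table",
            "facets": {"version": {"datasetVersion": "42"}}}],
"outputs": [], "producer": "urn:example", "schemaURL": "urn:example"}'''
event = json.loads(raw)

# Core fields: safe to rely on for any spec-compliant producer.
job_id = (event["job"]["namespace"], event["job"]["name"])

# Facets: optional extras; consume them only when present.
for ds in event.get("inputs", []):
    version_facet = ds.get("facets", {}).get("version")
    if version_facet:
        print(ds["name"], "@", version_facet.get("datasetVersion"))
```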

🙌 Silvia Pina

Damien Hawes - (damien.hawes@booking.com)
2023-08-02 05:01:26

*Thread Reply:* OK. Thanks for the clarification. That clears things up for me.

👍 Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2023-07-31 16:42:48

This month’s issue of OpenLineage News was just sent out. Please subscribe to get it directly in your inbox each month!

👍 Ross Turk, Maciej Obuchowski, Shirley Lu
🎉 Harel Shein

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-01 12:35:22

Hello, I request an OpenLineage release, especially for two things:
• Snowflake/HTTP/Airflow bugfix: https://github.com/OpenLineage/OpenLineage/pull/2025
• Spec: removing refs from core: https://github.com/OpenLineage/OpenLineage/pull/1997
Three approvals from committers will authorize the release. @Michael Robinson

➕ Jakub Dardziński, Harel Shein, Michael Robinson, George Polychronopoulos, Willy Lulciuc, Shirley Lu

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-01 13:26:30

*Thread Reply:* Thanks, @Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-01 15:43:00

*Thread Reply:* Thanks, all. The release is authorized and will be initiated within two business days.

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-01 16:42:32

@channel
We released OpenLineage 1.0.0, featuring static lineage capability!
Added:
• Airflow: convert lineage from legacy File definition #2006 @Maciej Obuchowski
Removed:
• Spec: remove facet ref from core #1997 @JDarDagran
Changed:
• Airflow: change log level to DEBUG when extractor isn’t found #2012 @kaxil
• Airflow: make sure we cannot fail in thread despite direct execution #2010 @Maciej Obuchowski
Plus test improvements, docs changes, bug fixes and more.
See prior releases for additional changes related to static lineage.
Thanks to all the contributors, including new contributors @kaxil and @Mars Lan!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.0.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/0.30.1...1.0.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

🙌 Julian LaNeve, Bernat Gabor, Maciej Obuchowski, Peter Hicks, Ross Turk, Harel Shein, Willy Lulciuc, Paweł Leszczyński, Peter Hicks
🥳 Julian LaNeve, alexandre bergere, Maciej Obuchowski, Peter Hicks, Juan Manuel Cappi, Ross Turk, Harel Shein, Paweł Leszczyński, Peter Hicks
🚀 alexandre bergere, Peter Hicks, Ross Turk, Harel Shein, Paweł Leszczyński, Peter Hicks

Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com)
2023-08-02 08:51:57

hi folks! so happy to see that static lineage is making its way through OL. one question: is the OpenAPI spec up to date? https://openlineage.io/apidocs/openapi/ IIUC, proposal 1837 says that JobEvent and DatasetEvent can be emitted independently from RunEvents now, but it's not clear how this affected the spec.


I see the Python client https://pypi.org/project/openlineage-python/1.0.0/ includes these changes already, so I assume I can go ahead and use it already? (I'm also keeping tabs on https://github.com/MarquezProject/marquez/issues/2544)
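
(A hedged sketch of what emitting a standalone DatasetEvent might look like with the 1.0.0 Python client, assuming the static lineage event classes are exposed from openlineage.client.run; field names follow the 2-0-2 spec and may differ slightly:)
```
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, DatasetEvent  # assumption: added in 1.0.0

client = OpenLineageClient(url="http://localhost:5000")  # illustrative endpoint

client.emit(DatasetEvent(
    eventTime=datetime.now(timezone.utc).isoformat(),
    producer="https://example.com/my-producer",  # illustrative
    schemaURL="https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/DatasetEvent",
    dataset=Dataset(namespace="my_namespace", name="my_dataset"),
))
```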

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-02 10:09:33

*Thread Reply:* I think the apidocs are not up to date 🙂

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-02 10:09:43

*Thread Reply:* https://openlineage.io/spec/2-0-2/OpenLineage.json has the newest spec

Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com)
2023-08-02 10:44:23

*Thread Reply:* thanks for the pointer @Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-02 10:49:17

*Thread Reply:* Also working on updating the apidocs

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-02 11:21:14

*Thread Reply:* The API docs are now up to date @Juan Luis Cano Rodríguez! Thank you for raising this issue.

🙌:skin_tone_3: Juan Luis Cano Rodríguez

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-02 12:58:15

@channel
If you can, please join us in San Francisco for a meetup at Astronomer on August 30th at 5:30 PM PT.
On the agenda: a presentation by special guest @John Lukenoff plus updates on the Airflow Provider, static lineage, and more.
Food will be provided, and all are welcome.
Please RSVP (https://www.meetup.com/meetup-group-bnfqymxe/events/295195280/) to let us know you’re coming.

Zahi Fail - (zahi.fail@gmail.com)
2023-08-03 03:18:08

Hey, I hope this is the right channel for this kind of question - I’m running tests to integrate Airflow (2.4.3) with Marquez (OpenLineage 0.30.1). Currently, I’m testing the postgres operator, and for some reason queries like “Copy” and “Unload” are being sent as events but don’t appear in the graph. Any idea how to solve it?


You can see attached

1. The graph of an airflow DAG with all the tasks besides the copy and unload.
2. The graph with the unload task that isn’t connected to the other flow.

[screenshots]

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-03 05:36:04

*Thread Reply:* I think our underlying SQL parser does not handle the Postgres versions of those queries

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-03 05:36:14

*Thread Reply:* Can you post the (anonymized?) queries?

👍 Maciej Obuchowski

Zahi Fail - (zahi.fail@gmail.com)
2023-08-03 07:03:09

*Thread Reply:* for example

copy bi.marquez_test_2 from '******' iam_role '**********' delimiter as '^' gzi
Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-07 13:35:30

*Thread Reply:* @Zahi Fail iam_role suggests you want redshift version of this supported, not Postgres one right?

Zahi Fail - (zahi.fail@gmail.com)
2023-08-08 04:04:35

*Thread Reply:* @Maciej Obuchowski hey, actually I tried both the Postgres and Redshift to S3 operators.
Both of them sent a new event through OL to Marquez, and still weren't part of the entire flow.

Athitya Kumar - (athityakumar@gmail.com)
2023-08-04 01:40:15

Hey team! 👋


We were exploring open-lineage and had a couple of questions:

  1. Does open-lineage support presto-sql?
  2. Do we have any docs/benchmarks on query coverage (inner joins, subqueries, etc) & source/sink coverage (spark.read from JDBC, Files etc) for spark-sql?
  3. Can someone point to the code where we currently parse the input/output facets from the spark integration (like sql queries / transformations) and if it's extendable?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-04 02:17:19

*Thread Reply:* Hey @Athitya Kumar,

  1. For parsing SQL queries, we're using sqlparser-rs (https://github.com/sqlparser-rs/sqlparser-rs), which already has great coverage of SQL syntax and supports different dialects. It's an open source project, and we've already contributed to it, e.g. for the Snowflake dialect.
  2. We don't have such a benchmark, but if you like, you could contribute and help us provide one. We do support joins, subqueries, Iceberg and Delta tables, JDBC for Spark, and much more. Everything we do support is covered in our tests.
  3. Not sure if I got it properly. Marquez is our reference backend implementation, which parses all the facets and stores them in a relational db in a relational manner (facets, jobs, datasets and runs in separate tables).

Athitya Kumar - (athityakumar@gmail.com)
2023-08-04 02:29:53

*Thread Reply:* For (3), I was referring to where we call sqlparser-rs in our spark-openlineage event listener / integration, and what customising/improving it would look like

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-04 02:37:20

*Thread Reply:* sqlparser-rs is a Rust library and we bundle it within iface-java (https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/iface-java/src/main/java/io/openlineage/sql/SqlMeta.java). It's capable of extracting input/output datasets and column lineage information from SQL

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-04 02:40:02

*Thread Reply:* and this is Spark code that extracts it from JdbcRelation -> https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]ge/spark/agent/lifecycle/plan/handlers/JdbcRelationHandler.java

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-04 04:08:53

*Thread Reply:* I think the 3rd question relates generally to Spark SQL handling, rather than handling JDBC connections inside Spark, right?

Athitya Kumar - (athityakumar@gmail.com)
2023-08-04 04:24:57

*Thread Reply:* Yup, both actually. Related to getting the JDBC connection info in the input/output facet, as well as spark-sql queries we do on that JDBC connection

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-04 06:00:17

*Thread Reply:* For Spark SQL - it's translated to Spark's internal query LogicalPlan. We take that plan and process its nodes. From the root node we can take the output dataset, from leaf nodes we can take input datasets, and inside internal nodes we track columns to extract column-level lineage. We express those (table-level) operations by implementing classes like QueryPlanVisitor


You can extend that, for example for additional types of nodes that we don't support, by implementing your own QueryPlanVisitor, then implementing OpenLineageEventHandlerFactory and packaging this into a .jar deployed alongside the OpenLineage jar - this would be loaded by us using Java's ServiceLoader.

👍 Kiran Hiremath

Athitya Kumar - (athityakumar@gmail.com)
2023-08-08 05:06:07

*Thread Reply:* @Maciej Obuchowski @Paweł Leszczyński - Thanks for your responses! I had a follow-up query regarding the sqlparser-rs that's used internally by OpenLineage: the SQL dialects supported by sqlparser-rs here don't include spark-sql / presto-sql dialects, which means they'd fall back to the generic dialect:


"--ansi" =&gt; Box::new(AnsiDialect {}), -"--bigquery" =&gt; Box::new(BigQueryDialect {}), -"--postgres" =&gt; Box::new(PostgreSqlDialect {}), -"--ms" =&gt; Box::new(MsSqlDialect {}), -"--mysql" =&gt; Box::new(MySqlDialect {}), -"--snowflake" =&gt; Box::new(SnowflakeDialect {}), -"--hive" =&gt; Box::new(HiveDialect {}), -"--redshift" =&gt; Box::new(RedshiftSqlDialect {}), -"--clickhouse" =&gt; Box::new(ClickHouseDialect {}), -"--duckdb" =&gt; Box::new(DuckDbDialect {}), -"--generic" | "" =&gt; Box::new(GenericDialect {}), -Any idea on how much coverage generic dialect provides for spark-sql / how different they are etc?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:21:32

*Thread Reply:* The spark-sql integration is based on Spark LogicalPlan's tree. Extracting input/output datasets from tree nodes is more detailed than SQL parsing

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-08 07:04:52

*Thread Reply:* I think presto/trino dialect is very standard - there shouldn't be any problems with regular queries

Athitya Kumar - (athityakumar@gmail.com)
2023-08-08 11:19:53

*Thread Reply:* @Paweł Leszczyński - Got it. And would you be able to point me to where within the openlineage-spark integration we:

  1. provide the Spark Logical Plan / query to sqlparser-rs
  2. get the output of sqlparser-rs (parsed query AST) & stitch back the inputs/outputs in the open-lineage events?

Athitya Kumar - (athityakumar@gmail.com)
2023-08-08 12:09:06

*Thread Reply:* For example, we'd like to understand which dialect of sqlparser-rs would be used in which scenario by OpenLineage, and what the interactions between OpenLineage & sqlparser-rs are

Athitya Kumar - (athityakumar@gmail.com)
2023-08-09 12:18:47

*Thread Reply:* @Paweł Leszczyński - In case you missed the above messages ^

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-10 03:31:32

*Thread Reply:* Sqlparser-rs is used within the Spark integration only for Spark JDBC queries (queries to external databases). That's the only scenario. For spark.sql(...), instead of SQL parsing, we rely on the logical plan of the job and extract information from it. For JDBC queries, which use sqlparser-rs, the dialect is extracted from the url:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/util/JdbcUtils.java#L69

👍 Athitya Kumar

nivethika R - (nivethikar8@gmail.com)
2023-08-06 07:16:53

Hi.. Is column lineage available for spark version 2.4.0?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-06 17:25:31

*Thread Reply:* No, it's not.

nivethika R - (nivethikar8@gmail.com)
2023-08-06 23:53:17

*Thread Reply:* Is it only available for spark version 3+?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-07 04:53:41

*Thread Reply:* Yes

GitHubOpenLineageIssues - (githubopenlineageissues@gmail.com)
2023-08-07 11:18:25

Hi, will really appreciate it if I can learn how the community has been able to harness the Spark integration. In our testing, where a Spark application writes to S3 multiple times (different locations), OL generates the same job name for all writes (namespace_name.execute_insert_into_hadoop_fs_relation_command), rendering the OL graph's final output less helpful. Say, for example, I have a series of transformations/writes 5 times; in the lineage graph we are just seeing the last 1. There is an open bug and hopefully it will be resolved soon.


Curious how much adoption of the OL Spark integration there is in the presence of that bug, as generating the same name for a job makes it less usable for anything other than a trivial one-output application.


Example from a 2-write application:
EXPECTED: first produce the weather dataset and subsequently produce weather40 (generated/mocked using 2 Spark apps). (1st image)
ACTUAL OL: weather40; see only the last one. (2nd image)


Will really appreciate community guidance as to how successful others have been in utilizing the Spark integration (vanilla, not Databricks). Thank you


Expected. vs Actual.

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-07 11:30:00

@channel
This month's TSC meeting is this Thursday, August 10th at 10:00 a.m. PT. On the tentative agenda:
• announcements
• recent releases
• Airflow provider progress update
• OpenLineage 1.0 overview
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.

👍 Maciej Obuchowski, Athitya Kumar, Anirudh Shrinivason, Paweł Leszczyński

추호관 - (hogan.chu@toss.im)
2023-08-08 04:39:45

I can't see output when calling saveAsTable with 100+ columns in Spark. Any help or ideas for this issue? Really thanks.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-08 04:59:23

*Thread Reply:* Does this work with similar jobs, but with small amount of columns?

추호관 - (hogan.chu@toss.im)
2023-08-08 05:12:52

*Thread Reply:* thanks for reply @Maciej Obuchowski — yes, it works for a small number of columns but does not work with a large number of columns

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-08 05:14:04

*Thread Reply:* one more question: how much data the jobs approximately process and how long does the execution take?

추호관 - (hogan.chu@toss.im)
2023-08-08 05:14:54

*Thread Reply:* ah… it's like 20 min ~ 30 min, it varies
data size is like 2000,0000 rows with 100 ~ 1000 columns

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:15:17

*Thread Reply:* that's interesting. we could prepare integration test for that. 100 cols shouldn't make a difference

추호관 - (hogan.chu@toss.im)
2023-08-08 05:15:37

*Thread Reply:* honestly sorry for the typo, it's 1000 columns

추호관 - (hogan.chu@toss.im)
2023-08-08 05:15:44

*Thread Reply:* pivoting features

추호관 - (hogan.chu@toss.im)
2023-08-08 05:16:09

*Thread Reply:* i checked, it works well for small numbers of columns

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-08 05:16:39

*Thread Reply:* if it's 1000, then maybe we're over event size - event is too large and backend can't accept that

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-08 05:17:06

*Thread Reply:* maybe debug logs could tell us something

추호관 - (hogan.chu@toss.im)
2023-08-08 05:19:27

*Thread Reply:* i’ll do spark.sparkContext.setLogLevel("DEBUG") ing

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:19:30

*Thread Reply:* are there any errors in the logs? perhaps pivoting contains nodes in SparkPlan that we don't support yet

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:19:52

*Thread Reply:* did you check pivoting that results in less columns?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-08 05:20:33

*Thread Reply:* @추호관 would also be good to disable the logicalPlan facet:
spark.openlineage.facets.disabled: [spark_unknown;spark.logicalPlan]
in spark conf

추호관 - (hogan.chu@toss.im)
2023-08-08 05:23:40

*Thread Reply:* got it — can't we do it in python config?
.config("spark.dynamicAllocation.enabled", "true") \
.config("spark.dynamicAllocation.initialExecutors", "5") \
.config("spark.openlineage.facets.disabled", [spark_unknown;spark.logicalPlan]

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:24:31

*Thread Reply:*
.config("spark.dynamicAllocation.enabled", "true") \
.config("spark.dynamicAllocation.initialExecutors", "5") \
.config("spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]")

추호관 - (hogan.chu@toss.im)
2023-08-08 05:24:42

*Thread Reply:* ah.. string got it

추호관 - (hogan.chu@toss.im)
2023-08-08 05:36:03

*Thread Reply:* ah… there are no errors at debug level; the listener registered successfully: Registered listener io.openlineage.spark.agent.OpenLineageSparkListener

추호관 - (hogan.chu@toss.im)
2023-08-08 05:39:40

*Thread Reply:* maybe df.groupBy(some_column).pivot(some_column).agg(*agg_cols) is not supported
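For anyone trying to reproduce this, a minimal sketch of the pattern under discussion (column/table names are made up; the OL listener would be configured as elsewhere in this thread):

```python
# Minimal repro sketch: pivot() explodes "key" into ~1000 output columns,
# which makes the serialized logical plan (and thus the OL event) very large.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("u%d" % (i % 10), "k%d" % (i % 1000), float(i)) for i in range(10000)],
    ["user_id", "key", "value"],
)

wide = df.groupBy("user_id").pivot("key").agg(F.sum("value"))
wide.write.mode("overwrite").saveAsTable("pivot_repro")  # hypothetical table name
```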

추호관 - (hogan.chu@toss.im)
2023-08-08 05:43:44

*Thread Reply:* oh.. interesting — with spark.openlineage.facets.disabled set, this option gives me output when eventType is START:
"eventType": "START"
"outputs": [
  …
  columns
  …
]

추호관 - (hogan.chu@toss.im)
2023-08-08 05:54:13

*Thread Reply:* Yes — "spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]" <- this option gives output when eventType is START, but there is no output with bunches of columns when that config is not set

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:55:18

*Thread Reply:* this option prevents the logicalPlan from being serialized and sent as part of the OpenLineage event, where it is included in one of the facets

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:56:12

*Thread Reply:* possibly, serializing logicalPlans, in the case of pivots, leads to event sizes that are not acceptable

추호관 - (hogan.chu@toss.im)
2023-08-08 05:57:56

*Thread Reply:* Ah… so you mean pivot makes the logical plan too large to serialize into an event, and disabling logical-plan serialization makes event generation possible, because the logical plan made by pivot is no longer serialized.

Can we overcome this?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:58:48

*Thread Reply:* we've seen such issues for some plans some time ago

🙌 추호관

추호관 - (hogan.chu@toss.im)
2023-08-08 05:59:29

*Thread Reply:* oh…. how did you solve it?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:59:32
[image]

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 05:59:51

*Thread Reply:* by excluding some properties from plan to be serialized

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 06:01:14

*Thread Reply:* here https://github.com/OpenLineage/OpenLineage/blob/c3a5211f919c01870a7f79f48588177a9b[…]io/openlineage/spark/agent/lifecycle/LogicalPlanSerializer.java we exclude certain classes

🙌 추호관

추호관 - (hogan.chu@toss.im)
2023-08-08 06:02:00

*Thread Reply:* AH…. so excluded properties cause the logical plan of pivoting to be ignored

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 06:08:25

*Thread Reply:* you can start with writing a failing test here -> https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]/openlineage/spark/agent/lifecycle/SparkReadWriteIntegTest.java


then you can try to debug the logical plan, trying to find out what should be excluded from it when it's being serialized. Even if you find this difficult, a failing integration test is super helpful to let others help you with that.

추호관 - (hogan.chu@toss.im)
2023-08-08 06:24:54

*Thread Reply:* okay, I'll look into it and maybe open a PR. thanks

추호관 - (hogan.chu@toss.im)
2023-08-08 06:38:45

*Thread Reply:* Can I ask if there are any suspicious properties?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-08 06:39:25

*Thread Reply:* sure

👍 추호관
🙂 추호관

추호관 - (hogan.chu@toss.im)
2023-08-08 07:10:40

*Thread Reply:* Thanks, I'll also try to find the property

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-08-08 05:34:46

Hi guys, I've a generic sql-parsing doubt... what would be the recommended way (if any) to check for sql similarity? I understand that most sql parsers parse the query into an AST, but are there any well known ways to measure semantic similarities between 2 or more ASTs? Just curious lol... Any ideas appreciated! Thanks!

Guy Biecher - (guy.biecher21@gmail.com)
2023-08-08 07:49:55

*Thread Reply:* Hi @Anirudh Shrinivason,
I think I would take a look at this:
https://sqlglot.com/sqlglot/diff.html
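For illustration, a minimal sketch of what that looks like, assuming sqlglot is installed (queries and schema are made up):

```python
import sqlglot
from sqlglot import diff, parse_one
from sqlglot.optimizer import optimize

# diff() compares two ASTs and returns edit operations (Insert/Remove/Move/Keep/...)
edits = diff(parse_one("SELECT a, b FROM t1"), parse_one("SELECT b, a FROM t1"))
print([type(e).__name__ for e in edits])

# Normalizing both queries through the optimizer first (needs a schema) removes
# some purely textual differences, e.g. it expands SELECT * into explicit columns
schema = {"t1": {"a": "INT", "b": "INT"}}
n1 = optimize(parse_one("SELECT * FROM t1"), schema=schema)
n2 = optimize(parse_one("SELECT a, b FROM t1"), schema=schema)
print(diff(n1, n2))
```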

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-08-09 23:12:37

*Thread Reply:* Hey @Guy Biecher, yeah I was looking at this... but it seems to calculate similarity in a more textual sense, as opposed to a more semantic one...
e.g. SELECT * FROM TABLE_1 and SELECT col1,col2,col3 FROM TABLE_1 could be the same semantic query, but sqlglot would give diffs in the AST because it's textual...

Guy Biecher - (guy.biecher21@gmail.com)
2023-08-10 02:26:51

*Thread Reply:* I totally get you. In such cases, without the metadata of TABLE_1 it's impossible. What I would do: replace all * (expanding them to explicit columns) before you use the diff function.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-08-10 07:04:37

*Thread Reply:* Yeah I was thinking about the same... But the more nested and complex your queries get, the harder it'll become to accurately pre-process before running the AST diff too...
But yeah, that's probably the approach I'd take haha... Happy to discuss and learn if there are better ways of doing this

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-08 08:36:46

dear all, I have some novice questions. I put them in separate messages for clarity. 1st question: I understand from the examples in the documentation that the main lineage events are RunEvents, which can contain links to a Run ID, Job ID, Dataset ID (I see they are RunEvents because they have EventType, correct?). However, the main openlineage json object also contains JobEvent and DatasetEvent. When are JobEvent and DatasetEvent supposed to be used in the workflow? Do you have relevant examples? thanks!

Harel Shein - (harel.shein@gmail.com)
2023-08-08 09:53:05

*Thread Reply:* Hey @Luigi Scorzato! -You can read about these 2 event types in this blog post: https://openlineage.io/blog/static-lineage

👍 Luigi Scorzato

Harel Shein - (harel.shein@gmail.com)
2023-08-08 09:53:38

*Thread Reply:* we’ll work on getting the documentation improved to clarify the expected use cases for each event type. this is a relatively new addition to the spec.

👍 Luigi Scorzato

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-08 10:08:28

*Thread Reply:* this sounds relevant for my 3rd question, doesn't it? But I do not see scheduling information among the use cases, am I wrong?

Harel Shein - (harel.shein@gmail.com)
2023-08-08 11:16:39

*Thread Reply:* you’re not wrong, these 2 events were not designed for runtime lineage, but rather “static” lineage that gets emitted after the fact

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-08 08:46:39

2nd Question. I see that the input dataset appears in the RunEvent with EventType=START, the output dataset appears in the RunEvent with EventType=COMPLETE only, the RunEvent with EventType=RUNNING has no dataset attached. This makes sense for ETL jobs, but for streaming (e.g. Flink), the run could run very long and never terminate with a COMPLETE. On the other hand, emitting all the info about the output dataset in every RUNNING event would be far too verbose. What is the recommended set up in this case? TLDR: what is the recommended configuration of the frequency and data model of the lineage events for streaming systems like Flink?

Harel Shein - (harel.shein@gmail.com)
2023-08-08 09:54:40

*Thread Reply:* great question! did you get a chance to look at the current Flink integration?

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-08 10:07:06

*Thread Reply:* to be honest, I only quickly went through this and I did not identify what I needed. Can you please point me to the relevant section?

Harel Shein - (harel.shein@gmail.com)
2023-08-08 11:13:17

*Thread Reply:* here’s an example START event for Flink: https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/test/resources/events/expected_kafka.json

Harel Shein - (harel.shein@gmail.com)
2023-08-08 11:13:26

*Thread Reply:* or a checkpoint (RUNNING) event: https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/test/resources/events/expected_kafka_checkpoints.json

Harel Shein - (harel.shein@gmail.com)
2023-08-08 11:15:55

*Thread Reply:* generally speaking, you can see the execution contexts that invoke generation of OL events here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/src/main/ja[…]/openlineage/flink/visitor/lifecycle/FlinkExecutionContext.java

👍 Luigi Scorzato

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-08 17:46:17

*Thread Reply:* thank you! So, if I understand correctly, the key is that even eventType=START admits output datasets. Correct? What determines how often the eventType=RUNNING events are emitted?

👍 Harel Shein

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-09 03:25:16

*Thread Reply:* now I see, RUNNING events are emitted on onJobCheckpoint

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-08 08:59:40

3rd Question: I am looking for information about the time when the next run should start, in case of scheduled jobs. I see that the Run Facet has a Nominal Time Facet, but -- if I understand correctly -- it refers to the current run, so it is always emitted after the fact. Is the Nominal Start Time of the next run available somewhere? If not, where do you recommend to add it as a custom field? In principle, it belongs to the Job object, but would that maybe cause an undesirable fast change in the Job object?

Harel Shein - (harel.shein@gmail.com)
2023-08-08 11:10:47

*Thread Reply:* For Airflow, this is part of the AirflowRunFacet, here: https://github.com/OpenLineage/OpenLineage/blob/81372ca2bc2afecab369eab4a54cc6380dda49d0/integration/airflow/facets/AirflowRunFacet.json#L100


For other orchestrators / schedulers, that would depend..

👍 Luigi Scorzato

Kiran Hiremath - (kiran_hiremath@intuit.com)
2023-08-08 10:30:56

Hi Team, Question regarding Databricks OpenLineage init script, is the path /mnt/driver-daemon/jars common to all the clusters? or its unique to each cluster? https://github.com/OpenLineage/OpenLineage/blob/81372ca2bc2afecab369eab4a54cc6380d[…]da49d0/integration/spark/databricks/open-lineage-init-script.sh

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-08 12:15:40

*Thread Reply:* I might be wrong, but I believe it's unique for each cluster - the common part is dbfs.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-09 02:38:54

*Thread Reply:* dbfs is mounted to a databricks workspace which can run multiple clusters. so i think, it's common.


Worth mentioning: init-scripts located in dbfs are becoming deprecated next month and we plan moving them into workspaces.

👍 Kiran Hiremath

Kiran Hiremath - (kiran_hiremath@intuit.com)
2023-08-11 01:33:24

*Thread Reply:* yes, the init scripts are moved at workspace level.

GitHubOpenLineageIssues - (githubopenlineageissues@gmail.com)
2023-08-08 14:19:40

Hi @Paweł Leszczyński, will really appreciate it if you let me know once this PR is good to go. Would love to test it in our environment: https://github.com/OpenLineage/OpenLineage/pull/2036. Thank you for all your help.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-09 02:35:28

*Thread Reply:* great to hear. I still need some time as there are a few corner cases. For example: what should the behaviour be when alter table rename is called 😉 But sure, you can test it if you like. CI is failing on integration tests, but ./gradlew clean build with unit tests is fine.

:gratitude_thank_you: GitHubOpenLineageIssues

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-10 03:33:50

*Thread Reply:* @GitHubOpenLineageIssues feel invited to join today's community meeting and advocate for the importance of this issue. Such discussions are extremely helpful in prioritising the backlog the right way.

Gaurav Singh - (gaurav.singh@razorpay.com)
2023-08-09 07:54:33

Hi Team,
I'm doing a POC with OpenLineage to extract column lineage from Spark. I'm using it in a Databricks notebook. I'm facing an issue where I'm trying to get the column lineage in a join involving external tables on S3. The lineage that is being extracted refers to the base path of the table, i.e. the S3 file path, and not the corresponding tables. Is there a way to extract/map columns of the output to the columns of the base tables instead of the storage location?

Gaurav Singh - (gaurav.singh@razorpay.com)
2023-08-09 07:55:28

*Thread Reply:* Query:
INSERT INTO test.merchant_md
(SELECT
  m.`id`,
  m.name,
  m.activated,
  m.parent_id,
  md.contact_name,
  md.contact_email
FROM
  test.merchants_0 m
  LEFT JOIN merchant_details md ON m.id = md.merchant_id
WHERE
  m.created_date > '2023-08-01')

Gaurav Singh - (gaurav.singh@razorpay.com)
2023-08-09 08:01:56

*Thread Reply:* "columnLineage":{ - "_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.30.1/integration/spark>", - "_schemaURL":"<https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet>", - "fields":{ - "merchant_id":{ - "inputFields":[ - { - "namespace":"<s3a://datalake>", - "name":"/test/merchants", - "field":"id" - } - ] - }, - "merchant_name":{ - "inputFields":[ - { - "namespace":"<s3a://datalake>", - "name":"/test/merchants", - "field":"name" - } - ] - }, - "activated":{ - "inputFields":[ - { - "namespace":"<s3a://datalake>", - "name":"/test/merchants", - "field":"activated" - } - ] - }, - "parent_id":{ - "inputFields":[ - { - "namespace":"<s3a://datalake>", - "name":"/test/merchants", - "field":"parent_id" - } - ] - }, - "contact_name":{ - "inputFields":[ - { - "namespace":"<s3a://datalake>", - "name":"/test/merchant_details", - "field":"contact_name" - } - ] - }, - "contact_email":{ - "inputFields":[ - { - "namespace":"<s3a://datalake>", - "name":"/test/merchant_details", - "field":"contact_email" - } - ] - } - } - }, - "symlinks":{ - "_producer":"<https://github.com/OpenLineage/OpenLineage/tree/0.30.1/integration/spark>", - "_schemaURL":"<https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet>", - "identifiers":[ - { - "namespace":"/warehouse/test.db", - "name":"test.merchant_md", - "type":"TABLE" - }

Gaurav Singh - (gaurav.singh@razorpay.com)
2023-08-09 08:23:57

*Thread Reply:* "contact_name":{ - "inputFields":[ - { - "namespace":"<s3a://datalake>", - "name":"/test/merchant_details", - "field":"contact_name" - } - ] - } -This is returning mapping from the s3 location on which the table is created.

Zahi Fail - (zahi.fail@gmail.com)
2023-08-09 10:56:27

Hey,
I'm running a Spark application (Spark version 3.4) with the OL integration. I changed Spark to use "debug" level, and I see the OL events with the below message:
"Emitting lineage completed successfully:"


With all the above, I can’t see the event in Marquez.


Attaching the OL configurations.
When changing the OL-spark version to 0.6.+, I do see an event created in Marquez with only "Start" status (attached below).


Does the OL-spark version need to match the Spark version? Are there known issues with these Spark / OL versions?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-09 11:23:42

*Thread Reply:* > OL-spark version to 0.6.+ -This OL version is ancient. You can try with 1.0.0


I think you're hitting this issue which duplicates jobs: https://github.com/OpenLineage/OpenLineage/issues/1943

Zahi Fail - (zahi.fail@gmail.com)
2023-08-10 01:46:08

*Thread Reply:* I haven't mentioned that I tried multiple OL versions - 1.0.0 / 0.30.1 / 0.6.+ …
None of them worked for me.
@Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 05:25:49

*Thread Reply:* @Zahi Fail understood. Can you provide sample job that reproduces this behavior, and possibly some logs?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 05:26:11

*Thread Reply:* If you can, it might be better to create an issue at GitHub and communicate there.

Zahi Fail - (zahi.fail@gmail.com)
2023-08-10 08:34:01

*Thread Reply:* Before creating an issue in GitHub, I wanted to check if my issue is only related to version compatibility..


This is the sample of my test:

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder \
    .config('spark.jars.packages', 'io.openlineage:openlineage_spark:1.0.0') \
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener') \
    .config('spark.openlineage.host', 'http://localhost:9000') \
    .config('spark.openlineage.namespace', 'default') \
    .getOrCreate()

spark.sparkContext.setLogLevel("DEBUG")

csv_file = "location.csv"

df = spark.read.format("csv").option("header", "true").option("sep", "^").load(csv_file)

df = df.select("campaignid", "revenue").groupby("campaignid").sum("revenue").show()
```

Part of the logs with the OL configurations and the processed event

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 08:40:13

*Thread Reply:* try spark.openlineage.transport.url instead of spark.openlineage.host

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 08:40:27

*Thread Reply:* and possibly link the doc where you've seen spark.openlineage.host 🙂

Zahi Fail - (zahi.fail@gmail.com)
2023-08-10 08:59:27

*Thread Reply:* https://openlineage.io/blog/openlineage-spark/

👍 Maciej Obuchowski

Zahi Fail - (zahi.fail@gmail.com)
2023-08-10 09:04:56

*Thread Reply:* changing to “spark.openlineage.transport.url” didn’t make any change

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 09:09:42

*Thread Reply:* do you see the ConsoleTransport log? it suggests the Spark integration did not register that you want to send events to Marquez

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 09:10:09

*Thread Reply:* let's try setting spark.openlineage.transport.type to http
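Putting the thread's suggestions together, a sketch of the resulting configuration (the URL/namespace values are the ones from the sample job above):

```python
from pyspark.sql import SparkSession

# HTTP transport spelled out explicitly, per the suggestions in this thread
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.openlineage:openlineage_spark:1.0.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:9000")
    .config("spark.openlineage.namespace", "default")
    .getOrCreate()
)
```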

Zahi Fail - (zahi.fail@gmail.com)
2023-08-10 09:14:50

*Thread Reply:* Now it works !

Zahi Fail - (zahi.fail@gmail.com)
2023-08-10 09:14:58

*Thread Reply:* thanks @Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 09:23:04

*Thread Reply:* Cool 🙂 However, it should not require that if you provide spark.openlineage.transport.url - I'll create an issue for debugging that.

Michael Robinson - (michael.robinson@astronomer.io)
2023-08-09 14:37:24

@channel -This month’s TSC meeting is tomorrow! All are welcome. https://openlineage.slack.com/archives/C01CK9T7HKR/p1691422200847979

Athitya Kumar - (athityakumar@gmail.com)
2023-08-10 02:11:07

While using the Spark integration, we're unable to see the query in the job facet for any spark-submit - is this a known issue/limitation, and can someone point to the code where this is currently extracted / could be enhanced?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-10 02:55:46

*Thread Reply:* Let me first rephrase my understanding of the question: assume a user runs spark.sql('INSERT INTO ...'). Are we able to include the SQL query INSERT INTO ... within the SQL facet?


We once had a look at it and found it difficult. Given an SQL, spark immediately translates it to a logical plan (which our integration is based on) and we didn't find any place where we could inject our code and get access to sql being run.

Athitya Kumar - (athityakumar@gmail.com)
2023-08-10 04:27:51

*Thread Reply:* Got it. So for spark.sql() - there's no interaction with sqlparser-rs, and we directly try stitching the input/output & column lineage from the Spark logical plan. Would something like this fall under the spark.jdbc() route or the spark.sql() route (say, if the df is collected / written somewhere)?


val df = spark.read.format("jdbc")
  .option("url", url)
  .option("user", user)
  .option("password", password)
  .option("fetchsize", fetchsize)
  .option("driver", driver)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 05:15:17

*Thread Reply:* @Athitya Kumar I understand your issue. From my side, there's one problem with this - potentially there can be multiple queries for one spark job. You can imagine something like joining results of two queries - possible to separate systems - and then one SqlJobFacet would be misleading. This needs more thorough spec discussion

Luigi Scorzato - (luigi.scorzato@gmail.com)
2023-08-10 05:33:47

Hi Team, has anyone experience with integrating OpenLineage with the SAP ecosystem? And with Salesforce/MuleSoft?

Steven - (xli@zjuici.com)
2023-08-10 05:40:47

Hi,
Are there any ways to save a list of strings directly in the dataset facets? Such as the myfacets field in this dict:
"facets": {
  "metadata_facet": {
    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.29.2/client/python",
    "_schemaURL": "https://sth/schemas/facets.json#/definitions/SomeFacet",
    "myfacets": ["a", "b", "c"]
  }
}

Steven - (xli@zjuici.com)
2023-08-10 05:42:20

*Thread Reply:* I'm using the python OpenLineage package and extending the BaseFacet class

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 05:53:57

*Thread Reply:* for custom facets, as long as it's valid json - go for it

Steven - (xli@zjuici.com)
2023-08-10 05:55:03

*Thread Reply:* However, I tried to insert a list of strings, and when I tried to get the dataset, the returned value of that list field was empty.

Steven - (xli@zjuici.com)
2023-08-10 05:55:57

*Thread Reply:*
@attr.s
class MyFacet(BaseFacet):
    columns: list[str] = attr.ib()
Here's my python code.
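A self-contained sketch of that pattern, assuming openlineage-python's attrs-based facet/dataset classes (the namespace, dataset name, and facet key are made up):

```python
import attr
from openlineage.client.facet import BaseFacet
from openlineage.client.run import Dataset

@attr.s
class MyFacet(BaseFacet):
    # a custom facet carrying a plain list of strings
    columns: list = attr.ib(factory=list)

ds = Dataset(
    namespace="example",   # hypothetical
    name="my_table",       # hypothetical
    facets={"metadata_facet": MyFacet(columns=["a", "b", "c"])},
)
```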

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 05:59:02

*Thread Reply:* How did you emit, serialized the event, and where did you look when you said you tried to get the dataset?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 06:00:27

*Thread Reply:* I assume the problem is somewhere there, not on the level of facet definition, since SchemaDatasetFacet looks pretty much the same and it works

Steven - (xli@zjuici.com)
2023-08-10 06:00:54

*Thread Reply:* I use the python openlineage client to emit the RunEvent:
openlineage_client.emit(
    RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now().isoformat(),
        run=run,
        job=job,
        producer=PRODUCER,
        outputs=outputs,
    )
)
And use Marquez to visualize the returned data.

Steven - (xli@zjuici.com)
2023-08-10 06:02:12

*Thread Reply:* Yah, a list of objects is working, but a list of strings is not.😩

Steven - (xli@zjuici.com)
2023-08-10 06:03:23

*Thread Reply:* I think the problem is related to the openlineage package, openlineage/client/serde.py, in the function Serde.to_json()

Steven - (xli@zjuici.com)
2023-08-10 06:05:56
[image]

Steven - (xli@zjuici.com)
2023-08-10 06:19:34

*Thread Reply:* I think the code here filters out those string values in the list

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 06:21:39

*Thread Reply:* 👀

Steven - (xli@zjuici.com)
2023-08-10 06:24:48

*Thread Reply:* Yah, the string values in the list will evaluate to False in this check and be filtered out:
isinstance(x, dict)

😳
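An illustrative sketch of the failure mode (not the exact Serde code; the helper name is hypothetical):

```python
# Buggy pattern: a null-scrubbing pass that only keeps dict elements
# inside lists silently drops plain strings.
def scrub(value, keep_scalars=False):
    if isinstance(value, dict):
        return {k: scrub(v, keep_scalars) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        if keep_scalars:
            return [scrub(x, keep_scalars) for x in value if x is not None]   # fixed
        return [scrub(x, keep_scalars) for x in value if isinstance(x, dict)]  # strings vanish
    return value

facet = {"myfacets": ["a", "b", "c"]}
print(scrub(facet))                     # {'myfacets': []}
print(scrub(facet, keep_scalars=True))  # {'myfacets': ['a', 'b', 'c']}
```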

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 06:26:33

*Thread Reply:* wow, that's right 😬

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-10 06:26:47

*Thread Reply:* want to create PR fixing that?

Steven - (xli@zjuici.com)
2023-08-10 06:27:20

*Thread Reply:* Sure! May do this later tomorrow.

👍 Maciej Obuchowski, Paweł Leszczyński

Steven - (xli@zjuici.com)
2023-08-10 23:59:28

*Thread Reply:* I created the PR at https://github.com/OpenLineage/OpenLineage/pull/2044, but the CI on integration-test-integration-spark FAILED

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-11 04:17:01

*Thread Reply:* @Steven sorry for that - some tests require credentials that are not present on forked versions of CI. It will work once I push it to origin. Anyway, Spark tests failing isn't a blocker for this Python PR

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-11 04:17:45

*Thread Reply:* I would only ask you to add some tests for that case, with facets containing lists of strings
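A sketch of what such a test could look like (assuming the same attrs-based BaseFacet and Serde.to_json referenced earlier in this thread; the facet class is hypothetical):

```python
import attr
from openlineage.client.facet import BaseFacet
from openlineage.client.serde import Serde

@attr.s
class ListFacet(BaseFacet):
    tags: list = attr.ib(factory=list)

def test_to_json_keeps_list_of_strings():
    # after the fix, plain strings inside list-valued fields survive serialization
    serialized = Serde.to_json(ListFacet(tags=["a", "b", "c"]))
    assert '"a"' in serialized and '"b"' in serialized and '"c"' in serialized
```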

Steven - (xli@zjuici.com)
2023-08-11 04:18:21

*Thread Reply:* Yeah sure, I will add them now

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-11 04:25:19

*Thread Reply:* ah, we had another CI problem - the Go version was too old in one of the jobs. Nevertheless, I won't judge your PR on stuff failing outside your PR anyway 🙂

Steven - (xli@zjuici.com)
2023-08-11 04:36:57

*Thread Reply:* LOL🤣 I've added some tests and made a force push

savan - (SavanSharan_Navalgi@intuit.com)
2023-10-20 08:31:45

*Thread Reply:* @GitHubOpenLineageIssues
I am trying to contribute to the integration tests listed here as a good first issue.
The CONTRIBUTING.md mentions that I can trigger CI for integration tests from a forked branch using this tool,
but I am unable to do so. Is there a way to trigger CI from a forked branch, or do I have to get permission from someone to run the CI?


I am getting this error when I run this command: sudo git-push-fork-to-upstream-branch upstream savannavalgi:hacktober
> Username for 'https://github.com': savannavalgi
> Password for 'https://savannavalgi@github.com':
> remote: Permission to OpenLineage/OpenLineage.git denied to savannavalgi.
> fatal: unable to access 'https://github.com/OpenLineage/OpenLineage.git/': The requested URL returned error: 403
I have tried to configure an ssh key, also tried to trigger CI from another branch, and tried all of this after fetching the latest upstream.


cc: @Athitya Kumar @Maciej Obuchowski @Steven

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-23 04:57:44

*Thread Reply:* what PR is the problem related to? I can run git-push-fork-to-upstream-branch for you

savan - (SavanSharan_Navalgi@intuit.com)
2023-10-25 01:08:41

*Thread Reply:* @Paweł Leszczyński thanks for approving my PR - ( link )


I will make the changes needed for the new integration test case for drop table (good first issue) in another PR. I would need your help to run the integration tests again, thank you

savan - (SavanSharan_Navalgi@intuit.com)
2023-10-26 07:48:52

*Thread Reply:* @Paweł Leszczyński
opened a PR ( link ) for the integration test for drop table
can you please help run the integration test

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-26 07:50:29

*Thread Reply:* sure. some of our tests require access to S3/BigQuery secret keys, so they will not work automatically from the fork and require action on our side. working on that

savan - (SavanSharan_Navalgi@intuit.com)
2023-10-29 09:31:22

*Thread Reply:* thanks @Paweł Leszczyński
let me know if i can help in any way

savan - (SavanSharan_Navalgi@intuit.com)
2023-11-15 02:31:50

*Thread Reply:* @Paweł Leszczyński any action item on my side?

Athitya Kumar - (athityakumar@gmail.com)
2023-08-11 07:36:57

Hey folks! 👋


Had a query/observation regarding columnLineage inferred in the Spark integration - opened this issue for the same. Basically, when we do something like this in our spark-sql:
SELECT t1.c1, t1.c2, t1.c3, t2.c4 FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1 AND t1.c2 = t2.c2
The expected column lineage for output table t3 is:
t3.c1 -> comes from both t1.c1 & t2.c1 (SELECT + JOIN clause)
t3.c2 -> comes from both t1.c2 & t2.c2 (SELECT + JOIN clause)
t3.c3 -> comes from t1.c3
t3.c4 -> comes from t2.c4
However, the actual column lineage for output table t3 is:
t3.c1 -> comes from t1.c1 (only based on SELECT clause)
t3.c2 -> comes from t1.c1 (only based on SELECT clause)
t3.c3 -> comes from t1.c3
t3.c4 -> comes from t2.c4
Is this a known issue/behaviour?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-11 09:18:44

*Thread Reply:* Hmm... this is kind of a "logical" difference - is column level lineage taken from actual "physical" operations - like in this case, where we always take from t1 - or from "logical" ones, where t2 is used only for the predicate, yet we still want to indicate it as a source?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-11 09:18:58

*Thread Reply:* I think your interpretation is more useful

🙏 Athitya Kumar

Athitya Kumar - (athityakumar@gmail.com)
2023-08-11 09:25:03

*Thread Reply:* @Maciej Obuchowski - Yup, especially for use-cases where we want to depend on column lineage for impact analysis, I think we should be considering even predicates. For example, if t2.c1 / t2.c2 gets corrupted or dropped, the query would be impacted - which means that we should be including even predicates (t2.c1 / t2.c2) in the column lineage imo


But is there any technical limitation if we wanna implement this / make an OSS contribution for this (like logical predicate columns not being part of the spark logical plan object that we get in the PlanVisitor or something like that)?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-11 11:14:58

*Thread Reply:* It's probably a bit of work, but can't think it's impossible on parser side - @Paweł Leszczyński will know better about spark collection

Ernie Ostic - (ernie.ostic@getmanta.com)
2023-08-11 12:45:34

*Thread Reply:* This is a case where it would be nice to have an alternate indication (perhaps in the Column lineage facet?) for this type of "suggested" lineage. As noted, this is especially important for impact analysis purposes. We (and I believe others do the same or similar) call that "indirect" lineage at Manta.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-11 12:49:10

*Thread Reply:* Something like additional flag in inputFields, right?
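Purely as a sketch of the idea (the transformationType field name here is hypothetical, not part of the spec at this point):

```python
# Hypothetical inputFields shape: flagging JOIN-predicate columns as INDIRECT
input_fields = [
    {"namespace": "ns", "name": "t1", "field": "c1", "transformationType": "DIRECT"},
    {"namespace": "ns", "name": "t2", "field": "c1", "transformationType": "INDIRECT"},  # predicate only
]
```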

👍 Athitya Kumar, Ernie Ostic, Paweł Leszczyński

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-14 02:36:34

*Thread Reply:* Yes, this would require some extension to the spec. What do you mean by spark-sql: spark.sql() with some Spark query, or SQL in Spark JDBC?

Athitya Kumar - (athityakumar@gmail.com)
2023-08-15 15:16:49

*Thread Reply:* Sorry, missed your question @Paweł Leszczyński. By spark-sql, I'm referring to the former: spark.sql() with some spark query

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-16 03:10:57

*Thread Reply:* cc @Jens Pfau - you may be also interested in extending column level lineage facet.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-08-22 02:23:08

*Thread Reply:* Hi, is there a github issue for this feature? Seems like a really cool and exciting functionality to have!

Athitya Kumar - (athityakumar@gmail.com)
2023-08-22 08:03:49

*Thread Reply:* @Anirudh Shrinivason - Are you referring to this issue: https://github.com/OpenLineage/OpenLineage/issues/2048?

:gratitude_thank_you: Anirudh Shrinivason
✅ Anirudh Shrinivason

Athitya Kumar - (athityakumar@gmail.com)
2023-08-14 05:13:48

Hey team 👋


Is there a way we can feed the logical plan directly to check the open-lineage events being built, without actually running a spark-job with open-lineage configs? Basically interested to see if we can mock a dry-run of a spark job w/ open-lineage by mimicking the logical plan 😄


cc @Shubh

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-14 06:00:21

*Thread Reply:* Not really I think - the integration does not rely purely on the logical plan

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-08-14 06:00:44

*Thread Reply:* At least, not in all cases. For some maybe

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-14 07:34:39

*Thread Reply:* We're using pretty similar approach in our column level lineage tests where we run some spark commands, register custom listener https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/tes[…]eage/spark/agent/util/LastQueryExecutionSparkEventListener.java which catches the logical plan. Further we run our tests on the captured logical plan.


The difference here, between what you're asking about, is that we still have an access to the same spark session.


In many cases, our integration uses active Spark session to fetch some dataset details. This happens pretty often (like fetch dataset location) and cannot be taken just from a Logical Plan.

Athitya Kumar - (athityakumar@gmail.com)
2023-08-14 11:03:28

*Thread Reply:* @Paweł Leszczyński - We're mainly interested to see the inputs/outputs (mainly column schema and column lineage) for different logical plans. Is that something that could be done in a static manner without running spark jobs in your opinion?


For example, I know that we can statically create logical plans

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-16 03:05:44

*Thread Reply:* The more we talk, the more I wonder what the purpose of doing so is. Do you want to test OpenLineage coverage, or is there a production scenario where you would like to apply this?

Athitya Kumar - (athityakumar@gmail.com)
2023-08-16 04:01:39

*Thread Reply:* @Paweł Leszczyński - This is for testing OpenLineage coverage, so that we can be more confident about the happy-path scenarios and the scenarios where it may not work / work partially, etc.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-16 04:22:01

*Thread Reply:* If this is for testing, then you're also capable of mocking some SparkSession/catalog methods when the OpenLineage integration tries to access them. If you want to reuse LogicalPlans from your prod environment, you will encounter logical-plan serialization issues. On the other hand, if you generate logical plans from some example Spark jobs, then the same can be achieved more easily the way the integration tests are run, with mockserver.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-14 09:45:31

Hi Team,


Spark & Databricks related question: Starting 1st September Databricks is going to block running init_scripts located in dbfs and this is the way our integration works (https://www.databricks.com/blog/securing-databricks-cluster-init-scripts).


We have two ways of mitigating this in our docs and quickstart:
(1) move init_scripts to the workspace
(2) move init_scripts to S3


Neither of them is perfect. (1) requires creating the init_script file manually through the Databricks UI and copy/pasting its content; I couldn't find a way to load it programmatically. (2) requires the quickstart user to have S3 bucket access.


Would love to hear your opinion on this. Perhaps there's some better way to do that. Thanks.

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-08-15 01:13:49

*Thread Reply:* We're uploading the init scripts to s3 via tf. But yeah ig there are some access permissions that the user needs to have

- - - -
- :gratitude_thank_you: Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-16 07:32:00
-
-

*Thread Reply:* Hello -I am new here and I am asking why do you need an init script ? -If it's a spark integration we can just specify --package=io.openlineage...

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-16 07:41:25
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/databricks/open-lineage-init-script.sh -> I think the issue was in having openlineage-jar installed immediately on the classpath bcz it's required when OpenLineageSparkListener is instantiated. It didn't work without it.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-16 07:43:55
-
-

*Thread Reply:* Yes it happens if you use --jars s3://.../...openlineage-spark-VERSION.jar parameter. (I made a ticket for this issue in Databricks support) -But if you use --package io.openlineage... (the package will be downloaded from maven) it works fine.

- - - -
- 👀 Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Abdallah - (abdallah@terrab.me) -
-
2023-08-16 07:47:50
-
-

*Thread Reply:* I think they don't use the right class loader.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-08-16 08:36:14
-
-

*Thread Reply:* To make sure: are you able to run Openlineage & Spark on Databricks Runtime without init_scripts?

- -

I was doing this a second ago and this ended up with Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@1609ed55

- -
- - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Alexandre Campelo - (aleqi200@gmail.com) - 2023-08-14 19:49:00
Hello, I just downloaded Marquez and I'm trying to send a sample request, but I'm getting a 403 (forbidden). Any idea how to find the authentication details?

Alexandre Campelo - (aleqi200@gmail.com) - 2023-08-15 12:19:34
*Thread Reply:* Ok, never mind. I figured it out. Port 5000 is reserved in macOS, so I had to start on port 9000 instead.
👍 Maciej Obuchowski
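For anyone hitting the same thing: on recent macOS versions the AirPlay Receiver service listens on port 5000 and answers with 403s, which looks exactly like an auth problem. A quick sketch to confirm what is actually answering before blaming Marquez (port numbers are examples):

```python
# Probe which service answers on each candidate port; Marquez exposes its
# API under /api/v1, so listing namespaces is a cheap liveness check.
import requests

for port in (5000, 9000):
    try:
        r = requests.get(f"http://localhost:{port}/api/v1/namespaces", timeout=3)
        print(port, r.status_code, r.headers.get("server"))
    except requests.ConnectionError:
        print(port, "nothing listening")
```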
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) - 2023-08-15 01:25:48
Hi, I noticed that while capturing lineage for MERGE INTO commands, some of the tables/columns are unaccounted for in the lineage. Example:
```
f_dummy_funnel_stg = spark.sql(f"""WITH dummy_funnel AS (
        SELECT *
        FROM f_dummy_funnel_one
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_two
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_three
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_four
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}

        UNION ALL

        SELECT *
        FROM f_dummy_funnel_five
        WHERE date_id BETWEEN {start_date_id} AND {end_date_id}
    )
    SELECT DISTINCT
        dummy_funnel.customer_id,
        dummy_funnel.product,
        dummy_funnel.date_id,
        dummy_funnel.country_id,
        dummy_funnel.city_id,
        dummy_funnel.dummy_type_id,
        dummy_funnel.num_attempts,
        dummy_funnel.num_transactions,
        dummy_funnel.gross_merchandise_value,
        dummy_funnel.sub_category_id,
        dummy_funnel.is_dummy_flag
    FROM dummy_funnel
    INNER JOIN d_dummy_identity as dummy_identity
        ON dummy_identity.id = dummy_funnel.customer_id
    WHERE
        date_id BETWEEN {start_date_id} AND {end_date_id}""")

spark.sql(f"""
    MERGE INTO {table_name}
    USING f_dummy_funnel_stg
    ON
        f_dummy_funnel_stg.customer_id = {table_name}.customer_id
        AND f_dummy_funnel_stg.product = {table_name}.product
        AND f_dummy_funnel_stg.date_id = {table_name}.date_id
        AND f_dummy_funnel_stg.country_id = {table_name}.country_id
        AND f_dummy_funnel_stg.city_id = {table_name}.city_id
        AND f_dummy_funnel_stg.dummy_type_id = {table_name}.dummy_type_id
        AND f_dummy_funnel_stg.sub_category_id = {table_name}.sub_category_id
        AND f_dummy_funnel_stg.is_dummy_flag = {table_name}.is_dummy_flag
    WHEN MATCHED THEN
        UPDATE SET
            {table_name}.num_attempts = f_dummy_funnel_stg.num_attempts
            , {table_name}.num_transactions = f_dummy_funnel_stg.num_transactions
            , {table_name}.gross_merchandise_value = f_dummy_funnel_stg.gross_merchandise_value
    WHEN NOT MATCHED
        THEN INSERT (
            customer_id,
            product,
            date_id,
            country_id,
            city_id,
            dummy_type_id,
            num_attempts,
            num_transactions,
            gross_merchandise_value,
            sub_category_id,
            is_dummy_flag
        )
        VALUES (
            f_dummy_funnel_stg.customer_id,
            f_dummy_funnel_stg.product,
            f_dummy_funnel_stg.date_id,
            f_dummy_funnel_stg.country_id,
            f_dummy_funnel_stg.city_id,
            f_dummy_funnel_stg.dummy_type_id,
            f_dummy_funnel_stg.num_attempts,
            f_dummy_funnel_stg.num_transactions,
            f_dummy_funnel_stg.gross_merchandise_value,
            f_dummy_funnel_stg.sub_category_id,
            f_dummy_funnel_stg.is_dummy_flag
        )
""")
```
In cases like this, I notice that the full lineage is not actually captured... I'd expect to see this having 5 upstreams: f_dummy_funnel_one, f_dummy_funnel_two, f_dummy_funnel_three, f_dummy_funnel_four, f_dummy_funnel_five, but I notice only 1-2 upstreams for this case...
Would like to learn more about why this might happen, and whether this is expected behaviour or not. Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-15 06:48:43
*Thread Reply:* Would be useful to see the generated event or any logs

Paweł Leszczyński - (pawel.leszczynski@getindata.com) - 2023-08-16 03:09:05
*Thread Reply:* @Anirudh Shrinivason what if there is just one union instead of four? What if there are just two columns selected instead of 10? What if the inner join is skipped? Does merge into matter?

The smaller the SQL to reproduce the problem, the easier it is to find the root cause. Most issues are reproducible with just a few lines of code.
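In that spirit, a minimal sketch of what a cut-down repro might look like (the table names and two-source UNION are invented, and it assumes a Delta-enabled Spark session, since MERGE INTO needs a table format that supports it):

```python
# Hypothetical minimal repro: two temp views unioned into a MERGE INTO,
# small enough to bisect which construct drops upstreams from the event.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("merge-repro").getOrCreate()

spark.createDataFrame([(1, 10)], ["id", "v"]).createOrReplaceTempView("src_one")
spark.createDataFrame([(2, 20)], ["id", "v"]).createOrReplaceTempView("src_two")

# Assumes delta-spark is on the classpath; MERGE INTO is not supported on
# plain parquet tables.
spark.sql("CREATE TABLE IF NOT EXISTS tgt (id INT, v INT) USING delta")

spark.sql("""
    MERGE INTO tgt
    USING (
        SELECT * FROM src_one
        UNION ALL
        SELECT * FROM src_two
    ) AS stg
    ON stg.id = tgt.id
    WHEN MATCHED THEN UPDATE SET tgt.v = stg.v
    WHEN NOT MATCHED THEN INSERT (id, v) VALUES (stg.id, stg.v)
""")
```

Checking whether both src_one and src_two show up as inputs on a case this small should tell quickly whether the union, the join, or MERGE INTO itself is where upstreams go missing.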
Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com) - 2023-08-16 03:34:30
*Thread Reply:* Yup, let me try to identify the cause from my end. Give me some time haha. I'll reach out again once there is more clarity on the occurrence.

Abdallah - (abdallah@terrab.me) - 2023-08-16 07:33:21
Hello,

The OpenLineage Databricks integration is not working properly on our side, due to the filtering of adaptive_spark_plan events.

Please find the issue link:

https://github.com/OpenLineage/OpenLineage/issues/2058
⬆️ Mouad MOUSSABBIH, Abdallah

Harel Shein - (harel.shein@gmail.com) - 2023-08-16 09:24:09
*Thread Reply:* thanks @Abdallah for the thoughtful issue that you submitted!
was wondering if you'd consider opening up a PR? would love to help you as a contributor if that's something you are interested in.

Abdallah - (abdallah@terrab.me) - 2023-08-17 11:59:51
*Thread Reply:* Hello

Abdallah - (abdallah@terrab.me) - 2023-08-17 11:59:58
*Thread Reply:* Yes, I am working on it

Abdallah - (abdallah@terrab.me) - 2023-08-17 12:00:14
*Thread Reply:* I deleted the line that has that filter.

Abdallah - (abdallah@terrab.me) - 2023-08-17 12:00:24
*Thread Reply:* I am adding some tests now

Abdallah - (abdallah@terrab.me) - 2023-08-17 12:00:45
*Thread Reply:* But running
./gradlew --no-daemon databricksIntegrationTest -x test -Pspark.version=3.4.0 -PdatabricksHost=$DATABRICKS_HOST -PdatabricksToken=$DATABRICKS_TOKEN

Abdallah - (abdallah@terrab.me) - 2023-08-17 12:01:11
*Thread Reply:* gives me
A problem occurred evaluating project ':app'.
> Could not resolve all files for configuration ':app:spark33'.
   > Could not resolve io.openlineage:openlineage-java:1.1.0-SNAPSHOT.
     Required by:
         project :app > project :shared
      > Could not resolve io.openlineage:openlineage-java:1.1.0-SNAPSHOT.
         > Unable to load Maven meta-data from https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-java/1.1.0-SNAPSHOT/maven-metadata.xml.
            > org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 326; The reference to entity "display" must end with the ';' delimiter.
   > Could not resolve io.openlineage:openlineage-sql-java:1.1.0-SNAPSHOT.
     Required by:
         project :app > project :shared
      > Could not resolve io.openlineage:openlineage-sql-java:1.1.0-SNAPSHOT.
         > Unable to load Maven meta-data from https://astronomer.jfrog.io/artifactory/maven-public-libs-snapshot/io/openlineage/openlineage-sql-java/1.1.0-SNAPSHOT/maven-metadata.xml.
            > org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 326; The reference to entity "display" must end with the ';' delimiter.
Abdallah - (abdallah@terrab.me) - 2023-08-17 12:01:25
*Thread Reply:* And I am trying to understand what I should do.

Abdallah - (abdallah@terrab.me) - 2023-08-17 12:13:37
*Thread Reply:* I am compiling the sql integration

Abdallah - (abdallah@terrab.me) - 2023-08-17 13:04:15
*Thread Reply:* I built the java client

Abdallah - (abdallah@terrab.me) - 2023-08-17 13:04:29
*Thread Reply:* but I'm still having the same resolution errors as above (Could not resolve io.openlineage:openlineage-java:1.1.0-SNAPSHOT and io.openlineage:openlineage-sql-java:1.1.0-SNAPSHOT).

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-17 14:47:41
*Thread Reply:* Please do ./gradlew publishToMavenLocal in the client/java directory

Abdallah - (abdallah@terrab.me) - 2023-08-17 14:47:59
*Thread Reply:* Okay, thanks

Abdallah - (abdallah@terrab.me) - 2023-08-17 14:48:01
*Thread Reply:* will do
Abdallah - (abdallah@terrab.me) - 2023-08-22 10:33:02
*Thread Reply:* Hello back

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:33:12
*Thread Reply:* I created a Databricks cluster.

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:35:00
*Thread Reply:* And I had some issues: -PdatabricksHost doesn't work with System.getProperty("databricksHost"), so I changed to -DdatabricksHost with System.getenv("databricksHost")

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:36:19
*Thread Reply:* Then I had an issue that the path dbfs:/databricks/openlineage/ doesn't exist; I then created the folder /dbfs/databricks/openlineage/

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:38:03
*Thread Reply:* And now I am investigating this issue:
java.lang.NullPointerException
    at io.openlineage.spark.agent.DatabricksUtils.uploadOpenlineageJar(DatabricksUtils.java:226)
    at io.openlineage.spark.agent.DatabricksUtils.init(DatabricksUtils.java:66)
    at io.openlineage.spark.agent.DatabricksIntegrationTest.setup(DatabricksIntegrationTest.java:54)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at ...
    worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
    Suppressed: com.databricks.sdk.core.DatabricksError: Missing required field: cluster_id
        at app//com.databricks.sdk.core.error.ApiErrors.readErrorFromResponse(ApiErrors.java:48)
        at app//com.databricks.sdk.core.error.ApiErrors.checkForRetry(ApiErrors.java:22)
        at app//com.databricks.sdk.core.ApiClient.executeInner(ApiClient.java:236)
        at app//com.databricks.sdk.core.ApiClient.getResponse(ApiClient.java:197)
        at app//com.databricks.sdk.core.ApiClient.execute(ApiClient.java:187)
        at app//com.databricks.sdk.core.ApiClient.POST(ApiClient.java:149)
        at app//com.databricks.sdk.service.compute.ClustersImpl.delete(ClustersImpl.java:31)
        at app//com.databricks.sdk.service.compute.ClustersAPI.delete(ClustersAPI.java:191)
        at app//com.databricks.sdk.service.compute.ClustersAPI.delete(ClustersAPI.java:180)
        at app//io.openlineage.spark.agent.DatabricksUtils.shutdown(DatabricksUtils.java:96)
        at app//io.openlineage.spark.agent.DatabricksIntegrationTest.shutdown(DatabricksIntegrationTest.java:65)
        at ...

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:39:22
*Thread Reply:* Suppressed: com.databricks.sdk.core.DatabricksError: Missing required field: cluster_id

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:40:18
*Thread Reply:* at io.openlineage.spark.agent.DatabricksUtils.uploadOpenlineageJar(DatabricksUtils.java:226)

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:54:51
*Thread Reply:* I did this: !echo "xxx" > /dbfs/databricks/openlineage/openlineage-spark-V.jar
Abdallah - (abdallah@terrab.me) - 2023-08-22 10:55:29
*Thread Reply:* To create some fake file that can be deleted in the uploadOpenlineageJar function.

Abdallah - (abdallah@terrab.me) - 2023-08-22 10:56:09
*Thread Reply:* Because if there is no file, this part fails:
StreamSupport.stream(
        workspace.dbfs().list("dbfs:/databricks/openlineage/").spliterator(), false)
    .filter(f -> f.getPath().contains("openlineage-spark"))
    .filter(f -> f.getPath().endsWith(".jar"))
    .forEach(f -> workspace.dbfs().delete(f.getPath()));
😬 Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-22 11:47:17
*Thread Reply:* does this work after
!echo "xxx" > /dbfs/databricks/openlineage/openlineage-spark-V.jar
?

Abdallah - (abdallah@terrab.me) - 2023-08-22 11:47:36
*Thread Reply:* Yes

Abdallah - (abdallah@terrab.me) - 2023-08-22 19:02:05
*Thread Reply:* I am now having another error in the driver

23/08/22 22:56:26 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:3121)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:835)
    at com.databricks.backend.daemon.driver.DatabricksILoop$.$anonfun$initializeSharedDriverContext$1(DatabricksILoop.scala:362)
...
    at com.databricks.DatabricksMain.main(DatabricksMain.scala:146)
    at com.databricks.backend.daemon.driver.DriverDaemon.main(DriverDaemon.scala)
Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener not found in com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@298cfe89
    at com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader.loadClass(ClassLoaders.scala:115)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:263)
Abdallah - (abdallah@terrab.me) - 2023-08-22 19:19:29
*Thread Reply:* Can you please share with me your json conf for the cluster?

Abdallah - (abdallah@terrab.me) - 2023-08-22 19:55:57
*Thread Reply:* It's because in my build file I have [attachment]

Abdallah - (abdallah@terrab.me) - 2023-08-22 19:56:27
*Thread Reply:* and the one that was copied is [attachment]

Abdallah - (abdallah@terrab.me) - 2023-08-22 20:01:12
*Thread Reply:* due to the findAny 😕
private static void uploadOpenlineageJar(WorkspaceClient workspace) {
    Path jarFile =
        Files.list(Paths.get("../build/libs/"))
            .filter(p -> p.getFileName().toString().startsWith("openlineage-spark-"))
            .filter(p -> p.getFileName().toString().endsWith("jar"))
            .findAny()
            .orElseThrow(() -> new RuntimeException("openlineage-spark jar not found"));

Abdallah - (abdallah@terrab.me) - 2023-08-22 20:35:10
*Thread Reply:* It works finally 😄

Abdallah - (abdallah@terrab.me) - 2023-08-23 05:16:19
*Thread Reply:* The PR 😄
https://github.com/OpenLineage/OpenLineage/pull/2061
Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 08:23:49
*Thread Reply:* thanks for the pr 🙂

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 08:24:02
*Thread Reply:* code formatting checks complain now

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 08:25:09
*Thread Reply:* for the JAR issues, do you also want to create a PR, as you've fixed the issue on your end?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 09:06:26
*Thread Reply:* @Abdallah you're using a newer version of Java than 8, right?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 09:07:07
*Thread Reply:* AFAIK googleJavaFormat behaves differently between Java versions

Abdallah - (abdallah@terrab.me) - 2023-08-23 09:15:41
*Thread Reply:* Okay, I will switch back to another Java version

Abdallah - (abdallah@terrab.me) - 2023-08-23 09:25:06
*Thread Reply:* terra@MacBook-Pro-M3 spark % java -version
java version "1.8.0_381"
Java(TM) SE Runtime Environment (build 1.8.0_381-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.381-b09, mixed mode)

Abdallah - (abdallah@terrab.me) - 2023-08-23 09:28:28
*Thread Reply:* Can you tell me which Java version I should use?
Abdallah - (abdallah@terrab.me) - 2023-08-23 09:49:42
*Thread Reply:* Hello, I have
@mobuchowski ERROR: Missing environment variable {i}
Can you please check where it comes from?

Abdallah - (abdallah@terrab.me) - 2023-08-23 09:50:24
*Thread Reply:* Can you help please?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 10:08:43
*Thread Reply:* Java 8

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 10:10:14
*Thread Reply:* > Hello, I have
> @mobuchowski ERROR: Missing environment variable {i}
> Can you please check where it comes from? (edited)
Yup, for now I have to manually make our CI account pick your changes up if you make a PR from a fork. Just did that.

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 10:11:10
(no text content)

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 10:53:34
*Thread Reply:* @Abdallah merged 🙂

Abdallah - (abdallah@terrab.me) - 2023-08-23 10:59:22
*Thread Reply:* Thank you!
Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-16 14:21:26
@channel
Meetup notice: on Monday, 9/18, at 5:00 pm ET, OpenLineage will be gathering in Toronto at Airflow Summit. Coming to the summit? Based in or near Toronto? Please join us to discuss topics such as:
• recent developments in the project, including the addition of static lineage support and the OpenLineage Airflow Provider,
• the project's history and architecture,
• opportunities to contribute,
• resources for getting started,
• + more.
Please visit the meetup page for the specific location (which is not the conference hotel) and to sign up. Hope to see some of you there! (Please note that the start time is 5:00 pm ET.)
❤️ Julien Le Dem, Maciej Obuchowski, Harel Shein, Paweł Leszczyński, Athitya Kumar, tati
ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-20 17:45:41
I saw OpenLineage was built into Airflow recently as a provider, but the documentation seems really light (https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html). Is the documentation from OpenLineage the correct way I should proceed?

https://openlineage.io/docs/integrations/airflow/usage
👍 Sheeri Cabral (Collibra)

Julien Le Dem - (julien@apache.org) - 2023-08-21 20:26:56
*Thread Reply:* openlineage-airflow is the package maintained in the OpenLineage project, to be used for versions of Airflow before 2.7. You could use it with 2.7 as well, but you'd be staying on the "old" integration.
apache-airflow-providers-openlineage is the new package, maintained in the Airflow project, that can be used starting with Airflow 2.7 and is the recommended package moving forward. It is compatible with the configuration of the old package described in that usage page. CC: @Maciej Obuchowski @Jakub Dardziński It looks like this page needs improvement.

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-22 05:03:28
*Thread Reply:* Yeah, I'll fix that
:gratitude_thank_you: Julien Le Dem

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-22 17:55:08
*Thread Reply:* https://github.com/apache/airflow/pull/33610

fyi
🙌 ldacey, Julien Le Dem
ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-22 17:54:20
Do I label certain raw data sources as a dataset, for example SFTP/FTP sites, O365 emails, etc.? I extract that data into a bucket for the client in a "folder" called "raw", which I know will be an OL Dataset. Would this GCS folder (after extracting the data with Airflow) be the first Dataset OL is aware of?

gcs://client-bucket/source-system-lob/raw

I then process that data into partitioned parquet datasets, which would also be OL Datasets:
gcs://client-bucket/source-system-lob/staging
gcs://client-bucket/source-system-lob/analytics

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-22 18:02:46
*Thread Reply:* that really depends on the use case IMHO
if you consider a whole directory/folder as a dataset (meaning that each file inside folds into a larger whole) you should label the directory as the dataset

you might as well have a directory with each file being something different - in this case it would be best to set each file separately as a dataset

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-22 18:04:32
*Thread Reply:* there was also SymlinksDatasetFacet introduced to store alternative dataset names, might be useful: https://github.com/OpenLineage/OpenLineage/pull/936

ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-22 18:07:26
*Thread Reply:* cool, yeah in general each file is just a snapshot of data from a client (for example, a daily dump). the parquet datasets are normally partitioned and might have small fragments, and I definitely picture it as more of a table than individual files
👍 Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 08:22:09
*Thread Reply:* Agree with Jakub here - with object storage, people use different patterns, but usually some directory layer vs file is the valid abstraction level, especially if your pattern is adding files with new data inside
👍 Jakub Dardziński

ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-25 10:26:52
*Thread Reply:* I tested a dataset for each raw file versus the folder, and the folder looks much cleaner (not sure if I can collapse individual datasets/files into a group?)

From 2022, this particular source had 6 raw schema changes (client controlled, no warning). What should I do to make that as obvious as possible if I track the dataset at a folder level?

ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-25 10:32:19
*Thread Reply:* I was thinking that I could name the dataset based on the schema_version (identified by the raw column names), so in this example I would have 6 OL datasets feeding into one "staging" dataset

ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-25 10:32:57
*Thread Reply:* not sure what the best practice would be in this scenario though

ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-22 17:55:38
• also saw the docs reference URI = gs://{bucket name}{path}, and I wondered if the path would include the filename, or if it was just the base path like I showed above
Mars Lan - (mars@metaphor.io) - 2023-08-22 18:35:45
Has anyone managed to get the OL Airflow integration to work on AWS MWAA? We've tried pretty much every trick but still ended up with the following error:
Broken plugin: [openlineage.airflow.plugin] No module named 'openlineage.airflow'; 'openlineage' is not a package

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 05:22:18
*Thread Reply:* Which version are you trying to use?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 05:22:45
*Thread Reply:* Both OL and MWAA/Airflow 🙂

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 05:23:52
*Thread Reply:* 'openlineage' is not a package
suggests that something went wrong with the import process, for example a cycle in the import path
Mars Lan - (mars@metaphor.io) - 2023-08-23 16:50:34
*Thread Reply:* MWAA: 2.6.3
OL: 1.0.0

I can see from the log that OL has been successfully installed on the webserver:
Successfully installed openlineage-airflow-1.0.0 openlineage-integration-common-1.0.0 openlineage-python-1.0.0 openlineage-sql-1.0.0
This is the full stacktrace:
```Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/plugins_manager.py", line 229, in load_entrypoint_plugins
    plugin_class = entry_point.load()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/importlib_metadata/__init__.py", line 209, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1001, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'openlineage.airflow'; 'openlineage' is not a package```

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-24 08:18:36
*Thread Reply:* It's taking long to update the MWAA environment, but I tested the 2.6.3 version with the following requirements.txt:
openlineage-airflow
and
openlineage-airflow==1.0.0
Is there any step that might lead to some unexpected results?

Mars Lan - (mars@metaphor.io) - 2023-08-24 08:29:30
*Thread Reply:* Yeah, it takes forever to update MWAA even for a simple change. If you open either the webserver log (in CloudWatch) or the Airflow UI, you should see the above error message.
Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-24 08:33:53
*Thread Reply:* The thing is that I don't see any error messages.
I wrote a simple DAG to test too:
```from __future__ import annotations

from datetime import datetime

from airflow.models import DAG

try:
    from airflow.operators.empty import EmptyOperator
except ModuleNotFoundError:
    from airflow.operators.dummy import DummyOperator as EmptyOperator  # type: ignore

from openlineage.airflow.adapter import OpenLineageAdapter
from openlineage.client.client import OpenLineageClient

from airflow.operators.python import PythonOperator

DAG_ID = "example_ol"


def callable():
    client = OpenLineageClient()
    adapter = OpenLineageAdapter()
    print(client, adapter)


with DAG(
    dag_id=DAG_ID,
    start_date=datetime(2021, 1, 1),
    schedule="@once",
    catchup=False,
) as dag:
    begin = EmptyOperator(task_id="begin")

    test = PythonOperator(task_id="print_client", python_callable=callable)```

and it gives expected results as well
Mars Lan - (mars@metaphor.io) - 2023-08-24 08:48:11
*Thread Reply:* Oh, how interesting. I did have a plugin that sets the endpoint & key via env vars. Let me try disabling that to see if it fixes the issue. Will report back after 30 mins, or however long it takes to update MWAA 😉

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-24 08:50:05
*Thread Reply:* ohh, I see
you probably followed this guide: https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/?

Mars Lan - (mars@metaphor.io) - 2023-08-24 09:04:27
*Thread Reply:* Actually no. I'm not aware of this guide. I assume it's outdated already?

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-24 09:04:54
*Thread Reply:* tbh I don't know

Mars Lan - (mars@metaphor.io) - 2023-08-24 09:04:55
*Thread Reply:* Actually, while we're on that topic, what's the recommended way to pass the URL & API key in MWAA?

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-24 09:28:00
*Thread Reply:* I think it's still a plugin that sets env vars
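For reference, a minimal sketch of the env-var plugin approach mentioned above. The OPENLINEAGE_URL / OPENLINEAGE_API_KEY / OPENLINEAGE_NAMESPACE names follow the OpenLineage client's env-var convention; the endpoint value and the idea of pulling it from Secrets Manager are illustrative:

```python
# env_var_plugin.py - shipped inside the MWAA plugins.zip.
# Sets OpenLineage client configuration before any DAG parsing happens.
import os

from airflow.plugins_manager import AirflowPlugin

# Assumption: values hardcoded for the sketch; in practice they could be
# fetched from AWS Secrets Manager instead.
os.environ["OPENLINEAGE_URL"] = "https://lineage.example.com"  # hypothetical endpoint
os.environ["OPENLINEAGE_API_KEY"] = "replace-me"
os.environ["OPENLINEAGE_NAMESPACE"] = "mwaa"


class EnvVarPlugin(AirflowPlugin):
    # The plugin registers no components; the point is that this module's
    # top-level code runs on every Airflow process start.
    name = "env_var_plugin"
```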
Mars Lan - (mars@metaphor.io) - 2023-08-24 09:32:18
*Thread Reply:* Yeah, based on the page you shared, Secrets Manager + plugin seems like the way to go.

Mars Lan - (mars@metaphor.io) - 2023-08-24 10:31:50
*Thread Reply:* Alas, after disabling the plugin and restarting the cluster, I'm still getting the same error. Do you mind sharing a screenshot of your cluster's settings so I can compare?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-24 11:57:04
*Thread Reply:* Are you maybe importing some top-level OpenLineage code anywhere? This error is most likely a circular import.

Mars Lan - (mars@metaphor.io) - 2023-08-24 12:01:12
*Thread Reply:* Let me try removing all the dags to see if it helps.

Mars Lan - (mars@metaphor.io) - 2023-08-24 18:42:49
*Thread Reply:* @Maciej Obuchowski you were correct! It was indeed the DAGs. The errors are gone after removing all the dags. Now I just need to figure out what caused the circular import, since I didn't import OL directly in any DAG.

Mars Lan - (mars@metaphor.io) - 2023-08-24 18:44:33
*Thread Reply:* Could this be the issue?
from airflow.lineage.entities import File, Table
How could I declare lineage manually if I can't import these classes?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-25 06:52:47
*Thread Reply:* @Mars Lan I'll look in more detail next week, as I'm in transit now

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-25 06:53:18
*Thread Reply:* but if you could narrow down the problem to a single dag that I or @Jakub Dardziński could reproduce, ideally locally, it would help a lot

Mars Lan - (mars@metaphor.io) - 2023-08-25 07:07:11
*Thread Reply:* Thanks. I think I understand how this works much better now. Found a few useful BQ example dags. Will give them a try and report back.
🔥 Jakub Dardziński, Maciej Obuchowski
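On the manual-lineage question above, a minimal sketch of declaring inlets/outlets with airflow.lineage.entities (the DAG, task, paths, and table are made up; with an OpenLineage integration installed, these declarations are translated into OL datasets):

```python
# Hypothetical DAG declaring lineage by hand via Airflow's lineage entities.
from datetime import datetime

from airflow.lineage.entities import File, Table
from airflow.models import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="manual_lineage_example",
    start_date=datetime(2021, 1, 1),
    schedule="@once",
    catchup=False,
) as dag:
    copy = BashOperator(
        task_id="copy_raw",
        bash_command="echo copying...",
        # Inlets/outlets are plain declarations; the configured lineage
        # backend decides how to interpret them.
        inlets=[File(url="gs://client-bucket/source-system-lob/raw/file.csv")],
        outlets=[Table(database="analytics", cluster="bq", name="staging_table")],
    )
```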
Nitin - (nitinkhannain@yahoo.com) - 2023-08-23 07:14:44
Hi All,
I want to capture source and target table details as lineage information with OpenLineage for Amazon Redshift. Please let me know if anyone has done it.

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-23 07:32:19
*Thread Reply:* are you using Airflow to connect to Redshift?

Nitin - (nitinkhannain@yahoo.com) - 2023-08-24 06:50:05
*Thread Reply:* Hi @Jakub Dardziński,
Thank you for your reply.
No, we are not using Airflow.
We are using LOAD/UNLOAD commands with PySpark, and also Pandas with a JDBC connection.

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-25 13:28:37
*Thread Reply:* @Paweł Leszczyński might know the answer as to whether the Spark<->OL integration works with Redshift. Eventually JDBC is supported with the sqlparser.

for Pandas I think there wasn't too much work done

Paweł Leszczyński - (pawel.leszczynski@getindata.com) - 2023-08-28 02:18:49
*Thread Reply:* @Nitin If you're using JDBC within Spark, the lineage should be obtained via the sqlparser-rs library https://github.com/sqlparser-rs/sqlparser-rs. In case it's not, please try to provide some minimal SQL code (or pyspark) which leads to uncaught lineage.
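A quick way to check what the SQL parser extracts for a given statement is the openlineage-sql Python binding — a sketch, assuming the openlineage_sql module is installed and treating the exact result fields as version-dependent:

```python
# Probe which tables the OpenLineage SQL parser recognizes for a statement.
# Unsupported statements (e.g. Redshift LOAD/UNLOAD at the time of this
# thread) simply yield no input/output tables.
from openlineage_sql import parse

meta = parse(["INSERT INTO target_table SELECT * FROM source_table"])
print(meta.in_tables)   # tables the statement reads from
print(meta.out_tables)  # tables the statement writes to
```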
Nitin - (nitinkhannain@yahoo.com) - 2023-08-28 04:53:03
*Thread Reply:* Hi @Jakub Dardziński / @Paweł Leszczyński, thank you for taking the time to reply to my query. We need to capture only LOAD and UNLOAD query lineage, which we are running using Spark.

If you have any sample implementation for reference, it will be indeed helpful

Paweł Leszczyński - (pawel.leszczynski@getindata.com) - 2023-08-28 06:12:46
*Thread Reply:* I think we don't support load yet on our side: https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/impl/src/visitor.rs#L8

Nitin - (nitinkhannain@yahoo.com) - 2023-08-28 08:18:14
*Thread Reply:* Yeah! Any way you can think of, we can accommodate it, especially the LOAD and UNLOAD statements.
Also, we would like to capture lineage information where our endpoints are SageMaker and Redis

Nitin - (nitinkhannain@yahoo.com) - 2023-08-28 13:20:37
*Thread Reply:* @Paweł Leszczyński can we use this code base integration/common/openlineage/common/provider/redshift_data.py for Redshift lineage capture?

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-28 14:26:40
*Thread Reply:* it still expects input and output tables that are usually retrieved from the sqlparser

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-28 14:31:00
*Thread Reply:* for SageMaker there is an Airflow integration written, might possibly serve as an example
https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/sagemaker_extractors.py
Abdallah - (abdallah@terrab.me) - 2023-08-23 10:55:10
Approve a new release please 🙂
• Fix spark integration filtering Databricks events.
➕ Abdallah, Tristan GUEZENNEC -CROIX-, Mouad MOUSSABBIH, Ayoub Oudmane, Asmae Tounsi, Jakub Dardziński, Michael Robinson, Harel Shein, Willy Lulciuc, Maciej Obuchowski, Julien Le Dem

Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-23 12:27:15
*Thread Reply:* Thank you for requesting a release @Abdallah. Three +1s from committers will authorize.
🙌 Abdallah

Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-23 13:13:18
*Thread Reply:* Thanks, all. The release is authorized and will be initiated within 2 business days.
Athitya Kumar - (athityakumar@gmail.com) - 2023-08-23 13:08:48
Hey folks! Do we have clear step-by-step documentation on how we can leverage the ServiceLoader-based approach for injecting specific OpenLineage customisations, for tweaking the transport type with defaults / tweaking column level lineage etc?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 13:24:32
(no text content)

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 13:29:05
*Thread Reply:* For a custom transport, you have to provide an implementation of the interface https://github.com/OpenLineage/OpenLineage/blob/4a1a5c3bf9767467b71ca0e1b6d820ba9e[…]ain/java/io/openlineage/client/transports/TransportBuilder.java and point to it in a META-INF file

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 13:29:52
*Thread Reply:* But if I understand correctly, if you want to change behavior rather than extend, the correct way may be to either contribute it to the repo - if that behavior is useful to anyone - or fork the repo

Athitya Kumar - (athityakumar@gmail.com) - 2023-08-23 15:14:43
*Thread Reply:* @Maciej Obuchowski - Can you elaborate more on the "point to it in a META-INF file"? Let's say we have the custom transport type built in a standalone jar by extending the transport builder - what're the exact next steps to use this custom transport in the standalone jar when doing spark-submit?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-23 15:23:13
*Thread Reply:* @Athitya Kumar your jar needs to have META-INF/services/io.openlineage.client.transports.TransportBuilder with the fully qualified class names of your custom TransportBuilders there - like openlineage-spark has:
io.openlineage.client.transports.HttpTransportBuilder
io.openlineage.client.transports.KafkaTransportBuilder
io.openlineage.client.transports.ConsoleTransportBuilder
io.openlineage.client.transports.FileTransportBuilder
io.openlineage.client.transports.KinesisTransportBuilder

Athitya Kumar - (athityakumar@gmail.com) - 2023-08-25 01:49:29
*Thread Reply:* @Maciej Obuchowski - I think this change may be required for consumers to leverage custom transports; can you check & verify this GH comment?
https://github.com/OpenLineage/OpenLineage/issues/2007#issuecomment-1690350630

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-25 06:52:30
*Thread Reply:* Probably; I will look at more details next week @Athitya Kumar, as I'm in transit
👍 Athitya Kumar
Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-23 15:04:10
@channel
We released OpenLineage 1.1.0, including:
Additions:
• Flink: create Openlineage configuration based on Flink configuration #2033 @pawel-big-lebowski
• Java: add Javadocs to the Java client #2004 @julienledem
• Spark: append output dataset name to a job name #2036 @pawel-big-lebowski
• Spark: support Spark 3.4.1 #2057 @pawel-big-lebowski
Fixes:
• Flink: fix a bug when getting schema for KafkaSink #2042 @pentium3
• Spark: fix ignored event adaptive_spark_plan in Databricks #2061 @algorithmy1
Plus additional bug fixes, doc changes and more.
Thanks to all the contributors, especially new contributors @pentium3 and @Abdallah!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.1.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.0.0...1.1.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👏 Ayoub Oudmane, Abdallah, Yuanli Wang, Athitya Kumar, Mars Lan, Maciej Obuchowski, Harel Shein, Kiran Hiremath, Thomas Abraham
:gratitude_thank_you: GitHubOpenLineageIssues
Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-25 10:29:23
@channel
Friendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer's offices in the Financial District. You can sign up and find the details on the meetup event page.
George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) - 2023-08-25 10:57:30
hi OpenLineage team, we would like to join one of your meetups (me, @Madhav Kakumani and @Phil Rolph), and we're wondering if you are hosting any meetups after the 18/9? We are trying to join this one, but air tickets are quite expensive.

Harel Shein - (harel.shein@gmail.com) - 2023-08-25 11:32:12
*Thread Reply:* there will certainly be more meetups, don't worry about that!

Harel Shein - (harel.shein@gmail.com) - 2023-08-25 11:32:30
*Thread Reply:* where are you located? perhaps we can try to organize a meetup closer to where you are.

George Polychronopoulos - (george.polychronopoulos@6point6.co.uk) - 2023-08-25 11:49:37
*Thread Reply:* Thanks a lot for the response, we are in London. We'd be glad to help you organise a meetup and also meet in person!

Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-25 11:51:39
*Thread Reply:* This is awesome, thanks @George Polychronopoulos. I'll start a channel and invite you
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) - 2023-08-28 04:47:53
hi folks, I'm looking into exporting static metadata, and found that DatasetEvent requires an eventTime, which in my mind doesn't make sense for static events. I'm setting it to None and the Python client seems to work, but I wanted to ask if I'm missing something.

Paweł Leszczyński - (pawel.leszczynski@getindata.com) - 2023-08-28 05:59:10
*Thread Reply:* Although you emit a DatasetEvent, you still emit an event, and eventTime is a valid marker.

Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) - 2023-08-28 06:01:40
*Thread Reply:* so, should I use the current time at the moment of emitting it and that's it?

Paweł Leszczyński - (pawel.leszczynski@getindata.com) - 2023-08-28 06:01:53
*Thread Reply:* yes, that should be it
:gratitude_thank_you: Juan Luis Cano Rodríguez
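A minimal sketch of emitting a static DatasetEvent along those lines with the Python client. The import paths match openlineage-python around 1.x and the URL/producer values are placeholders; treat both as assumptions for other versions:

```python
# Emit a static dataset event; eventTime is simply "when this was emitted".
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, DatasetEvent

client = OpenLineageClient(url="http://localhost:9000")  # hypothetical backend URL

event = DatasetEvent(
    eventTime=datetime.now(timezone.utc).isoformat(),
    producer="https://example.com/my-exporter",  # placeholder producer URI
    schemaURL="https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/DatasetEvent",
    dataset=Dataset(namespace="my-namespace", name="my-table"),
)
client.emit(event)
```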
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) - 2023-08-28 04:49:21
and something else: I understand that Marquez does not yet support the 2.0 spec, hence it's incompatible with static metadata, right? I tried to emit a list of DatasetEvents and got HTTPError: 422 Client Error: Unprocessable Entity for url: http://localhost:3000/api/v1/lineage (I'm using a FileTransport for now)

Paweł Leszczyński - (pawel.leszczynski@getindata.com) - 2023-08-28 06:02:49
*Thread Reply:* Marquez is not capable of reflecting DatasetEvents in the DB, but it should respond with Unsupported event type

Paweł Leszczyński - (pawel.leszczynski@getindata.com) - 2023-08-28 06:03:15
*Thread Reply:* and return 200 instead of 201 created

Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) - 2023-08-28 06:05:41
*Thread Reply:* I'll have a deeper look then, probably I'm doing something wrong. thanks @Paweł Leszczyński
Joshua Dotson - (josdotso@cisco.com) - 2023-08-28 13:25:58
Hi folks. I have some pure Golang jobs from which I need to emit OL events to Marquez. Is the right way to go about this to generate a Golang client from the Marquez OpenAPI spec and use that client from my Go jobs?

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-28 14:23:24
*Thread Reply:* I'd rather generate them from the OL spec (compliant with JSON Schema)

Joshua Dotson - (josdotso@cisco.com) - 2023-08-28 15:12:21
*Thread Reply:* I'll look into this. I take you to mean that I would use the OL spec, which is available as a set of JSON schemas, to create the data object and then HTTP POST it using vanilla Golang. Is that correct? Thank you for your help!

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-28 15:30:05
*Thread Reply:* Correct! You're also very welcome to contribute a Golang client (currently we have Python & Java clients) if you manage to send events using Golang 🙂
👏 Joshua Dotson
Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-28 17:28:31
@channel
The agenda for the Toronto Meetup at Airflow Summit on 9/18 has been updated. This promises to be an exciting, richly productive discussion. Don't miss it if you'll be in the area!

  1. Intros
  2. Evolution of spec presentation/discussion (project background/history)
  3. State of the community
  4. Spark/Column lineage update
  5. Airflow Provider update
  6. Roadmap Discussion
  7. Action items review/next steps
❤️ Jarek Potiuk, Paweł Leszczyński, tati

Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-28 20:05:37
New on the OpenLineage blog: a close look at the new OpenLineage Airflow Provider, including:
• the critical improvements it brings to the integration
• the high-level design
• implementation details
• an example operator
• planned enhancements
• a list of supported operators
• more.
The post, by @Maciej Obuchowski, @Julien Le Dem and myself, is live now on the OpenLineage blog.
🎉 Drew Meyers, Harel Shein, Maciej Obuchowski, Julian LaNeve, Mars Lan
Sarwat Fatima - (sarwatfatimam@gmail.com) - 2023-08-29 03:18:04
Hello, I'm currently in the process of following the instructions outlined in the provided getting started guide at https://openlineage.io/getting-started/. However, I've encountered a problem while attempting to complete Step 1 of the guide: I'm getting an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. I have attached screenshots.

Jakub Dardziński - (jakub.dardzinski@getindata.com) - 2023-08-29 03:20:18
*Thread Reply:* is the 5000 port taken by any other application? or does ./docker/up.sh show some errors in the logs?

Sarwat Fatima - (sarwatfatimam@gmail.com) - 2023-08-29 05:23:01
*Thread Reply:* @Jakub Dardziński The 5000 port is not taken by any other application. The logs show some errors, but I am not sure what the issue is here. [screenshot attached]

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-29 10:02:38
*Thread Reply:* I think Marquez is running on WSL while you're trying to connect from the host computer?
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) - 2023-08-29 05:20:39
hi folks, for now I'm producing .jsonl (or .ndjson) files with one event per line. Do you know if there's any way to validate those? Would standard JSON Schema tools work?

Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) - 2023-08-29 10:58:29
*Thread Reply:* reply by @Julian LaNeve: yes 🙂💯
👍 Maciej Obuchowski
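A sketch of doing exactly that with standard JSON Schema tooling. The spec URL, pinned version, and file name are illustrative, and it assumes the published schema's root accepts any event variant:

```python
# Validate newline-delimited OpenLineage events against the published spec.
import json
import urllib.request

from jsonschema import validate  # pip install jsonschema

SPEC_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json"  # pin your version
schema = json.load(urllib.request.urlopen(SPEC_URL))

with open("events.jsonl") as f:
    for line_no, line in enumerate(f, start=1):
        event = json.loads(line)
        # validate() raises ValidationError on the first non-conforming event.
        validate(instance=event, schema=schema)
        print(f"line {line_no}: ok")
```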
ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-29 13:12:32
for namespaces, if my data is moving between sources (SFTP -> GCS -> Azure Blob (Synapse connects to parquet datasets)), then should my namespace be based on the client I am working with? My current namespace has been to refer to the bucket, but that falls apart when considering the data sources and some destinations. Perhaps I should just add a field for client-name instead, to have a consolidated view?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-30 10:53:08
*Thread Reply:* > then should my namespace be based on the client I am working with?
I think each of those sources should be a different namespace?

ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-30 12:59:53
*Thread Reply:* got it, yeah I was kind of picturing it as one namespace for the client (we handle many clients but they are completely distinct entities). I was able to get it to work with multiple namespaces like you suggested, and Marquez was able to plot everything correctly in the visualization
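To illustrate the namespace-per-source convention (the dataset names and hosts are invented; the scheme://authority style follows the OpenLineage naming convention for datasets):

```python
# One dataset per storage system, each in its own namespace.
from openlineage.client.run import Dataset

sftp_file = Dataset(namespace="sftp://client-sftp.example.com", name="/outbound/daily_dump.csv")
gcs_raw = Dataset(namespace="gs://client-bucket", name="source-system-lob/raw")
abfs_stage = Dataset(namespace="abfss://container@account", name="source-system-lob/staging")

# The extract job's run event would use inputs=[sftp_file], outputs=[gcs_raw];
# the next job reads gcs_raw and writes abfs_stage, giving a cross-system
# graph keyed by namespace rather than by client.
```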
ldacey - (lance.dacey2@sutherlandglobal.com) - 2023-08-30 13:01:18
*Thread Reply:* I noticed some of my Dataset facets make more sense as Run facets; for example, the name of the specific file I processed and how many rows of data / the size of the data for that schedule. That won't impact the Run facets Airflow provides, right? I can still have the schedule information + my custom run facets?

Maciej Obuchowski - (maciej.obuchowski@getindata.com) - 2023-08-30 13:06:38
*Thread Reply:* Yes, unless you name it the same as one of the Airflow facets 🙂
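A sketch of such a custom run facet with the Python client. The facet class, its fields, and the "processedFile" key are invented for this example; BaseFacet comes from the openlineage-python package, assuming a 1.x-era layout:

```python
# A custom run facet carrying per-schedule file statistics.
import attr

from openlineage.client.facet import BaseFacet


@attr.s
class ProcessedFileRunFacet(BaseFacet):  # hypothetical facet
    filename: str = attr.ib()
    row_count: int = attr.ib()
    byte_size: int = attr.ib()


# Attached under a custom key, it coexists with Airflow-provided run facets
# as long as the key doesn't collide with theirs.
run_facets = {"processedFile": ProcessedFileRunFacet("dump_2023-08-30.csv", 10_000, 5_242_880)}
```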
GitHubOpenLineageIssues - (githubopenlineageissues@gmail.com) - 2023-08-30 08:15:29
Hi, I will really appreciate it if someone can guide me or provide me any pointers - has anyone been able to implement authentication/authorization for access to Marquez? I have not seen much info around it. Any pointers greatly appreciated. Thanks in advance.

Julien Le Dem - (julien@apache.org) - 2023-08-30 12:23:18
*Thread Reply:* I've seen people do this through the ingress controller in Kubernetes. Unfortunately I don't have documentation besides the k8s-specific ones you would find for the ingress controller you're using. You'd redirect any unauthenticated request to your identity provider.
:gratitude_thank_you: GitHubOpenLineageIssues
Michael Robinson - (michael.robinson@astronomer.io) - 2023-08-30 11:50:05
@channel
Friendly reminder: there's a meetup tonight at Astronomer's offices in SF!
✅ Sheeri Cabral (Collibra)

Julien Le Dem - (julien@apache.org) - 2023-08-30 12:15:31
*Thread Reply:* I'll be there and looking forward to seeing @John Lukenoff's presentation

Michael Barrientos - (mbarrien@gmail.com) - 2023-08-30 21:38:31
Can anyone let 3 people stuck downstairs into the 7th floor?
👍 Willy Lulciuc

Willy Lulciuc - (willy@datakin.com) - 2023-08-30 23:25:21
*Thread Reply:* Sorry about that!
Yunhe - (yunhe52203334@outlook.com) -
-
2023-08-31 02:31:48
-
-

hello, everyone. I can run OpenLineage Spark code in my notebook with Python, but when I use IDEA to execute Scala code like this:

```
import org.apache.spark.internal.Logging
import org.apache.spark.sql.SparkSession
import io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}
import sun.java2d.marlin.MarlinUtils.logInfo

object Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local")
      .appName("test")
      .config("spark.jars.packages", "io.openlineage:openlineage_spark:0.12.0")
      .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
      .config("spark.openlineage.transport.type", "console")
      .getOrCreate()

    spark.sparkContext.setLogLevel("INFO")

    //spark.sparkContext.addSparkListener(new MySparkAppListener)
    import spark.implicits._
    val input = Seq((1, "zs", 2020), (2, "ls", 2023)).toDF("id", "name", "year")

    input.select("id", "name").orderBy("id").show()
  }
}
```

there is something wrong:

```
Exception in thread "spark-listener-group-shared" java.lang.NoSuchMethodError: io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(Ljava/io/InputStream;)Lio/openlineage/client/OpenLineageYaml;
    at io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114)
    at io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78)
    at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277)
    at io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
```

I want to know how I can set up the IDEA Scala environment correctly.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-08-31 02:58:41
*Thread Reply:* io.openlineage:openlineage_spark:0.12.0 → could you repeat the steps with a newer version?

Yunhe - (yunhe52203334@outlook.com)
2023-08-31 03:51:52
ok, it's my first time using this lineage tool. First, I added dependencies in my pom.xml like this:

```
<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-java</artifactId>
    <version>0.12.0</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.7</version>
</dependency>
<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-spark</artifactId>
    <version>0.30.1</version>
</dependency>
```

my Spark version is 3.3.1 and the version cannot change.

Second, in the folder OpenLineage/integration/spark I entered the command docker-compose up and followed the steps in this doc: https://openlineage.io/docs/integrations/spark/quickstart_local
There was no error when I used the notebook to execute PySpark for OpenLineage, and I could get the JSON messages. But after I entered "docker-compose up" and used my IDEA tool to execute the Scala code above, the error above happened. It seems that I have not configured the environment correctly, so how can I fix the problem?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-09-01 05:15:28
*Thread Reply:* please use latest io.openlineage:openlineage_spark:1.1.0 instead. openlineage-java is already contained in the jar, no need to add it on your own.

Sheeri Cabral (Collibra) - (sheeri.cabral@collibra.com)
2023-08-31 15:33:19
Will the August meeting be put up at https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting soon? (usually it’s up in a few days 🙂)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-01 06:00:53
*Thread Reply:* @Michael Robinson

Michael Robinson - (michael.robinson@astronomer.io)
2023-09-01 17:13:32
*Thread Reply:* The recording is on the youtube channel here. I’ll update the wiki ASAP

✅ Sheeri Cabral (Collibra)

Julien Le Dem - (julien@apache.org)
2023-08-31 18:10:20
It sounds like there have been a few announcements at Google Next:
https://cloud.google.com/data-catalog/docs/how-to/open-lineage
https://cloud.google.com/dataproc/docs/guides/lineage

🎉 Harel Shein, Willy Lulciuc, Kevin Languasco, Peter Hicks, Maciej Obuchowski, Paweł Leszczyński, Sheeri Cabral (Collibra), Ross Turk, Michael Robinson, Jakub Dardziński, Kiran Hiremath, Laurent Paris, Anastasia Khomyakova
🙌 Harel Shein, Willy Lulciuc, Mars Lan, Peter Hicks, Maciej Obuchowski, Paweł Leszczyński, Eric Veleker, Sheeri Cabral (Collibra), Ross Turk, Michael Robinson
❤️ Willy Lulciuc, Maciej Obuchowski, ldacey, Ross Turk, Michael Robinson

Julien Le Dem - (julien@apache.org)
2023-09-01 23:09:55
*Thread Reply:* https://www.youtube.com/watch?v=zvCdrNJsxBo&t=2260s

Michael Robinson - (michael.robinson@astronomer.io)
2023-09-01 17:16:21
@channel
The latest issue of OpenLineage News is out now! Please subscribe to get it directly in your inbox each month.

🙌 Jakub Dardziński, Maciej Obuchowski
🙌:skin_tone_3: Juan Luis Cano Rodríguez

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-09-04 03:38:28
Hi guys, I'd like to capture the spark.databricks.clusterUsageTags.clusterAllTags property from Databricks. However, the value of this is a list of keys, and therefore cannot be supported by the custom environment facet builder.
I was thinking that capturing this property might be useful for most Databricks workloads, so it might make sense to auto-capture it along with other Databricks variables, similar to how we capture mount points for the Databricks jobs.
Does this sound okay? If so, then I can help to contribute this functionality.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-04 06:43:47
*Thread Reply:* Sounds good to me

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-09-11 05:15:03
*Thread Reply:* Added this here: https://github.com/OpenLineage/OpenLineage/pull/2099

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-09-04 06:39:05
Also, another small clarification is that when using MergeIntoCommand, I'm receiving the lineage events on the backend, but I cannot seem to find any logging of the payload when I enable debug mode in openlineage. I remember there was a similar issue reported by another user in the past. May I check if it might be possible to help with this? It's making debugging quite hard for these cases. Thanks!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-04 06:54:12
*Thread Reply:* I think it only depends on log4j configuration

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-04 06:57:15
*Thread Reply:* ```
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# set the log level for the openlineage spark library
log4j.logger.io.openlineage.spark=DEBUG
```
this is what we have in `log4j.properties` in the test environment and it works

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-09-04 11:28:11
*Thread Reply:* Hmm... I can see the logs for the other commands, like createViewCommand etc. I just cannot see it for any of the delta runs

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-09-05 03:33:03
*Thread Reply:* that's interesting. So, logging is done here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/main/java/io/openlineage/spark/agent/EventEmitter.java#L63 and this code is unaware of delta.

The possible problem could be the filtering of delta events (which we do because delta is noisy).

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-09-05 03:33:36
*Thread Reply:* Recently, we've closed https://github.com/OpenLineage/OpenLineage/issues/1982, which prevents generating events for `createOrReplaceTempView`

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-09-05 03:35:12
*Thread Reply:* and this is the code change: https://github.com/OpenLineage/OpenLineage/pull/1987/files

Anirudh Shrinivason - (anirudh.shrinivason@grabtaxi.com)
2023-09-05 05:19:22
*Thread Reply:* Hmm, I'm a little confused here. I thought we are only filtering out events for certain specific commands, like show table etc., because they're noisy, right? Some important commands like MergeInto or SaveIntoDataSource used to be logged before, but I notice now that they're not being logged anymore...
I'm using OpenLineage version 0.23.0.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-09-05 05:47:51
*Thread Reply:* yes, we do. it's just sometimes when doing a filter, we can remove too much. but SaveIntoDataSource and MergeInto should be fine, as we do check them within the tests

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-04 21:35:05
it looks like my dynamic task mapping in Airflow has the same run ID in marquez, so even if I am processing 100 files, there is only one version of the data. is there a way to have a separate version of each dynamic task so I can track the filename etc?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-05 08:54:57
*Thread Reply:* map_index should indeed be included when calculating the run ID (it’s deterministic in the Airflow integration)
what version of Airflow are you using btw?
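(To illustrate the idea only: a hedged Python sketch of how a deterministic, map-index-aware run ID could be derived. This is not the Airflow integration's actual code; the seed fields are assumptions.)

```
from uuid import NAMESPACE_URL, uuid5

def deterministic_run_id(dag_id: str, task_id: str, logical_date: str,
                         try_number: int, map_index: int = -1) -> str:
    # Folding map_index into the seed gives each mapped task instance
    # its own stable run ID instead of one ID shared by all 100 files.
    seed = f"{dag_id}.{task_id}.{logical_date}.{try_number}.{map_index}"
    return str(uuid5(NAMESPACE_URL, seed))

# Same inputs always yield the same ID; a different map_index yields a different one.
print(deterministic_run_id("my_dag", "process_file", "2023-09-05T00:00:00", 1, map_index=3))
```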

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-05 09:04:14
*Thread Reply:* 2.7.0

I do see this error log in all of my dynamic tasks, which might explain it:

```
[2023-09-05, 00:31:57 UTC] {manager.py:200} ERROR - Extractor returns non-valid metadata: None
[2023-09-05, 00:31:57 UTC] {utils.py:401} ERROR - cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 399, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/listener.py", line 93, in on_running
    **get_custom_facets(task_instance),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 148, in get_custom_facets
    custom_facets["airflow_mappedTask"] = AirflowMappedTaskRunFacet.from_task_instance(task_instance)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/facets.py", line 36, in from_task_instance
    from airflow.providers.openlineage.utils import get_operator_class
ImportError: cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
```

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-05 09:05:34
*Thread Reply:* I only have a few custom operators with the on_complete facet, so I think this is a built-in one - it runs before my task's custom logs, for example

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-05 09:06:05
*Thread Reply:* and any time I messed up my custom facet, the error would be at the bottom of the logs. this is on top, probably an on_start facet?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-05 09:16:32
*Thread Reply:* seems like some circular import

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-05 09:19:47
*Thread Reply:* I just tested it manually, it’s a bug in OL provider. let me fix that

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-05 10:53:28
*Thread Reply:* cool, thanks. I am glad it is just a bug, I was afraid dynamic tasks were not supported for a minute there

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-07 11:46:20
*Thread Reply:* how do the provider updates work? they can be released in between Airflow releases and issues for them are raised on the main Airflow repo?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-07 11:50:07
*Thread Reply:* generally speaking, anything related to OL-Airflow should be placed in the Airflow repo; important changes/bug fixes would be implemented in the OL repo as well

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-07 15:40:31
*Thread Reply:* got it, thanks

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-07 19:43:46
*Thread Reply:* is there a way for me to install the openlineage provider based on the commit you made to fix the circular imports?

I was going to try to install from the Airflow main branch but didn't want to mess anything up.

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-07 19:44:39
*Thread Reply:* I saw it was merged to airflow main but it is not in 2.7.1 and there is no 1.0.3 provider version yet, so I wondered if I could manually install it for the time being

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-08 05:45:48
*Thread Reply:* https://github.com/apache/airflow/blob/main/BREEZE.rst#preparing-provider-packages
building the provider package on your own would probably be the best idea? that depends on how you manage your Airflow instance

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-08 12:01:53
*Thread Reply:* there's 1.1.0rc1 btw

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-08 13:44:44
*Thread Reply:* perfect, thanks. I got started with breeze but then stopped haha

👍 Jakub Dardziński

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-10 20:29:00
*Thread Reply:* The dynamic task mapping error is gone. I did run into this:

```
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/extractors/base.py", line 70, in disabled_operators
    operator.strip() for operator in conf.get("openlineage", "disabled_for_operators").split(";")
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/configuration.py", line 1065, in get
    raise AirflowConfigException(f"section/key [{section}/{key}] not found in config")
```

I am redeploying now with that option added to my config. I guess it did not use the default, which should be "".

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-09-10 20:49:17
*Thread Reply:* added "disabled_for_operators" to my openlineage config and it worked (using the Airflow helm chart - not sure if that means there is an error, because the value I provided should just be the default value; not sure why I needed to explicitly specify it)

```
openlineage:
  disabled_for_operators: ""
  ...
```

this is so much better and makes a lot more sense. most of my tasks are dynamic so I was missing a lot of metadata before the fix, thanks!

Abdallah - (abdallah@terrab.me)
2023-09-06 16:43:07
Hello Everyone,

I've been diving into the Marquez codebase and found a performance bottleneck in JobDao.java for the query related to namespaceName=MyNameSpace limit=10, and 12s with limit=25. I managed to optimize it using CTEs, and the execution times dropped dramatically to 300ms (for limit=100) and under 100ms (for limit=25) on the same cluster.
Issue link: https://github.com/MarquezProject/marquez/issues/2608

I believe there's even more room for optimization, especially if we adjust the job_facets_view to include the namespace_name column.

Would the team be open to a PR where I share the optimized query and discuss potential further refinements? I believe these changes could significantly enhance the Marquez web UI experience.

PR link: https://github.com/MarquezProject/marquez/pull/2609

Looking forward to your feedback.

🔥 Jakub Dardziński, Harel Shein, Paweł Leszczyński, Maciej Obuchowski

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-06 18:03:01
*Thread Reply:* @Willy Lulciuc wdyt?

Bernat Gabor - (gaborjbernat@gmail.com)
2023-09-06 17:44:12
Has there been any conversation on the extensibility of facets/concepts? E.g.:
• how does one extend the list of run states https://openlineage.io/docs/spec/run-cycle to add a paused/resumed state?
• how does one extend https://openlineage.io/docs/spec/facets/run-facets/nominal_time to add a created-at field?

Julien Le Dem - (julien@apache.org)
2023-09-06 18:28:17
*Thread Reply:* Hello Bernat,

The primary mechanism to extend the model is through facets. You can either:
• create new standard facets in the spec: https://github.com/OpenLineage/OpenLineage/tree/main/spec/facets
• create custom facets defined somewhere else with a prefix in their name: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#custom-facet-naming
• update existing facets with a backward-compatible change (for example, adding an optional field).
The core spec can also be modified. Here is an example of adding a state.
That being said, I think more granular states like pause/resume are probably better suited to a run facet. There was an issue opened for that particular one a while ago: https://github.com/OpenLineage/OpenLineage/issues/9 - maybe that particular discussion can continue there.

For the nominal time facet, you could open an issue describing the use case and, on community agreement, follow up with a PR on the facet itself: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/NominalTimeRunFacet.json
(adding an optional field is backwards compatible)
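(A hedged sketch of the custom-facet route in Python with the openlineage-python client; the facet class, prefix, and createdAt field are made up for illustration, and the BaseFacet import path is an assumption about the client library.)

```
import attr
from openlineage.client.facet import BaseFacet  # assumed import path

@attr.s
class MyCompanyNominalTimeExtrasRunFacet(BaseFacet):
    # Hypothetical extra field from this thread: when the run was created.
    createdAt: str = attr.ib()

# Attach it under a prefixed key so it is clearly a custom facet:
run_facets = {
    "mycompany_nominalTimeExtras": MyCompanyNominalTimeExtrasRunFacet(
        createdAt="2023-09-06T18:28:17Z"
    )
}
```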

👀 Juan Luis Cano Rodríguez

Bernat Gabor - (gaborjbernat@gmail.com)
2023-09-06 18:31:12
*Thread Reply:* I see, so in general one is best off copying a standard facet and maintaining it under a different name. That way it can be made mandatory 🙂 and one does not need to be blocked for a long time until there's community agreement 🤔

Julien Le Dem - (julien@apache.org)
2023-09-06 18:35:43
*Thread Reply:* Yes. The goal of custom facets is to allow you to experiment and extend the spec however you want without having to wait for approval.
If the custom facet is very specific to a third-party project/product then it makes sense for it to stay a custom facet.
If it is more generic then it makes sense to add it to the core facets as part of the spec.
Hopefully community agreement can be achieved relatively quickly. Unless someone is strongly against something, it can be added without too much red tape. Typically with support in at least one of the integrations to validate the model.

Michael Robinson - (michael.robinson@astronomer.io)
2023-09-07 15:12:20
@channel
This month’s TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:
• announcements
• recent releases
• demo: Spark integration tests in Databricks runtime
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.

👍 Maciej Obuchowski

Michael Robinson - (michael.robinson@astronomer.io)
2023-09-11 10:07:41
@channel
The first Toronto OpenLineage Meetup, featuring a presentation by recent adopter Metaphor, is just one week away. On the agenda:

  1. Evolution of spec presentation/discussion (project background/history)
  2. State of the community
  3. Integrating OpenLineage with Metaphor (by special guests Ye & Ivan)
  4. Spark/Column lineage update
  5. Airflow Provider update
  6. Roadmap Discussion
Find more details and RSVP here: https://www.meetup.com/openlineage/events/295488014/
-
🙌 Mars Lan, Jarek Potiuk, Harel Shein, Maciej Obuchowski, Peter Hicks, Paweł Leszczyński, Dongjin Seo

John Lukenoff - (john@jlukenoff.com)
2023-09-11 17:07:26
I’m seeing some odd behavior with my http transport when upgrading airflow/openlineage-airflow from 2.3.2 -> 2.6.3 and 0.24.0 -> 0.28.0. Previously I had a config like this that let me provide my own auth tokens. However, after upgrading I’m getting a 401 from the endpoint, and further debugging seems to reveal that we’re not using the token provided in my TokenProvider. Does anyone know if something changed between these versions that could be causing this? (more details in 🧵)

```
transport:
  type: http
  url: <https://my.fake-marquez-endpoint.com>
  auth:
    type: some.fully.qualified.classpath
```
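(For context, a minimal sketch of what such a custom token provider can look like with the Python client, assuming the TokenProvider base class and its get_bearer() hook in openlineage/client/transport/http.py discussed below; the class name and token source here are made up.)

```
from openlineage.client.transport.http import TokenProvider  # base class, per http.py

class MyTokenProvider(TokenProvider):  # hypothetical "some.fully.qualified.classpath"
    def __init__(self, config: dict):
        super().__init__(config)
        # e.g. fetched from a secret store; placeholder here
        self.token = config.get("token", "placeholder-token")

    def get_bearer(self) -> str:
        # The http transport puts this value in the Authorization header.
        return f"Bearer {self.token}"
```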

John Lukenoff - (john@jlukenoff.com)
2023-09-11 17:09:40
*Thread Reply:* If I log this line I can tell the TokenProvider is the class instance I would expect: https://github.com/OpenLineage/OpenLineage/blob/45d94fb73b5488d34b8ca544b58317382ceb3980/client/python/openlineage/client/transport/http.py#L55

John Lukenoff - (john@jlukenoff.com)
2023-09-11 17:11:14
*Thread Reply:* However, if I log the token_provider here I get the base TokenProvider: https://github.com/OpenLineage/OpenLineage/blob/45d94fb73b5488d34b8ca544b58317382ceb3980/client/python/openlineage/client/transport/http.py#L154

John Lukenoff - (john@jlukenoff.com)
2023-09-11 17:18:56
*Thread Reply:* Ah I think I see the issue. Looks like this was introduced here, we are instantiating with the base token provider here when we should be using the subclass: https://github.com/OpenLineage/OpenLineage/pull/1869/files#diff-2f8ea6f9a22b5567de8ab56c6a63da8e7adf40cb436ee5e7e6b16e70a82afe05R57

John Lukenoff - (john@jlukenoff.com)
2023-09-11 17:37:42
*Thread Reply:* Opened a PR for this here: https://github.com/OpenLineage/OpenLineage/pull/2100

❤️ Julien Le Dem

Sarwat Fatima - (sarwatfatimam@gmail.com)
2023-09-12 08:14:06
This particular code in docker-compose exits with code 1 because it is unable to find the wait-for-it.sh file in the container. I have checked the mounting path from the local machine; it is correct, and the path on the container for Marquez is also correct, i.e. /usr/src/app, but it is unable to mount wait-for-it.sh. Does anyone know why this is? This code exists in the OpenLineage repository as well: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/docker-compose.yml

```
# Marquez as an OpenLineage Client
api:
  image: marquezproject/marquez
  container_name: marquez-api
  ports:
    - "5000:5000"
    - "5001:5001"
  volumes:
    - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh
  links:
    - "db:postgres"
  depends_on:
    - db
  entrypoint: [ "./wait-for-it.sh", "db:5432", "--", "./entrypoint.sh" ]
```

Sarwat Fatima - (sarwatfatimam@gmail.com)
2023-09-12 08:15:19
*Thread Reply:* This is the error message: (screenshot attached)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-12 10:38:41
*Thread Reply:* no permissions?

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-12 15:11:45
I am trying to run Google Cloud Composer, where I have added the openlineage-airflow PyPI package as a dependency and have added the env var OPENLINEAGE_EXTRACTORS to point to my custom extractor. I have added a folder by the name dependencies, and inside that I have placed my extractor file, and the path given to OPENLINEAGE_EXTRACTORS is dependencies.<filename>.<extractor_class_name>… still it fails with the exception saying No module named 'dependencies'. Can anyone kindly help me out in correcting my mistake?

Harel Shein - (harel.shein@gmail.com)
2023-09-12 17:15:36
*Thread Reply:* Hey @Guntaka Jeevan Paul, can you share some details on which versions of airflow and openlineage you’re using?

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-12 17:16:26
*Thread Reply:* airflow → 2.5.3, openlineage-airflow → 1.1.0

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-12 17:45:08
*Thread Reply:* ```
import traceback
import uuid
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
from openlineage.airflow.utils import get_job_name


class BigQueryInsertJobExtractor(BaseExtractor):
    def __init__(self, operator):
        super().__init__(operator)

    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        return ['BigQueryInsertJobOperator']

    def extract(self) -> Optional[TaskMetadata]:
        return None

    def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:
        self.log.debug(f"JEEVAN ---> extract_on_complete({task_instance})")
        random_uuid = str(uuid.uuid4())
        self.log.debug(f"JEEVAN ---> Randomly Generated UUID --> {random_uuid}")

        self.operator.job_id = random_uuid

        return TaskMetadata(
            name=get_job_name(task=self.operator)
        )
```
Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-12 17:45:24
*Thread Reply:* this is the custom extractor code that im trying with

Harel Shein - (harel.shein@gmail.com)
2023-09-12 21:10:02
*Thread Reply:* thanks @Guntaka Jeevan Paul, will try to take a deeper look tomorrow

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 07:54:26
*Thread Reply:* No module named 'dependencies'.
This sounds like a general Python problem

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 07:55:12
*Thread Reply:* https://stackoverflow.com/questions/69991553/how-to-import-custom-modules-in-cloud-composer

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 07:56:28
*Thread Reply:* basically, if you're able to import the file from your dag code, OL should be able too
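(A quick, hedged way to test exactly that, assuming the dependencies package layout discussed in this thread; drop something like this into any DAG file and see whether DAG parsing fails.)

```
# Hypothetical dags/ layout under discussion:
#   dags/
#   ├── my_dag.py
#   └── dependencies/
#       ├── __init__.py
#       └── big_query_insert_job_extractor.py
#
# If this import raises in the DAG file, the OpenLineage integration
# won't be able to import the extractor either.
from dependencies.big_query_insert_job_extractor import BigQueryInsertJobExtractor

print(BigQueryInsertJobExtractor.get_operator_classnames())
```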

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:01:12
*Thread Reply:* The problem is that in GCS Composer there is a component called Triggerer, which they say is used for deferrable operators… I have logged into that pod and I could see that the GCS bucket is not mounted on it, but I am unable to understand why the initialisation is happening inside the triggerer pod

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:01:32
*Thread Reply:* (screenshot attached)

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:01:47
*Thread Reply:* > The problem is that in GCS Composer there is a component called Triggerer, which they say is used for deferrable operators… I have logged into that pod and I could see that the GCS bucket is not mounted on it, but I am unable to understand why the initialisation is happening inside the triggerer pod
OL integration is not running on the triggerer, only on worker and scheduler pods

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:01:53
*Thread Reply:* (screenshot attached)

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:03:26
*Thread Reply:* As you can see in this screenshot, I am looking at the logs of the triggerer, and it clearly says it is unable to import the openlineage plugin

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:10:32
*Thread Reply:* I see. There are a few possible things to do here - Composer could mount the user files, Airflow could not start plugins on the triggerer, or we could detect we're on the triggerer and not import anything there. However, does it impact OL or Airflow operation in any way other than this log?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:12:06
*Thread Reply:* Probably we'd have to do something if that really bothers you as there won't be further changes to Airflow 2.5

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:18:14
*Thread Reply:* The problem is that it is actually not registering this custom extractor written by me; hence I am just receiving the DefaultExtractor output, and my extractor code is not even getting triggered

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:22:49
*Thread Reply:* any suggestions to try @Maciej Obuchowski

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:27:48
*Thread Reply:* Could you share worker logs?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:27:56
*Thread Reply:* and check if module is importable from your dag code?

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:31:25
*Thread Reply:* these are the worker pod logs… where there is no log of the openlineage plugin

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:31:52
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1694608076879469?thread_ts=1694545905.974339&cid=C01CK9T7HKR → sure, will check now on this one

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:38:32
*Thread Reply:* ```
{
  "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 427, in import_from_string module = importlib.import_module(module_path) File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load File \"<frozen importlib._bootstrap>\", line 961, in _find_and_load_unlocked File \"<frozen importlib._bootstrap>\", line 219, in _call_with_frames_removed File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load File \"<frozen importlib._bootstrap>\", line 961, in _find_and_load_unlocked File \"<frozen importlib._bootstrap>\", line 219, in _call_with_frames_removed File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load File \"<frozen importlib._bootstrap>\", line 973, in _find_and_load_unlocked ModuleNotFoundError: No module named 'airflow.gcs'",
  "insertId": "pt2eu6fl9z5vw",
  "resource": {
    "type": "cloud_composer_environment",
    "labels": {
      "environment_name": "openlineage",
      "location": "us-west1",
      "project_id": "acceldata-acm"
    }
  },
  "timestamp": "2023-09-13T06:20:44.131577764Z",
  "severity": "ERROR",
  "labels": {
    "worker_id": "airflow-worker-xttt8"
  },
  "logName": "projects/acceldata-acm/logs/airflow-worker",
  "receiveTimestamp": "2023-09-13T06:20:48.847319607Z"
}
```
it doesn't see `airflow.gcs`, which is part of your extractor path airflow.gcs.dags.big_query_insert_job_extractor.BigQueryInsertJobExtractor
however, is it necessary? I generally see people using imports directly from the dags folder

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:44:11
*Thread Reply:* this is one of the experiments that I did, but then I reverted it back to dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor… where dependencies is a module I have created inside my dags folder

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:45:46
*Thread Reply:* these are the logs of the triggerer pod specifically

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:46:31
*Thread Reply:* yeah it would be expected to have this in triggerer where it's not mounted, but will it behave the same for worker where it's mounted?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:47:09
*Thread Reply:* maybe __init__.py is missing for the top-level dag path?

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:49:01
*Thread Reply:* these are the logs of the worker pod at startup, where it does not complain about the plugin like the triggerer does, but when tasks are run on this worker… somehow it is not picking up the extractor for the operator that I have written it for

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 08:49:54
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1694609229577469?thread_ts=1694545905.974339&cid=C01CK9T7HKR → you mean to make the dags folder itself a module as well, by adding __init__.py?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:55:24
*Thread Reply:* yes, I would put the whole custom code directly in the dags folder, to check whether import paths are the problem

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 08:55:48
*Thread Reply:* and it would be nice if you could set
AIRFLOW__LOGGING__LOGGING_LEVEL="DEBUG"

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 09:14:58
*Thread Reply:* ```
Starting the process, got command: triggerer
Initializing airflow.cfg.
airflow.cfg initialization is done.
[2023-09-13T13:11:46.620+0000] {settings.py:267} DEBUG - Setting up DB connection pool (PID 8)
[2023-09-13T13:11:46.622+0000] {settings.py:372} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=570, pid=8
[2023-09-13T13:11:46.742+0000] {cli_action_loggers.py:39} DEBUG - Adding <function default_action_log at 0x7ff39ca1d3a0> to pre execution callback
[2023-09-13T13:11:47.638+0000] {cli_action_loggers.py:65} DEBUG - Calling callbacks: [<function default_action_log at 0x7ff39ca1d3a0>]
[2023-09-13T13:11:50.527+0000] {plugins_manager.py:300} DEBUG - Loading plugins
[2023-09-13T13:11:50.580+0000] {plugins_manager.py:244} DEBUG - Loading plugins from directory: /home/airflow/gcs/plugins
[2023-09-13T13:11:50.581+0000] {plugins_manager.py:224} DEBUG - Loading plugins from entrypoints
[2023-09-13T13:11:50.587+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin OpenLineagePlugin
[2023-09-13T13:11:50.740+0000] {utils.py:430} WARNING - No module named 'boto3'
[2023-09-13T13:11:50.743+0000] {utils.py:430} WARNING - No module named 'botocore'
[2023-09-13T13:11:50.833+0000] {utils.py:430} WARNING - No module named 'airflow.providers.sftp'
[2023-09-13T13:11:51.144+0000] {utils.py:430} WARNING - No module named 'big_query_insert_job_extractor'
[2023-09-13T13:11:51.145+0000] {plugins_manager.py:237} ERROR - Failed to import plugin OpenLineagePlugin
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py", line 427, in import_from_string
    module = importlib.import_module(module_path)
  File "/opt/python3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'big_query_insert_job_extractor'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/plugins_manager.py", line 229, in load_entrypoint_plugins
    plugin_class = entry_point.load()
  File "/opt/python3.8/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/__init__.py", line 194, in load
    module = import_module(match.group('module'))
  File "/opt/python3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/plugin.py", line 32, in <module>
    from openlineage.airflow import listener
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 75, in <module>
    extractor_manager = ExtractorManager()
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py", line 16, in __init__
    self.task_to_extractor = Extractors()
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py", line 122, in __init__
    extractor = import_from_string(extractor.strip())
  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py", line 431, in import_from_string
    raise ImportError(f"Failed to import {path}") from e
ImportError: Failed to import big_query_insert_job_extractor.BigQueryInsertJobExtractor
[2023-09-13T13:11:51.235+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin composer_menu_plugin
[2023-09-13T13:11:51.719+0000] {plugins_manager.py:316} DEBUG - Loading 1 plugin(s) took 1.14 seconds
[2023-09-13T13:11:51.733+0000] {triggerer_job.py:101} INFO - Starting the triggerer
[2023-09-13T13:11:51.734+0000] {selector_events.py:59} DEBUG - Using selector: EpollSelector
[2023-09-13T13:11:56.118+0000] {base_job.py:240} DEBUG - [heartbeat]
[... heartbeat lines repeating every ~5s ...]
[2023-09-13T13:14:44.247+0000] {base_job.py:240} DEBUG - [heartbeat]
```

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 09:15:10
*Thread Reply:* still the same error in the triggerer pod

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 09:16:23
*Thread Reply:* I have changed the dags folder, where I added the __init__ file as you suggested, and then updated OPENLINEAGE_EXTRACTORS to big_query_insert_job_extractor.BigQueryInsertJobExtractor… still the same thing

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 09:36:27
*Thread Reply:* > still the same error in the triggerer pod
it won't change; we're not trying to fix the triggerer import, only the worker, so we should look only at the worker pod at this point

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 09:43:34
*Thread Reply:* ```
extractor for <class 'airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator'> is <class 'big_query_insert_job_extractor.BigQueryInsertJobExtractor'>

Using extractor BigQueryInsertJobExtractor task_type=BigQueryInsertJobOperator airflow_dag_id=data_analytics_dag task_id=join_bq_datasets.bq_join_holidays_weather_data_2021 airflow_run_id=manual_2023-09-13T13:24:08.946947+00:00

fatal: not a git repository (or any parent up to mount point /home/airflow)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /home/airflow)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
```

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 09:44:44
*Thread Reply:* able to see these logs in the worker pod… so what you said is right: it is able to get the extractor, but I get the below error immediately, where it says not a git repository

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 09:45:24
*Thread Reply:* seems like we are almost there… am I missing something obvious?

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 10:06:35
*Thread Reply:* > fatal: not a git repository (or any parent up to mount point /home/airflow)
> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
hm, this could be the actual bug?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:06:51
*Thread Reply:* that’s a casual log in Composer

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:12:16
*Thread Reply:* > extractor for <class 'airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator'> is <class 'big_query_insert_job_extractor.BigQueryInsertJobExtractor'>
that’s actually the class from your custom module, right?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:14:03
*Thread Reply:* I’ve done an experiment; that’s how gcs looks (screenshot attached)

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:14:09
*Thread Reply:* and env vars (screenshot attached)

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:14:19
*Thread Reply:* I have this extractor detected as expected

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:15:06
*Thread Reply:* seen as <class 'dependencies.bq.BigQueryInsertJobExtractor'>

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:16:02
*Thread Reply:* no __init__.py in base dags folder

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:17:02
*Thread Reply:* I also checked that the triggerer pod indeed has no gcsfuse set up; tbh no idea why, maybe some kind of optimization.
The only effect is that when loading plugins in the triggerer it throws some errors in logs; we don’t do anything at the moment there.

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 10:19:26
*Thread Reply:* okk… got it @Jakub Dardziński, so the __init__ at the top level of dags is likewise not required, got it. Just one more doubt: there is a requirement where I want to change the operator's property in the extractor inside the extract function; will that be taken into account, and will the operator’s execute be called with the property that I have populated in my extractor?

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 10:21:28
*Thread Reply:* for example, I want to add a custom job_id to the BigQueryInsertJobOperator, so whenever someone uses the BigQueryInsertJobOperator I want to intercept that and add this job_id property to the operator… will that work?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:24:46
*Thread Reply:* I’m not sure if using OL for such a thing is the best choice. Wouldn’t it be better to subclass the operator?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:25:37
*Thread Reply:* but the answer is: it depends on the Airflow version; in 2.3+ I’m pretty sure the changed property stays in the execute method

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-13 10:27:49
*Thread Reply:* yeah, ideally that is how we should have done this, but the problem is our client has around 1000+ DAGs in different Google Cloud projects, owned by multiple teams… so they are not willing to change anything in their DAGs. Thankfully they are using Airflow 2.4.3

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 10:31:15
*Thread Reply:* task_policy might be a better tool for that: https://airflow.apache.org/docs/apache-airflow/2.6.0/administration-and-deployment/cluster-policies.html
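(A hedged sketch of that approach: a cluster policy in airflow_local_settings.py that stamps a custom job_id onto every BigQueryInsertJobOperator at DAG-parse time. The job_id attribute and its value are assumptions taken from this thread, not verified against the operator's API.)

```
# airflow_local_settings.py - must be importable by Airflow, e.g. in $AIRFLOW_HOME/config
import uuid

from airflow.models.baseoperator import BaseOperator

def task_policy(task: BaseOperator) -> None:
    # Cluster policies run in every process that parses the DAG, so the
    # mutation is also visible when execute() runs, unlike a change made
    # inside an extractor (which runs in a different process).
    if task.task_type == "BigQueryInsertJobOperator":
        task.job_id = str(uuid.uuid4())  # assumed settable, per the thread
```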

➕ Jakub Dardziński

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-09-13 10:35:30
*Thread Reply:* btw, I double-checked - the execute method runs in a different process, so this would not change the task’s attribute there

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-16 03:32:49
*Thread Reply:* @Jakub Dardziński any idea how we can achieve this one? → https://openlineage.slack.com/archives/C01CK9T7HKR/p1694849427228709

Guntaka Jeevan Paul - (jeevan@acceldata.io)
2023-09-12 17:26:01
@here has anyone succeeded in getting a custom extractor to work in GCP Cloud Composer or AWS MWAA? seems like there is no way

Mars Lan - (mars@metaphor.io)
2023-09-12 17:34:29
*Thread Reply:* I'm getting quite close with MWAA. See https://openlineage.slack.com/archives/C01CK9T7HKR/p1692743745585879.

Suraj Gupta - (suraj.gupta@atlan.com)
2023-09-13 01:44:27
I am exploring the Spark - OpenLineage integration (using the latest PySpark and OL versions). I tested a simple pipeline which:
• Reads JSON data into a PySpark DataFrame
• Applies data transformations
• Writes transformed data to a MySQL database
Observed that we receive 4 events (2 START and 2 COMPLETE) for the same job name. The events are almost identical, with a small diff in the facets. All the events share the same runId, and we don't get any parentRunId.
Team, can you please confirm if this behaviour is expected? Seems to be different from the Airflow integration, where we relate jobs to Parent Jobs.

Damien Hawes - (damien.hawes@booking.com)
2023-09-13 02:54:37
*Thread Reply:* The Spark integration requires that two parameters are passed to it, namely:

spark.openlineage.parentJobName
spark.openlineage.parentRunId

You can find the list of parameters here:

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/README.md
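(A minimal PySpark sketch of wiring those two parameters up; the package version, transport, and parent job/run values below are placeholders, not taken from this thread.)

```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ol-parent-run-demo")
    # placeholder version; use the current openlineage-spark release
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.1.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    # tie every event from this Spark app to a parent job/run, e.g. the
    # Airflow task that launched it (both values here are made up)
    .config("spark.openlineage.parentJobName", "my_dag.my_spark_task")
    .config("spark.openlineage.parentRunId", "11111111-1111-1111-1111-111111111111")
    .getOrCreate()
)
```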

Suraj Gupta - (suraj.gupta@atlan.com)
2023-09-13 02:55:51
*Thread Reply:* Thanks, will check this out

Damien Hawes - (damien.hawes@booking.com)
2023-09-13 02:57:43
*Thread Reply:* As for double accounting of events - that's a bit harder to diagnose.

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-09-13 04:33:03
*Thread Reply:* Can you share the job and events?
Also @Paweł Leszczyński

Suraj Gupta - (suraj.gupta@atlan.com)
2023-09-13 06:03:49
*Thread Reply:* Sure, sharing the job and events. (attachments)

Suraj Gupta - (suraj.gupta@atlan.com)
2023-09-13 06:06:21
*Thread Reply:* (attachment)

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-09-13 06:39:02
*Thread Reply:* Hi @Suraj Gupta,

Thanks for providing such a detailed description of the problem.

It is not expected behaviour; it's an issue. The events correspond to the same logical plan, which for some reason leads to sending two OL events. Is it reproducible, aka does it occur each time? If yes, please feel free to raise an issue for that.

We have added several tests in recent months to verify the amount of OL events being generated, but we haven't tested it that way with JDBC. BTW, will the same happen if you write your data df_transformed to a file (like a parquet file)?

:gratitude_thank_you: Suraj Gupta

Suraj Gupta - (suraj.gupta@atlan.com)
2023-09-13 07:28:03
*Thread Reply:* Thanks @Paweł Leszczyński, will confirm about writing to file and get back.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-13 07:33:35
-
-

*Thread Reply:* And yes, the issue is reproducible. Will raise an issue for this.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-13 07:33:54
-
-

*Thread Reply:* even if you write onto a file?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-13 07:37:21
-
-

*Thread Reply:* Yes, even when I write to a parquet file.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-13 07:49:28
-
-

*Thread Reply:* ok. i think i was able to reproduce it locally with https://github.com/OpenLineage/OpenLineage/pull/2103/files

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-13 07:56:11
-
-

*Thread Reply:* Opened an issue: https://github.com/OpenLineage/OpenLineage/issues/2104

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-25 16:32:09
-
-

*Thread Reply:* @Paweł Leszczyński I see that the PR is work in progress. Any rough estimate on when we can expect this fix to be released?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-26 03:32:03
-
-

*Thread Reply:* @Suraj Gupta I put a comment within your issue. It's a bug we need to solve, but I cannot give any estimates today.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-26 04:33:03
-
-

*Thread Reply:* Thanks for the update @Paweł Leszczyński; also, please look into this comment. It might be related and I'm not sure if it's expected behaviour.

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-13 14:20:32
-
-

@channel
This month’s TSC meeting, open to all, is tomorrow: https://openlineage.slack.com/archives/C01CK9T7HKR/p1694113940400549

-
- - -
- - - } - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Damien Hawes - (damien.hawes@booking.com) -
-
2023-09-14 06:20:15
-
-

Context:

- -

We use Spark with YARN, running on Hadoop 2.x (I can't remember the exact minor version) with Hive support.

- -

Problem:

- -

I've noticed that CreateDataSourceAsSelectCommand objects are always transformed to an OutputDataset with a namespace value set to file - which is curious, because the inputs always have a (correct) namespace of hdfs://<name-node> - is this a known issue? A flaw in Apache Spark? A bug in the resolution logic?

- -

For reference:

- -

```java
public class CreateDataSourceTableCommandVisitor
    extends QueryPlanVisitor<CreateDataSourceTableCommand, OpenLineage.OutputDataset> {

  public CreateDataSourceTableCommandVisitor(OpenLineageContext context) {
    super(context);
  }

  @Override
  public List<OpenLineage.OutputDataset> apply(LogicalPlan x) {
    CreateDataSourceTableCommand command = (CreateDataSourceTableCommand) x;
    CatalogTable catalogTable = command.table();

    return Collections.singletonList(
        outputDataset()
            .getDataset(
                PathUtils.fromCatalogTable(catalogTable),
                catalogTable.schema(),
                OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE));
  }
}
```

Running this:
```
cat events.log | jq '{eventTime: .eventTime, eventType: .eventType, runId: .run.runId, jobNamespace: .job.namespace, jobName: .job.name, outputs: .outputs[] | {namespace: .namespace, name: .name}, inputs: .inputs[] | {namespace: .namespace, name: .name}}'
```

This is an output:
```json
{
  "eventTime": "2023-09-13T16:01:27.059Z",
  "eventType": "START",
  "runId": "bbbb5763-3615-46c0-95ca-1fc398c91d5d",
  "jobNamespace": "spark.cluster-1",
  "jobName": "ol_hadoop_test.execute_create_data_source_table_as_select_command.dhawes_db_ol_test_hadoop_tgt",
  "outputs": {
    "namespace": "file",
    "name": "/user/hive/warehouse/dhawes.db/ol_test_hadoop_tgt"
  },
  "inputs": {
    "namespace": "hdfs://nn1",
    "name": "/user/hive/warehouse/dhawes.db/ol_test_hadoop_src"
  }
}
```

- - - -
- 👀 Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-14 07:32:25
-
-

*Thread Reply:* Seems like an issue on our side. Do you know how the source is read? What LogicalPlan leaf is used to read src? Would love to find how is this done differently

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Damien Hawes - (damien.hawes@booking.com) -
-
2023-09-14 09:16:58
-
-

*Thread Reply:* Hmm, I'll have to do explain plan to see what exactly it is.

- -

However, my sample job uses spark.sql("SELECT * FROM dhawes.ol_test_hadoop_src")

- -

which itself is created using

- -

spark.sql("SELECT 1 AS id").write.format("orc").mode("overwrite").saveAsTable("dhawes.ol_test_hadoop_src")

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Damien Hawes - (damien.hawes@booking.com) -
-
2023-09-14 09:23:59
-
-

*Thread Reply:*
```
>>> spark.sql("SELECT * FROM dhawes.ol_test_hadoop_src").explain(True)
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `dhawes`.`ol_test_hadoop_src`

== Analyzed Logical Plan ==
id: int
Project [id#3]
+- SubqueryAlias dhawes.ol_test_hadoop_src
   +- Relation[id#3] orc

== Optimized Logical Plan ==
Relation[id#3] orc

== Physical Plan ==
*(1) FileScan orc dhawes.ol_test_hadoop_src[id#3] Batched: true, Format: ORC, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
tati - (tatiana.alchueyr@astronomer.io) -
-
2023-09-14 10:03:41
-
-

Hey everyone,
Any chance we could have an openlineage-integration-common 1.1.1 release with the following changes..?
• https://github.com/OpenLineage/OpenLineage/pull/2106
• https://github.com/OpenLineage/OpenLineage/pull/2108

- - - -
- ➕ Michael Robinson, Harel Shein, Maciej Obuchowski, Jakub Dardziński, Paweł Leszczyński, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
tati - (tatiana.alchueyr@astronomer.io) -
-
2023-09-14 10:05:19
-
-

*Thread Reply:* Specially the first PR is affecting users of the astronomer-cosmos library: https://github.com/astronomer/astronomer-cosmos/issues/533

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-14 10:05:24
-
-

*Thread Reply:* Thanks @tati for requesting your first OpenLineage release! Three +1s from committers will authorize

- - - -
- :gratitude_thank_you: tati -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-14 11:59:55
-
-

*Thread Reply:* The release is authorized and will be initiated within two business days.

- - - -
- 🎉 tati -
- -
-
-
-
- - - - - -
-
- - - - -
- -
tati - (tatiana.alchueyr@astronomer.io) -
-
2023-09-15 04:40:12
-
-

*Thread Reply:* Thanks a lot, @Michael Robinson!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-09-14 20:23:01
-
-

Per discussion in the OpenLineage sync today, here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.
Feedback or alternate proposals welcome
https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit
Once this is sufficiently fleshed out, I’ll create an actual proposal on GitHub

- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-10-03 20:33:35
-
-

*Thread Reply:* I have cleaned up the registry proposal.
https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit
In particular:
• I clarified that option 2 is preferred at this point.
• I moved discussion notes to the bottom. They will go away at some point.
• Once it is stable, I’ll create a proposal with the preferred option.
• We need a good proposal for the core facets prefix. My suggestion is to move core facets to core in the registry. The drawback is the prefix would be inconsistent.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Julien Le Dem - (julien@apache.org) -
-
2023-10-05 17:34:12
-
-

*Thread Reply:* I have created a ticket to make this easier to find. Once I get more feedback I’ll turn it into a md file in the repo: https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit#heading=h.enpbmvu7n8gu -https://github.com/OpenLineage/OpenLineage/issues/2161

-
- - - - - - - -
-
Labels
- proposal -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-15 12:03:27
-
-

@channel
Friendly reminder: the next OpenLineage meetup, our first in Toronto, is happening this coming Monday at 5 PM ET https://openlineage.slack.com/archives/C01CK9T7HKR/p1694441261486759

-
- - -
- - - } - - Michael Robinson - (https://openlineage.slack.com/team/U02LXF3HUN7) -
- - - - - - - - - - - - - - - - - -
- - - -
- 👍 Maciej Obuchowski -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-16 03:30:27
-
-

@here we have a dataproc operator getting called from a DAG which submits a spark job. We wanted to maintain that continuity of the parent job in the spark job, so according to the documentation we can achieve that by using a macro called lineage_run_id that requires task and task_instance as the parameters. The problem we are facing is that our clients have 1000's of DAGs, so asking them to change this everywhere it is used is not feasible. So we thought of using the task_policy feature in Airflow, but the problem is that task_policy gives you access to only the task/operator, and we don't have access to the task instance that is required as a parameter to the lineage_run_id function. Can anyone kindly help us on how we should go about this one?
```python
t1 = DataProcPySparkOperator(
    task_id=job_name,
    # required pyspark configuration,
    job_name=job_name,
    dataproc_pyspark_properties={
        'spark.driver.extraJavaOptions':
            f"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}"
    },
    dag=dag)
```

- - - -
- ➕ Abdallah -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-16 04:22:47
-
-

*Thread Reply:* you don't need the actual task instance to do that. You only need to set the additional argument as a jinja template, same as above

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-16 04:25:28
-
-

*Thread Reply:* task_instance in this case is just part of a string which is evaluated when the jinja render happens

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-16 04:27:10
-
-

*Thread Reply:* ohh…then we could use the same example as above inside the task_policy to intercept the operator and add the OpenLineage-specific properties?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-16 04:30:59
-
-

*Thread Reply:* correct, just remember not to override all properties, just add the OL-specific ones
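Something along these lines, a sketch of the task_policy idea rather than tested code (the OPENLINEAGE_JAR env var and jar path are assumptions):
```python
# airflow_local_settings.py - sketch of the task_policy approach discussed
# above. The jinja template is left unrendered on purpose; env var names
# are assumptions.
import os

def task_policy(task):
    # only touch Dataproc PySpark tasks, leave everything else unchanged
    if task.task_type != "DataProcPySparkOperator":
        return

    javaagent = (
        f"-javaagent:{os.environ['OPENLINEAGE_JAR']}="
        f"{os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/"
        f"{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{task.task_id}/runs/"
        "{{ macros.OpenLineagePlugin.lineage_run_id(task, task_instance) }}"
        f"?api_key={os.environ.get('OPENLINEAGE_API_KEY')}"
    )

    # add the OL-specific property without overriding the user's properties
    props = dict(task.dataproc_pyspark_properties or {})
    props["spark.driver.extraJavaOptions"] = javaagent
    task.dataproc_pyspark_properties = props
```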

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-16 04:32:02
-
-

*Thread Reply:* yeah sure…thank you so much @Jakub Dardziński, will try this out and keep you posted

- - - -
- 👍 Jakub Dardziński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-16 05:00:24
-
-

*Thread Reply:* We want to automate setting those options at some point inside the operator itself

- - - -
- ➕ Guntaka Jeevan Paul -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-16 19:40:27
-
-

@here is there a way by which we could add custom headers to the OpenLineage client in Airflow? I see that provision is there for the Spark integration via properties like spark.openlineage.transport.headers.xyz --> abcdef
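For reference, this is roughly what the Spark-side header configuration mentioned above looks like (header name and value are made up):
```python
# Spark-side custom transport headers, for comparison; backend URL and the
# header name/value are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://backend:5000")
    .config("spark.openlineage.transport.headers.X-Custom-Header", "abcdef")
    .getOrCreate()
)
```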

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-19 16:40:55
-
-

*Thread Reply:* there’s no out-of-the-box possibility to do that yet, you’re very welcome to create an issue in GitHub and maybe contribute as well! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-09-17 09:07:41
-
-

It doesn't seem like there's a way to override the OL endpoint from the default (/api/v1/lineage) in Airflow? I tried setting the OPENLINEAGE_ENDPOINT environment variable to no avail. Based on this statement, it seems that only OPENLINEAGE_URL is used to construct HttpConfig?

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-18 16:25:11
-
-

*Thread Reply:* That’s correct. For now there’s no way to configure the endpoint via env var. You can do that by using a config file
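Under the hood the config-file route ends up on the Python client's HTTP transport, which does take an explicit endpoint. Roughly (a sketch; the URL and endpoint values are placeholders):
```python
# Sketch: setting the endpoint explicitly via the Python client's HTTP
# transport; the config file populates the same fields.
from openlineage.client import OpenLineageClient
from openlineage.client.transport.http import HttpConfig, HttpTransport

config = HttpConfig(
    url="http://backend:5000",          # placeholder backend
    endpoint="api/v1/custom-lineage",   # overriding the default api/v1/lineage
)
client = OpenLineageClient(transport=HttpTransport(config))
```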

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-09-18 16:30:39
-
-

*Thread Reply:* How do you do that in Airflow? Any particular reason for excluding endpoint override via env var? Happy to create a PR to fix that.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-18 16:52:48
-
-

*Thread Reply:* historical I guess? go for the PR, of course 🚀

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Mars Lan - (mars@metaphor.io) -
-
2023-10-03 08:52:16
-
-

*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2151

-
- - - - - - - -
-
Labels
- documentation, client/python -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Terese Larsson - (terese@jclab.se) -
-
2023-09-18 08:22:34
-
-

Hi! I'm in need of help with wrapping my head around OpenLineage. My team has the goal of collecting metadata from the Airflow operators GreatExpectationsOperator, PythonOperator, MsSqlOperator and BashOperator (for dbt). Where can I see the source code for what is collected for each operator, and is there support for these in the new provider apache-airflow-providers-openlineage? I am super confused and feel lost in the docs. 🤯 We are using MSSQL/ODBC to connect to our db, and this data does not seem to appear as datasets in Marquez. Do I need to configure this? If so, HOW and WHERE? 🥲

- -

Happy for any help, big or small! 🙏

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-18 16:26:07
-
-

*Thread Reply:* there’s no single source of truth for which integrations are currently implemented in the OpenLineage Airflow provider. That’s something we should work on so it’s more visible

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-18 16:26:46
-
-

*Thread Reply:* answering this quickly - GE & MS SQL are not implemented yet in the provider

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-18 16:26:58
-
-

*Thread Reply:* but I also invite you to contribute if you’re interested! 🙂

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sarathch - (sarathch@hpe.com) -
-
2023-09-19 02:47:47
-
-

Hi, I need help extracting OpenLineage events for PostgresOperator in JSON format.
Any suggestions or comments would be greatly appreciated

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-19 16:40:06
-
-

*Thread Reply:* If you're using Airflow 2.7, take a look at https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html

- - - -
- ❤️ sarathch -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-19 16:40:54
-
-

*Thread Reply:* If you use one of the lower versions, take a look here https://openlineage.io/docs/integrations/airflow/usage

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
sarathch - (sarathch@hpe.com) -
-
2023-09-20 06:26:56
-
-

*Thread Reply:* Maciej,
Thanks for sharing the link https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html
this should address the issue

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Juan Luis Cano Rodríguez - (juan_luis_cano@mckinsey.com) -
-
2023-09-20 09:36:54
-
-

congrats folks 🥳 https://lfaidata.foundation/blog/2023/09/20/lf-ai-data-foundation-announces-graduation-of-openlineage-project

- - - -
- 🎉 Jakub Dardziński, Mars Lan, Ross Turk, Guntaka Jeevan Paul, Peter Hicks, Maciej Obuchowski, Athitya Kumar, John Lukenoff, Harel Shein, Francis McGregor-Macdonald, Laurent Paris -
- -
- 👍 Athitya Kumar -
- -
- ❤️ Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-20 17:08:58
-
-

@channel
We released OpenLineage 1.2.2!
Added
• Spark: publish the ProcessingEngineRunFacet as part of the normal operation of the OpenLineageSparkEventListener #2089 @d-m-h
• Spark: capture and emit spark.databricks.clusterUsageTags.clusterAllTags variable from databricks environment #2099 @Anirudh181001
Fixed
• Common: support parsing dbt_project.yml without target-path #2106 @tatiana
• Proxy: fix Proxy chart #2091 @harels
• Python: fix serde filtering #2044 @xli-1026
• Python: use non-deprecated apiKey if loading it from env variables @2029 @mobuchowski
• Spark: Improve RDDs on S3 integration. #2039 @pawel-big-lebowski
• Flink: prevent sending running events after job completes #2075 @pawel-big-lebowski
• Spark & Flink: Unify dataset naming from URI objects #2083 @pawel-big-lebowski
• Spark: Databricks improvements #2076 @pawel-big-lebowski
Removed
• SQL: remove sqlparser dependency from iface-java and iface-py #2090 @JDarDagran
Thanks to all the contributors, including new contributors @tati, @xli-1026, and @d-m-h!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.2.2
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.1.0...1.2.2
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

- - - -
- 🔥 Maciej Obuchowski, Harel Shein, Anirudh Shrinivason -
- -
- 👍 Guntaka Jeevan Paul, John Rosenbaum, Sangeeta Mishra -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Yevhenii Soboliev - (esoboliev@griddynamics.com) -
-
2023-09-22 21:05:20
-
-

*Thread Reply:* Hi @Michael Robinson Thank you! I love the job that you’ve done. If you have a few seconds, please hint at how I can push lineage gathered from Airflow and Spark jobs into DataHub for visualization? I didn’t find any solutions or official support either at OpenLineage or at DataHub, but I still want to continue using OpenLineage

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-22 21:30:22
-
-

*Thread Reply:* Hi Yevhenii, thank you for using OpenLineage. The DataHub integration is new to us, but perhaps the experts on Spark and Airflow know more. @Paweł Leszczyński @Maciej Obuchowski @Jakub Dardziński

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Maciej Obuchowski - (maciej.obuchowski@getindata.com) -
-
2023-09-23 08:11:17
-
-

*Thread Reply:* @Yevhenii Soboliev at Airflow Summit, Shirshanka Das from DataHub mentioned this as upcoming feature.

- - - -
- 👍 Yevhenii Soboliev -
- -
- 🎯 Yevhenii Soboliev -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-21 02:11:10
-
-

Hi, we're using custom operators in Airflow (2.5) and are planning to expose lineage via default extractors: https://openlineage.io/docs/integrations/airflow/default-extractors/
Question: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?
Since OpenLineage has now moved inside Airflow, and I think there is no concept of extractors in the latest version.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-21 02:15:00
-
-

*Thread Reply:* Also, do we have any docs on how OL works with the latest Airflow version? A few questions:
• How is it replacing the concept of custom extractors and manually annotated lineage in the latest version?
• Do we have any examples of setting up the integration to emit input/output datasets for non-supported operators like PythonOperator?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-27 10:04:09
-
-

*Thread Reply:* > Question: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible? -It will be compatible, “default extractors” is generally the same concept as we’re using in the 2.7 integration. -One thing that might be good to update is import paths, from openlineage.airflow to airflow.providers.openlineage but should work both ways

- -

> • Do we have any code samples/docs of setting up the integration to emit input/output datasets for non supported Operators like PythonOperator? -Our experience with that is currently lacking - this means, it works like in bare airflow, if you annotate your PythonOperator tasks with old Airflow lineage like in this doc.

- -

We want to make this experience better by doing a few things:
• instrumenting hooks, then collecting lineage from them
• integration with AIP-48 datasets
• allowing lineage collected inside an Airflow task to be emitted by other means, by providing a core Airflow API for that
All those things require changing core Airflow in a couple of ways:
• tracking which hooks were used during PythonOperator execution
• just being able to emit datasets (airflow inlets/outlets) from inside of a task - they are now a static thing, so if you try that it does not work
• providing a better API for emitting that lineage, preferably based on OpenLineage itself rather than us having to convert it later.
As this requires core Airflow changes, it won't be live until Airflow 2.8 at the earliest.

- -

thanks to @Maciej Obuchowski for this response
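For illustration, a hedged sketch of the "default extractor" convention on a custom operator; the fallback import path and the empty facets are assumptions, not the provider's exact API:
```python
# Hypothetical sketch of a custom operator exposing lineage via the
# "default extractor" convention discussed above.
from airflow.models.baseoperator import BaseOperator

try:
    from airflow.providers.openlineage.extractors import OperatorLineage  # Airflow 2.7+
except ImportError:
    from openlineage.airflow.extractors.base import OperatorLineage  # assumed older path

class MyCustomOperator(BaseOperator):
    def execute(self, context):
        ...  # the operator's real work

    def get_openlineage_facets_on_complete(self, task_instance):
        # called by the integration after the task finishes
        return OperatorLineage(inputs=[], outputs=[], run_facets={}, job_facets={})
```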

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Yip - (jasonyip@gmail.com) -
-
2023-09-21 18:36:17
-
-

I am using this accelerator that leverages OpenLineage on Databricks to publish lineage info to Purview, but it's using a rather old version of OpenLineage, aka 0.18. Has anybody tried it on a newer version of OpenLineage? I am facing some issues where the inputs and outputs for the same object have different JSON
https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/

-
- - - - - - - -
-
Stars
- 77 -
- -
-
Language
- C# -
- - - - - - - - -
- - - -
- ✅ Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Yip - (jasonyip@gmail.com) -
-
2023-09-21 21:51:41
-
-

I installed 1.2.2 on Databricks, followed the below init script: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/databricks/open-lineage-init-script.sh

- -

my cluster config looks like this:

- -

spark.openlineage.version v1
spark.openlineage.namespace adb-5445974573286168.8#default
spark.openlineage.endpoint v1/lineage
spark.openlineage.url.param.code 8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==
spark.openlineage.url https://f77b-50-35-69-138.ngrok-free.app

- -

But it is not calling the API; it works fine with version 0.18

-
- - - - - - - - - - - - - - - - -
- - - -
- ✅ Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Yip - (jasonyip@gmail.com) -
-
2023-09-21 23:16:10
-
-

I am attaching the log4j; there is no OpenLineageContext

- -
- - - - - - - -
- - -
- ✅ Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Yip - (jasonyip@gmail.com) -
-
2023-09-21 23:47:22
-
-

*Thread Reply:* this issue is resolved, solution can be found here: https://openlineage.slack.com/archives/C01CK9T7HKR/p1691592987038929

-
- - -
- - - } - - Zahi Fail - (https://openlineage.slack.com/team/U05KNSP01TR) -
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-09-25 08:59:10
-
-

*Thread Reply:* We were all out at Airflow Summit last week, so apologies for the delayed response. Glad you were able to resolve the issue!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sangeeta Mishra - (sangeeta@acceldata.io) -
-
2023-09-25 05:11:50
-
-

@here I'm presently addressing a particular scenario that pertains to Openlineage authentication, specifically involving the use of an access key and secret.

- -

I've implemented a custom token provider called AccessKeySecretKeyTokenProvider, which extends the TokenProvider class. This token provider communicates with another service, obtaining a token and an expiration time based on the provided access key, secret, and client ID.

- -

My goal is to retain this token in a cache prior to its expiration, thereby eliminating the need for network calls to the third-party service. Is this possible without relying on an external caching system?
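What I have in mind is roughly this (a minimal sketch of the in-process caching idea; the import path and the token-exchange call are assumptions):
```python
# Minimal sketch: cache the token in the provider itself, refresh before
# expiry, no external cache needed.
import time
from openlineage.client.transport.http import TokenProvider  # assumed path

class AccessKeySecretKeyTokenProvider(TokenProvider):
    def __init__(self, options: dict):
        super().__init__(options)
        self._token = None
        self._expires_at = 0.0

    def get_bearer(self):
        # refresh slightly before expiry to avoid racing the deadline
        if self._token is None or time.time() > self._expires_at - 60:
            self._token, ttl = self._fetch_token()
            self._expires_at = time.time() + ttl
        return f"Bearer {self._token}"

    def _fetch_token(self):
        # exchange access key/secret/client id for (token, ttl) - elided
        raise NotImplementedError
```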

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-09-25 08:56:53
-
-

*Thread Reply:* Hey @Sangeeta Mishra, I’m not sure that I fully understand your question here. What do you mean by OpenLineage authentication?
What are you using to generate OL events? What’s your OL receiving backend?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sangeeta Mishra - (sangeeta@acceldata.io) -
-
2023-09-25 09:04:33
-
-

*Thread Reply:* Hey @Harel Shein,
I wanted to clarify the previous message. I apologize for any confusion. When I mentioned "OpenLineage authentication," I was actually referring to the authentication process for the OpenLineage backend, specifically using HTTP transport. This involves using my custom token provider, which utilizes access keys and secrets for authentication. The OL backend is an HTTP-based backend. I hope this clears things up!

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-09-25 09:05:12
-
-

*Thread Reply:* Are you using Marquez?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sangeeta Mishra - (sangeeta@acceldata.io) -
-
2023-09-25 09:05:55
-
-

*Thread Reply:* We are trying to leverage our own backend here.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-09-25 09:07:03
-
-

*Thread Reply:* I see.. I’m not sure the OpenLineage community could help here. Which webserver framework are you using?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sangeeta Mishra - (sangeeta@acceldata.io) -
-
2023-09-25 09:08:56
-
-

*Thread Reply:* KTOR framework

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sangeeta Mishra - (sangeeta@acceldata.io) -
-
2023-09-25 09:15:33
-
-

*Thread Reply:* Our backend authentication operates based on either a pair of keys or a single bearer token with a limited expiry time. Hence, we wanted to cache this information inside the token provider.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-09-25 09:26:57
-
-

*Thread Reply:* I see, I would ask this question here https://ktor.io/support/

-
-
Ktor Framework
- - - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sangeeta Mishra - (sangeeta@acceldata.io) -
-
2023-09-25 10:12:52
-
-

*Thread Reply:* Thank you

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-26 04:13:20
-
-

*Thread Reply:* @Sangeeta Mishra which openlineage client are you using: java or python?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Sangeeta Mishra - (sangeeta@acceldata.io) -
-
2023-09-26 04:19:53
-
-

*Thread Reply:* @Paweł Leszczyński I am using python client

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Suraj Gupta - (suraj.gupta@atlan.com) -
-
2023-09-25 13:36:25
-
-

I'm using the Spark OpenLineage integration. In the outputStatistics output dataset facet we receive rowCount and size.
The job performs a SQL insert into a MySQL table, and I'm receiving the size as 0.
```json
{
  "outputStatistics": {
    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark",
    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet",
    "rowCount": 1,
    "size": 0
  }
}
```
I'm not sure what the size means here. Does this mean the number of bytes inserted/updated?
Also, do we have any documentation for Spark-specific job and run facets?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-27 09:56:00
-
-

*Thread Reply:* I am not sure it's stated in the doc. Here's the list of spark facets schemas: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/shared/facets/spark/v1

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-26 00:51:30
-
-

@here In the Airflow integration we send a lineage event for DAG start and complete, but that is not the case with the Spark integration…we don't receive any events for application start and complete in Spark…is this expected behaviour or am I missing something?

- - - -
- ➕ Suraj Gupta -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-27 09:47:39
-
-

*Thread Reply:* For Spark we do send start and complete for each Spark action being run (a single operation that causes Spark processing to run). However, it is difficult for us to know if we're dealing with the last action within a Spark job or script.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-27 09:49:35
-
-

*Thread Reply:* I think we need to look deeper into that, as there is a recurring need to capture such information

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-27 09:49:57
-
-

*Thread Reply:* and the Spark listener has methods like onApplicationStart and onApplicationEnd

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-27 09:50:13
-
-

*Thread Reply:* We are using the SparkListener, which has a function called onApplicationStart that gets called whenever a Spark application starts, so I was thinking why can't we send one at start and similarly at end as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-27 09:50:33
-
-

*Thread Reply:* additionally, we would like to have a concept of a parent run for a spark job which aggregates all actions run within a single spark job context

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-27 09:51:11
-
-

*Thread Reply:* yeah exactly, the way that it works with the Airflow integration

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-27 09:51:26
-
-

*Thread Reply:* we do have an issue for that https://github.com/OpenLineage/OpenLineage/issues/2105

-
- - - - - - - -
-
Labels
- proposal -
- -
-
Comments
- 2 -
- - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-27 09:52:08
-
-

*Thread Reply:* what you can do is: come to our monthly OpenLineage open meetings, raise that issue, and convince the community of its importance

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Guntaka Jeevan Paul - (jeevan@acceldata.io) -
-
2023-09-27 09:53:32
-
-

*Thread Reply:* yeah sure, would love to do that…how can I join them? Will that be posted here in this Slack channel?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-27 09:54:08
-
-

*Thread Reply:* Hi, you can see the schedule and RSVP here: https://openlineage.io/community

-
-
openlineage.io
- - - - - - - - - - - - - - - -
- - - -
- 🙌 Paweł Leszczyński -
- -
- :gratitude_thank_you: Guntaka Jeevan Paul -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-27 11:19:16
-
-

Meetup recap: Toronto Meetup @ Airflow Summit, September 18, 2023
It was great to see so many members of our community at this event! I counted 32 total attendees, with all but a handful being first-timers.
Topics included:
• Presentation on the history, architecture and roadmap of the project by @Julien Le Dem and @Harel Shein
• Discussion of OpenLineage support in Marquez by @Willy Lulciuc
• Presentation by Ye Liu and Ivan Perepelitca from Metaphor, the social platform for data, about their integration
• Presentation by @Paweł Leszczyński about the Spark integration
• Presentation by @Maciej Obuchowski about the Apache Airflow Provider
Thanks to all the presenters and attendees, with a shout out to @Harel Shein for the help with organizing and day-of logistics, @Jakub Dardziński for the help with set up/clean up, and @Sheeri Cabral (Collibra) for the crucial assist with the signup sheet.
This was our first meetup in Toronto, and we learned some valuable lessons about planning events in new cities — the first and foremost being to ask for a pic of the building! 🙂 But it seemed like folks were undeterred, and the space itself lived up to expectations.
For a recording and clips from the meetup, head over to our YouTube channel.
Upcoming events:
• October 5th in San Francisco: Marquez Meetup @ Astronomer (sign up here: https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/)
• November: Warsaw meetup (details, date TBA)
• January: London meetup (details, date TBA)
Are you interested in hosting or co-hosting an OpenLineage or Marquez meetup? DM me!

-
-
metaphor.io
- - - - - - - - - - - - - - - - - -
-
-
YouTube
- - - - - - - - - - - - - - - - - -
-
-
Meetup
- - - - - - - - - - - - - - - - - -
- -
- - - - - - - - - -
-
- - - - - - - - - -
-
- - - - - - - - - -
-
- - - - - - - - - -
-
- - - - - - - - - -
- - -
- 🙌 Mars Lan, Harel Shein, Paweł Leszczyński -
- -
- ❤️ Jakub Dardziński, Harel Shein, Rodrigo Maia, Paweł Leszczyński, Julien Le Dem, Willy Lulciuc -
- -
- 🚀 Jakub Dardziński, Kevin Languasco -
- -
- 😅 Harel Shein -
- -
- ✅ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-27 11:55:47
-
-

*Thread Reply:* A few more pics:

- -
- - - - - - - - - -
-
- - - - - - - - - -
-
- - - - - - - - - -
- - -
-
-
-
- - - - - -
-
- - - - -
- -
Damien Hawes - (damien.hawes@booking.com) -
-
2023-09-27 12:23:05
-
-

Hi folks, am I correct in my observations that the Spark integration does not generate inputs and outputs for Kafka-to-Kafka pipelines?

- -

EDIT: Removed the crazy wall of text. Relevant GitHub issue is here.

-
- - - - - - - - - - - - - - - - -
- - - -
- 👀 Paweł Leszczyński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-28 02:42:18
-
-

*Thread Reply:* responded within the issue

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 02:40:40
-
-

Hello community
First time poster - bear with me :)

- -

I am looking to make a minor PR on the airflow integration (fixing github #2130), and the code change is easy enough, but I fail to install the python environment. I have tried the simple ones
OpenLineage/integration/airflow > pip install -e .
or
OpenLineage/integration/airflow > pip install -r dev-requirements.txt
but they both fail on
ERROR: No matching distribution found for openlineage-sql==1.3.0

- -

(which I think is an unreleased version in the git project)

- -

How would I go about to install the requirements?

- -

//Erik

- -

PS. Sorry for posting this in general if there is a specific integration or contribution channel - I didn't find a better channel

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-28 03:04:48
-
-

*Thread Reply:* Hi @Erik Alfthan, the channel is totally OK. I am not an airflow integration expert, but it looks to me like you're missing the openlineage-sql library, which is a Rust library used to extract lineage from SQL queries. This is how we do that in circle ci:
https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8080/workflows/aba53369-836c-48f5-a2dd-51bc0740a31c/jobs/140113

- -

and subproject page with build instructions: https://github.com/OpenLineage/OpenLineage/tree/main/integration/sql

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 03:07:23
-
-

*Thread Reply:* Ok, so I go and "manually" build the internal dependency so that it becomes available in the pip cache?

- -

I was hoping for something more automagical, but that should work

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Paweł Leszczyński - (pawel.leszczynski@getindata.com) -
-
2023-09-28 03:08:06
-
-

*Thread Reply:* I think so. @Jakub Dardziński am I right?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 03:18:27
-
-

*Thread Reply:* https://openlineage.io/docs/development/developing/python/setup
there’s a guide on how to set up the dev environment

- -

> Typically, you first need to build openlineage-sql locally (see README). After each release you have to repeat this step in order to bump local version of the package. -This might be somewhat exposed more in GitHub repository README as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 03:27:20
-
-

*Thread Reply:* It didn't find the wheel in the cache, but if I used the line in the sql/README.md
pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall
it is installed and thus skipped/passed when pip later checks if it needs to be installed.

- -

Now I have a second issue because it is expecting me to have mysqlclient-2.2.0, which seems to need a binary:
Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127
and
Command 'pkg-config --exists mariadb' returned non-zero exit status 127
I am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 03:31:52
-
-

*Thread Reply:* > It didnt find the wheel in the cache, but if I used the line in the sql/README.md -> pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall -> It is installed and thus skipped/passed when pip later checks if it needs to be installed. -That’s actually expected. You should build new wheel locally and then install it.

- -

> Now I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary -> Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127 -> and -> Command 'pkg-config --exists mariadb' returned non-zero exit status 127 -> I am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client? -We’ve left some system specific configuration, e.g. mysqlclient, to users as it’s a bit aside from OpenLineage and more of general development task.

- -

probably
sudo apt-get install python3-dev default-libmysqlclient-dev build-essential
should work

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 03:32:04
-
-

*Thread Reply:* I just realized that I should probably skip setting up my wsl and just run the tests in the docker setup you prepared

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 03:35:46
-
-

*Thread Reply:* You could do that as well but if you want to test your changes vs many Airflow versions that wouldn’t be possible I think (run them with tox btw)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 04:54:39
-
-

*Thread Reply:* This is starting to feel like a rabbit hole 😞

- -

When I run tox, I get a lot of build errors:
• client needs to be built
• sql needs to be built to a different target than its readme says
• a lot of builds fail on cython_sources

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 05:19:34
-
-

*Thread Reply:* would you like to share some exact log lines? I’ve never seen such errors, they probably are system specific

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 06:45:48
-
-

*Thread Reply:*
```
Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [62 lines of output]
    /tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
    !!

    ********************************************************************************
    The license_file parameter is deprecated, use license_files instead.

    By 2023-Oct-30, you need to update your project and remove deprecated calls
    or your builds will no longer be supported.

    See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
    ********************************************************************************

    !!
      parsed = self.parsers.get(option_name, lambda x: x)(value)
    running egg_info
    writing lib3/PyYAML.egg-info/PKG-INFO
    writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt
    writing top-level names to lib3/PyYAML.egg-info/top_level.txt
    Traceback (most recent call last):
      File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
        main()
      File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
        return hook(config_settings)
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 355, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=['wheel'])
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in _get_build_requires
        self.run_setup()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in run_setup
        exec(code, locals())
      File "<string>", line 271, in <module>
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
        return distutils.core.setup(**attrs)
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
        self.run_command(cmd)
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 989, in run_command
        super().run_command(command)
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 318, in run
        self.find_sources()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 326, in find_sources
        mm.run()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 548, in run
        self.add_defaults()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 586, in add_defaults
        sdist.add_defaults(self)
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/sdist.py", line 113, in add_defaults
        super().add_defaults()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 251, in add_defaults
        self._add_defaults_ext()
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
        self.filelist.extend(build_ext.get_source_files())
      File "<string>", line 201, in get_source_files
      File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 107, in __getattr__
        raise AttributeError(attr)
    AttributeError: cython_sources
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
py3-airflow-2.1.4: exit 1 (7.85 seconds) /home/obr_erikal/projects/OpenLineage/integration/airflow> python -m pip install --find-links target/wheels/ --find-links ../sql/iface-py/target/wheels --use-deprecated=legacy-resolver --constraint=https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.8.txt apache-airflow==2.1.4 'mypy>=0.9.6' pytest pytest-mock -r dev-requirements.txt pid=368621
py3-airflow-2.1.4: FAIL ✖ in 7.92 seconds
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 06:53:54
-
-

*Thread Reply:* Then, for the actual error in my PR: Evidently you are not using isort, so what linter/fixer should I use for imports?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 06:58:15
-
-

*Thread Reply:* for the error - I think there’s a mistake in the docs. Could you please run maturin build --out target/wheels as a temp solution?

- - - -
- 👀 Erik Alfthan -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 06:58:57
-
-

*Thread Reply:* we’re using ruff; tox runs it as one of the commands

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:00:37
-
-

*Thread Reply:* Not in the airflow folder?
```
OpenLineage/integration/airflow$ maturin build --out target/wheels
💥 maturin failed
  Caused by: pyproject.toml at /home/obr_erikal/projects/OpenLineage/integration/airflow/pyproject.toml is invalid
  Caused by: TOML parse error at line 1, column 1
    |
  1 | [tool.ruff]
    | ^
  missing field `build-system`
```

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:02:32
-
-

*Thread Reply:* I meant change here https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/README.md

- -

so
```
cd iface-py
python -m pip install maturin
maturin build --out ../target/wheels
```
becomes
```
cd iface-py
python -m pip install maturin
maturin build --out target/wheels
```
tox runs
```
install_command = python -m pip install {opts} --find-links target/wheels/ \
    --find-links ../sql/iface-py/target/wheels
```
but it should be
```
install_command = python -m pip install {opts} --find-links target/wheels/ \
    --find-links ../sql/target/wheels
```
actually, and I’m posting a PR to fix that

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:05:12
-
-

*Thread Reply:* yes, that part I actually worked out myself, but I fail to understand the cause of the cython_sources error. I have python3-dev installed on WSL Ubuntu with python version 3.10.12 in a virtualenv. Anything in that that could cause issues?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:12:20
-
-

*Thread Reply:* looks like it has something to do with the latest release of Cython?
pip install "Cython<3" maybe solves the issue?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:15:06
-
-

*Thread Reply:* I didn't have any Cython before the install. Also no change. Could it be some update to setuptools itself? Seems like the deprecation notice and the error are coming from inside setuptools

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:16:59
-
-

*Thread Reply:* (I.e. I tried the pip install "Cython<3" command without any change in the output)

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:20:30
-
-

*Thread Reply:* Applying ruff lint on the converter.py file fixed the issue on the PR, though, so unless you have any feedback on the change itself, I will set it up on my own computer later instead (right now I'm doing changes on behalf of a client, on the client's computer)

- -

If the issue persists on my own computer, I'll dig a bit further

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:21:03
-
-

*Thread Reply:* It’s a bit hard for me to find the root cause as I cannot reproduce this locally and CI works fine as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:22:41
-
-

*Thread Reply:* Yeah, I am thinking that if I run into the same problem "at home", I might find it worthwhile to understand the issue. Right now, the client only wants the fix.

- - - -
- 👍 Jakub Dardziński -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:25:10
-
-

*Thread Reply:* Is there an official release cycle?

- -

or more specifically, given that the PRs are approved, how soon can they reach openlineage-dbt and apache-airflow-providers-openlineage?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:28:58
-
-

*Thread Reply:* we need to differentiate some things:

- -
  1. OpenLineage repository:
     a. dbt integration - this is the only place where it is maintained
     b. Airflow integration - here we only keep backwards compatibility, but generally speaking, starting from Airflow 2.7+ we would like to do all the work in the Airflow repo as the OL Airflow provider
  2. Airflow repository - there’s only the Airflow OpenLineage provider, compatible (and working best) with Airflow 2.7+

we have control over releases (obviously) in the OL repo - it’s a monthly cycle, so that should happen beginning next week. There’s also a possibility to ask for an ad-hoc release in the #general slack channel, and with the approvals of committers the new version is also released

- -

For Airflow providers - the cycle is monthly as well

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:31:30
-
-

*Thread Reply:* it’s a bit complex for this split but needed temporarily

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:31:47
-
-

*Thread Reply:* oh, I did the fix in the wrong place! The client is on airflow 2.7 and is using the provider. Is it syncing?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:32:28
-
-

*Thread Reply:* it’s not, two separate places ~and we haven’t even added the whole thing with converting old lineage objects to OL specific~

- -

editing, that’s not true

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:34:40
-
-

*Thread Reply:* the code’s here: -https://github.com/apache/airflow/blob/main/airflow/providers/openlineage/extractors/manager.py#L154

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:35:17
-
-

*Thread Reply:* sorry I did not mention this earlier. we definitely need to add some guidance on how to proceed with contributions to OL and the Airflow OL provider

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:36:10
-
-

*Thread Reply:* anyway, the dbt fix is the blocking issue, so if that part comes next week, there is no real urgency in getting the columns. It is a nice-to-have for our parquet-file ingests.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:37:12
-
-

*Thread Reply:* may I ask if you use some custom operator / python operator there?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:37:33
-
-

*Thread Reply:* yeah, taskflow with inlets/outlets

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:38:38
-
-

*Thread Reply:* so we extract from sources and use pyarrow to create parquet files in storage that an mssql-server can use as external tables
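Schematically it looks like this (a toy sketch; the Table/File/Column field values are illustrative placeholders, not our real config):
```python
# Toy sketch of the taskflow + inlets/outlets pattern described above.
from airflow.decorators import task
from airflow.lineage.entities import Column, File, Table

orders = Table(
    database="sourcedb",
    cluster="mssql://dbserver:1433",
    name="dbo.orders",
    columns=[Column(name="order_id", description="", data_type="int")],
)
orders_parquet = File(url="")

@task(inlets=[orders], outlets=[orders_parquet])
def extract_orders():
    ...  # read from the source and write the parquet file with pyarrow
```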

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jakub Dardziński - (jakub.dardzinski@getindata.com) -
-
2023-09-28 07:39:54
-
-

*Thread Reply:* awesome 👍
we have plans to integrate more with the Python operator as well, but not earlier than in Airflow 2.8

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Erik Alfthan - (slack@alfthan.eu) -
-
2023-09-28 07:43:41
-
-

*Thread Reply:* I guess writing a generic extractor for the Python operator is quite hard, but if you could support some inlet/outlet types for tabular file formats / their Python libraries like pyarrow or maybe even pandas, and document it, I think a lot of people would understand how to use them

- - - -
- ➕ Harel Shein -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-09-28 16:16:24
-
-

Are you located in the Brussels area or within commutable distance? Interested in attending a meetup between October 16-20? If so, please DM @Sheeri Cabral (Collibra) or myself. TIA

- - - -
- ❤️ Sheeri Cabral (Collibra) -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-10-02 11:58:32
-
-

@channel
Hello all, I’d like to open a vote to release OpenLineage 1.3.0, including:
• support for Spark 3.5 in the Spark integration
• scheme preservation bug fix in the Spark integration
• find-links path in tox bug fix in the Airflow integration
• more graceful logging when no OL provider is installed in the Airflow integration
• columns as schema facet for airflow.lineage.Table addition
• SQLSERVER to supported dbt profile types addition
Three +1s from committers will authorize. Thanks in advance.

- - - -
- 🙌 Harel Shein, Paweł Leszczyński, Rodrigo Maia -
- -
- 👍 Jason Yip, Paweł Leszczyński -
- -
- ➕ Willy Lulciuc, Jakub Dardziński, Erik Alfthan, Julien Le Dem -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Michael Robinson - (michael.robinson@astronomer.io) -
-
2023-10-02 17:00:08
-
-

*Thread Reply:* Thanks all. The release is authorized and will be initiated within 2 business days.

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Yip - (jasonyip@gmail.com) -
-
2023-10-02 17:11:46
-
-

*Thread Reply:* looking forward to that. I am seeing inconsistent results in Databricks for Spark 3.4+; sometimes there are no inputs/outputs. Hope that is fixed?

- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Harel Shein - (harel.shein@gmail.com) -
-
2023-10-03 09:59:24
-
-

*Thread Reply:* @Jason Yip if it isn’t fixed for you, would love it if you could open up an issue that will allow us to reproduce and fix

- - - -
- 👍 Jason Yip -
- -
-
-
-
- - - - - -
-
- - - - -
- -
Jason Yip (jasonyip@gmail.com)
2023-10-03 20:23:40
*Thread Reply:* @Harel Shein the issue still exists -> Spark 3.4 and above, including 3.5: saveAsTable and create table won't have inputs and outputs in Databricks

Jason Yip (jasonyip@gmail.com)
2023-10-03 20:30:15
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2124
Labels: integration/spark, integration/databricks · Comments: 1

Jason Yip (jasonyip@gmail.com)
2023-10-03 20:30:21
*Thread Reply:* and of course this issue still exists

Harel Shein (harel.shein@gmail.com)
2023-10-03 21:45:09
*Thread Reply:* thanks for posting, we’ll continue looking into this.. if you find any clues that might help, please let us know.

Jason Yip (jasonyip@gmail.com)
2023-10-03 21:46:27
*Thread Reply:* are there any instructions on how to hook up a debugger to OL?

Harel Shein (harel.shein@gmail.com)
2023-10-04 09:04:16
*Thread Reply:* @Paweł Leszczyński has been working on adding a debug facet, but more suggestions are more than welcome!

Harel Shein (harel.shein@gmail.com)
2023-10-04 09:05:58
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/pull/2147
Labels: documentation, integration/spark · Assignee: @pawel-big-lebowski
👀 Paweł Leszczyński
👍 Jason Yip

Jason Yip (jasonyip@gmail.com)
2023-10-05 03:20:11
*Thread Reply:* @Paweł Leszczyński do you have a build for the PR? Appreciated!

Harel Shein (harel.shein@gmail.com)
2023-10-05 15:05:08
*Thread Reply:* we’ll ask for a release once it’s reviewed and merged

Michael Robinson (michael.robinson@astronomer.io)
2023-10-02 12:28:28
@channel
The September issue of OpenLineage News is here! This issue covers the big news about OpenLineage coming out of Airflow Summit, progress on the Airflow Provider, highlights from our meetup in Toronto, and much more. To get the newsletter directly in your inbox each month, sign up here.
🦆 Harel Shein, Paweł Leszczyński
🔥 Willy Lulciuc, Jakub Dardziński, Paweł Leszczyński

Damien Hawes (damien.hawes@booking.com)
2023-10-03 03:44:36
Hi folks - I'm wondering if it's just me, but does io.openlineage:openlineage_sql_java:1.2.2 ship with the arm64.dylib binary? When I try to run code that uses the Java package on an Apple M1, the binary isn't found. The workaround is to check out 1.2.2 and then build and publish it locally.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-03 09:01:38
*Thread Reply:* Not sure if I follow your question. Whenever OL is released, there is a script, new-version.sh (https://github.com/OpenLineage/OpenLineage/blob/main/new-version.sh), that is run and modifies the codebase.

So, if you pull the code, it contains an OL version that has not been released yet, and in the case of dependencies, one needs to build them on their own.

For example, the Preparation section at https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#preparation describes how to build openlineage-java and openlineage-sql in order to build openlineage-spark.

Damien Hawes (damien.hawes@booking.com)
2023-10-04 05:27:26
*Thread Reply:* Hmm. Let's elaborate my use case a bit.

We run Apache Hive on-premise. Hive provides query execution hooks for pre-query, post-query, and I think failed query.

Anyway, as part of the hook, you're given the query string.

So I, naturally, tried to pass the query string into OpenLineageSql.parse(Collections.singletonList(hookContext.getQueryPlan().getQueryStr()), "hive") in order to test this out.

I was using openlineage-sql-java:1.2.2 at that time, and no matter what query string I gave it, nothing was returned.

I then stepped through the code and noticed that it was looking for the arm64 lib, and I noticed that that package (downloaded from Maven Central) lacked that particular native binary.

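(For readers who want to poke at the same parser from Python instead of Java: a rough sketch against the parser's Python bindings. The package and API shape here are assumptions based on the openlineage-sql Python interface, so verify against the version you install.)

```
# Assumed API of the openlineage-sql Python bindings (verify before relying on it):
# parse() takes a list of SQL statements plus an optional dialect and returns
# metadata with the input/output tables it could extract.
from openlineage_sql import parse

meta = parse(["INSERT INTO dst SELECT * FROM src"], dialect="hive")
print(meta.in_tables, meta.out_tables)  # expected: src as input, dst as output
```
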
Damien Hawes (damien.hawes@booking.com)
2023-10-04 05:27:36
*Thread Reply:* I hope that helps.
👍 Paweł Leszczyński

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-04 09:03:02
*Thread Reply:* I get it now. In Circle CI we do have 3 build steps:
• build-integration-sql-x86
• build-integration-sql-arm
• build-integration-sql-macos
but no Mac M1. I think at that time Circle CI did not have a proper resource class in the free plan. Additionally, @Maciej Obuchowski would prefer to migrate this to GitHub Actions, as he claims this can be achieved there in a cleaner way (https://github.com/OpenLineage/OpenLineage/issues/1624).

Feel free to create an issue for this. Others would be able to upvote it in case they have a similar experience.
Assignee: @mobuchowski · Labels: ci, integration/sql

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-10-23 11:56:12
*Thread Reply:* It doesn't have the free resource class still 😞 We're blocked on that, unfortunately. The other solution would be to migrate to GH Actions, where most of our solution could be replaced by something like https://github.com/PyO3/maturin-action
Stars: 98 · Language: TypeScript

Michael Robinson (michael.robinson@astronomer.io)
2023-10-03 10:56:03
@channel
We released OpenLineage 1.3.1!
Added:
• Airflow: add some basic stats to the Airflow integration #1845 @harels
• Airflow: add columns as schema facet for airflow.lineage.Table (if defined) #2138 @erikalfthan
• DBT: add SQLSERVER to supported dbt profile types #2136 @erikalfthan
• Spark: support for latest 3.5 #2118 @pawel-big-lebowski
Fixed:
• Airflow: fix find-links path in tox #2139 @JDarDagran
• Airflow: add more graceful logging when no OpenLineage provider installed #2141 @JDarDagran
• Spark: fix bug in PathUtils’ prepareDatasetIdentifierFromDefaultTablePath (CatalogTable) to correctly preserve scheme from CatalogTable’s location #2142 @d-m-h
Thanks to all the contributors, including new contributor @Erik Alfthan!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.3.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.2.2...1.3.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍 Jason Yip, Peter Hicks, Peter Huang, Mars Lan
🎉 Sheeri Cabral (Collibra)

Mars Lan (mars@metaphor.io)
2023-10-04 07:42:59
*Thread Reply:* Any chance we can do a 1.3.2 soonish to include https://github.com/OpenLineage/OpenLineage/pull/2151 instead of waiting for the next monthly release?
Labels: documentation, client/python · Comments: 4

Matthew Paras (matthewparas2020@u.northwestern.edu)
2023-10-03 12:34:57
Hey everyone - does anyone have a good mechanism for alerting on issues with OpenLineage? For example, maybe alerting when an event times out - perhaps to Prometheus or some other kind of generic endpoint? Not sure of the best approach here (or if the meta-inf extension would be able to achieve it)

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-04 03:01:02
*Thread Reply:* That's a great use case for OpenLineage. Unfortunately, we don't have any doc or recommendation on that.

I would try using the FluentD proxy we have (https://github.com/OpenLineage/OpenLineage/tree/main/proxy/fluentd) to copy the event stream (alerting is just one of the use cases for lineage events) and write a fluentd plugin to send it asynchronously further to an alerting service like PagerDuty.

It looks cool to me but I never had enough time to test this approach.
👍 Matthew Paras

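(Not an official OpenLineage component - just a sketch of the same event-stream-copy idea in plain Python instead of fluentd: a relay that forwards every event to the real backend and fires an alert hook on FAIL events. All names and endpoints here are hypothetical.)

```
from flask import Flask, request
import requests

app = Flask(__name__)
BACKEND = "http://marquez:5000/api/v1/lineage"  # hypothetical downstream consumer

def alert(event):
    # swap in a PagerDuty / Prometheus push here; print keeps the sketch runnable
    print(f"ALERT: run {event.get('run', {}).get('runId')} reported {event.get('eventType')}")

@app.route("/api/v1/lineage", methods=["POST"])
def relay():
    event = request.get_json(force=True)
    if event.get("eventType") == "FAIL":
        alert(event)
    requests.post(BACKEND, json=event, timeout=5)  # pass the event through unchanged
    return "", 200
```
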
Michael Robinson (michael.robinson@astronomer.io)
2023-10-05 14:44:14
@channel
This month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:
• announcements
• recent releases
• Airflow Summit recap
• tutorial: migrating to the Airflow Provider
• discussion topic: observability for OpenLineage/Marquez
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.
👀 Sheeri Cabral (Collibra), Julian LaNeve, Peter Hicks

Julien Le Dem (julien@apache.org)
2023-10-05 20:40:40
The Marquez meetup in San Francisco is happening right now!
https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/
🎉 Paweł Leszczyński, Rodrigo Maia

Mars Lan (mars@metaphor.io)
2023-10-06 07:19:01
@Michael Robinson can we cut a new release to include this change?
• https://github.com/OpenLineage/OpenLineage/pull/2151
Labels: documentation, client/python · Comments: 6
➕ Harel Shein, Jakub Dardziński, Julien Le Dem, Michael Robinson, Maciej Obuchowski

Michael Robinson (michael.robinson@astronomer.io)
2023-10-06 19:16:02
*Thread Reply:* Thanks for requesting a release, @Mars Lan. It has been approved and will be initiated within 2 business days of next Monday.
🙏 Mars Lan

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-08 23:59:36
@here I am trying out the openlineage integration of spark on databricks. There is no event getting emitted from OpenLineage; I see logs saying "OpenLineage Event Skipped". I am attaching the notebook that I am trying to run and the cluster logs. Kindly can someone help me on this?

Jason Yip (jasonyip@gmail.com)
2023-10-09 00:02:10
*Thread Reply:* from my experience, it will only work on Spark 3.3.x or below, aka Runtime 12.2 or below. Anything above, the events will show up once in a blue moon

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 00:04:38
*Thread Reply:* ohh, thanks for the information @Jason Yip. I am trying out with the 13.3 Databricks version and Spark 3.4.1; will try using a lower version as you suggested. Is there any issue tracking this bug, @Jason Yip?

Jason Yip (jasonyip@gmail.com)
2023-10-09 00:06:06
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/issues/2124
Labels: integration/spark, integration/databricks · Comments: 2

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 00:11:54
*Thread Reply:* tried with databricks 12.2 --> spark 3.3.2, still the same behaviour, no event getting emitted

Jason Yip (jasonyip@gmail.com)
2023-10-09 00:12:35
*Thread Reply:* you can do 11.3, it's the most stable one I know

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 00:12:46
*Thread Reply:* sure, let me try that out

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 00:31:51
*Thread Reply:* still the same problem…the jar that I am using is the latest openlineage-spark-1.3.1.jar, do you think that can be the problem?

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 00:43:59
*Thread Reply:* tried with openlineage-spark-1.2.2.jar, still the same issue, seems like they are skipping some events

Jason Yip (jasonyip@gmail.com)
2023-10-09 01:47:20
*Thread Reply:* Probably not all events will be captured, I have only tested create tables and jobs

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-09 04:31:12
*Thread Reply:* Hi @Guntaka Jeevan Paul, how did you configure openlineage and what is your job doing?

We do have a bunch of integration tests on the Databricks platform available here, and they're passing on databricks runtime 13.0.x-scala2.12.

Could you also try running the same code as our test does (this one)? If you run it and see OL events, this will make us sure your config is OK and we can continue further debugging.

Looking at your spark script: could you save your dataset and see if you still don't see any events?

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 05:06:41
*Thread Reply:*
babynames = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/FileStore/babynames.csv")
babynames.createOrReplaceTempView("babynames_table")
years = spark.sql("select distinct(Year) from babynames_table").rdd.map(lambda row: row[0]).collect()
years.sort()
dbutils.widgets.dropdown("year", "2014", [str(x) for x in years])
display(babynames.filter(babynames.Year == dbutils.widgets.get("year")))

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 05:08:09
*Thread Reply:* this is the script that I am running @Paweł Leszczyński…kindly let me know if I'm making any mistake. I have added the init script at the cluster level, and from the logs I could see that openlineage is configured, as I see a log statement

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-09 05:10:30
*Thread Reply:* there's nothing wrong in that script. It's just that we decided to limit the amount of OL events for jobs that don't write their data anywhere and just do a collect operation

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-09 05:11:02
*Thread Reply:* this is also a potential reason why you can't see any events

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-09 05:14:33
*Thread Reply:* ohh…okk, will try out the test script that you have mentioned above. Kindly correct me if my understanding is correct: if there are a few transformations and finally a write somewhere, that is where the OL events are expected to be emitted?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-09 05:16:54
*Thread Reply:* yes. The main purpose of lineage is to track dependencies between datasets, when a job reads from dataset A and writes to dataset B. In the case of databricks notebooks that just show or collect and print some query result on the screen, there may be no reason to track it in the sense of lineage.

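(A tiny illustration of that rule, assuming a Databricks notebook where spark is predefined: a display-only cell may produce no lineage events, while adding a write gives OL an input and an output dataset. The output path is made up.)

```
# Read-only work like display()/collect() may be skipped by the listener.
babynames = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/babynames.csv")
)

# Adding a write turns this into "dataset A -> dataset B", which is what
# lineage tracks; the output path below is hypothetical.
babynames.write.mode("overwrite").parquet("dbfs:/FileStore/babynames_out")
```
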
Michael Robinson (michael.robinson@astronomer.io)
2023-10-09 15:25:14
@channel
We released OpenLineage 1.4.1!
Additions:
• Client: allow setting client’s endpoint via environment variable 2151 @Mars Lan
• Flink: expand Iceberg source types 2149 @Peter Huang
• Spark: add debug facet 2147 @Paweł Leszczyński
• Spark: enable Nessie REST catalog 2165 @julwin
Thanks to all the contributors, especially new contributors @Peter Huang and @julwin!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.4.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.3.1...1.4.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍 Jason Yip, Ross Turk, Mars Lan, Harel Shein, Rodrigo Maia

Drew Bittenbender (drew@salt.io)
2023-10-09 16:55:35
Hello. I am getting started with OL and Marquez with dbt. I am using dbt-ol. The namespace of the dataset showing up in Marquez is not the namespace I provide using OPENLINEAGE_NAMESPACE. It happens to be the same as the source in Marquez, which is the snowflake account URI. It's obviously picking up the other env variable, OPENLINEAGE_URL, so I am pretty sure it's not the environment. Is this expected?

Michael Robinson (michael.robinson@astronomer.io)
2023-10-09 18:56:13
*Thread Reply:* Hi Drew, thank you for using OpenLineage! I don’t know the details of your use case, but I believe this is expected, yes. In general, the dataset namespace is different. Jobs are namespaced separately from datasets, which are namespaced by their containing datasources. This is the case so datasets have the same name regardless of the job writing to them, as datasets are sometimes shared by jobs in different namespaces.
👍 Drew Bittenbender

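(A hypothetical illustration of that split: the job namespace comes from OPENLINEAGE_NAMESPACE, while dataset namespaces are derived from the datasource, e.g. the snowflake account URI, not from this variable.)

```
import os

os.environ["OPENLINEAGE_URL"] = "http://localhost:5000"  # OL backend (hypothetical)
os.environ["OPENLINEAGE_NAMESPACE"] = "my_dbt_jobs"      # namespaces the *jobs*
# after running dbt-ol, expect jobs under "my_dbt_jobs" and datasets under a
# datasource-derived namespace such as "snowflake://<account>"
```
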
Jason Yip (jasonyip@gmail.com)
2023-10-10 01:05:11
Any idea why "environment-properties" is gone in Spark 3.4+ in StartEvent?

Jason Yip (jasonyip@gmail.com)
2023-10-10 20:53:59
example:

{"environment_properties":{"spark.databricks.clusterUsageTags.clusterName":"jason.yip@tredence.com's Cluster","spark.databricks.job.runId":"","spark.databricks.job.type":"","spark.databricks.clusterUsageTags.azureSubscriptionId":"a4f54399_8db8_4849_adcc_a42aed1fb97f","spark.databricks.notebook.path":"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep","spark.databricks.clusterUsageTags.clusterOwnerOrgId":"4679476628690204","MountPoints":[{"MountPoint":"/databricks-datasets","Source":"databricks_datasets"},{"MountPoint":"/Volumes","Source":"UnityCatalogVolumes"},{"MountPoint":"/databricks/mlflow-tracking","Source":"databricks/mlflow-tracking"},{"MountPoint":"/databricks-results","Source":"databricks_results"},{"MountPoint":"/databricks/mlflow-registry","Source":"databricks/mlflow-registry"},{"MountPoint":"/Volume","Source":"DbfsReserved"},{"MountPoint":"/volumes","Source":"DbfsReserved"},{"MountPoint":"/","Source":"DatabricksRoot"},{"MountPoint":"/volume","Source":"DbfsReserved"}],"User":"jason.yip@tredence.com","UserId":"4768657035718622","OrgId":"4679476628690204"}}

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-11 03:46:13
*Thread Reply:* Is this related to any OL version? In OL 1.2.2 we've added an extra variable, spark.databricks.clusterUsageTags.clusterAllTags, to be captured, but this should not break things.

I think we're facing some issues on recent databricks runtime versions. Here is an issue for this: https://github.com/OpenLineage/OpenLineage/issues/2131

Is the problem you describe specific to some databricks runtime versions?
Labels: integration/spark, integration/databricks

Jason Yip (jasonyip@gmail.com)
2023-10-11 11:17:06
*Thread Reply:* yes, exactly Spark 3.4+

Jason Yip (jasonyip@gmail.com)
2023-10-11 21:12:27
*Thread Reply:* Btw I don't understand the code flow entirely. If we are talking about a different classpath only: I see there's a Unity Catalog handler in the code and it says it works the same as Delta, but I am not seeing it subclassing Delta. I suppose it will work the same.

I am happy to jump on a call to show you if needed

Jason Yip (jasonyip@gmail.com)
2023-10-16 02:58:56
*Thread Reply:* @Paweł Leszczyński do you think in Spark 3.4+ only one event would happen?

/**
 * We get exact copies of OL events for org.apache.spark.scheduler.SparkListenerJobStart and
 * org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart. The same happens for end
 * events.
 *
 * @return
 */
private boolean isOnJobStartOrEnd(SparkListenerEvent event) {
  return event instanceof SparkListenerJobStart || event instanceof SparkListenerJobEnd;
}

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-10 23:43:39
@here i am trying out the databricks spark integration, and in one of the events i am getting an openlineage event where the output dataset has a facet called symlinks. The statement that generated this event is this SQL:
CREATE TABLE IF NOT EXISTS covid_research.covid_data
USING CSV
LOCATION 'abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv'
OPTIONS (header "true", inferSchema "true");
Can someone kindly let me know what this symlinks facet is? I tried the spec but did not get it completely.

Jason Yip (jasonyip@gmail.com)
2023-10-10 23:44:53
*Thread Reply:* I use it to get the table with database name

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-10 23:47:15
*Thread Reply:* so can I think of it like: if there is a symlink, then that table is kind of a reference to the original dataset?

Jason Yip (jasonyip@gmail.com)
2023-10-11 01:25:44
*Thread Reply:* yes
🙌 Paweł Leszczyński

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-11 06:55:58
@here When i am running this sql as part of a databricks notebook, i am receiving an OL event where i see only an output dataset, and there is no input dataset or a symlink facet inside the dataset to map it to the underlying azure storage object. Can anyone kindly help on this?
spark.sql(f"CREATE TABLE IF NOT EXISTS covid_research.uscoviddata USING delta LOCATION 'abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/modified-delta'")
{
  "eventTime": "2023-10-11T10:47:36.296Z",
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
  "eventType": "COMPLETE",
  "run": {
    "runId": "d0f40be9-b921-4c84-ac9f-f14a86c29ff7",
    "facets": {
      "spark.logicalPlan": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
        "plan": [
          {
            "class": "org.apache.spark.sql.catalyst.plans.logical.CreateTable",
            "num-children": 1,
            "name": 0,
            "tableSchema": [],
            "partitioning": [],
            "tableSpec": null,
            "ignoreIfExists": true
          },
          {
            "class": "org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier",
            "num-children": 0,
            "catalog": null,
            "identifier": null
          }
        ]
      },
      "spark_version": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
        "spark-version": "3.3.0",
        "openlineage-spark-version": "1.2.2"
      },
      "processing_engine": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet",
        "version": "3.3.0",
        "name": "spark",
        "openlineageAdapterVersion": "1.2.2"
      }
    }
  },
  "job": {
    "namespace": "default",
    "name": "adb-3942203504488904.4.azuredatabricks.net.create_table.covid_research_db_uscoviddata",
    "facets": {}
  },
  "inputs": [],
  "outputs": [
    {
      "namespace": "dbfs",
      "name": "/user/hive/warehouse/covid_research.db/uscoviddata",
      "facets": {
        "dataSource": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet",
          "name": "dbfs",
          "uri": "dbfs"
        },
        "schema": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet",
          "fields": []
        },
        "storage": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet",
          "storageLayer": "unity",
          "fileFormat": "parquet"
        },
        "symlinks": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet",
          "identifiers": [
            {
              "namespace": "/user/hive/warehouse/covid_research.db",
              "name": "covid_research.uscoviddata",
              "type": "TABLE"
            }
          ]
        },
        "lifecycleStateChange": {
          "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
          "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet",
          "lifecycleStateChange": "CREATE"
        }
      },
      "outputFacets": {}
    }
  ]
}

Damien Hawes (damien.hawes@booking.com)
2023-10-11 06:57:46
*Thread Reply:* Hey Guntaka - can I ask you a favour? Can you please stop using @here or @channel - please keep in mind, you're pinging over 1000 people when you use that mention. It's incredibly distracting to have Slack notify me of a message that isn't pertinent to me.

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-11 06:58:50
*Thread Reply:* sure noted @Damien Hawes

Damien Hawes (damien.hawes@booking.com)
2023-10-11 06:59:34
*Thread Reply:* Thank you!

Madhav Kakumani (madhav.kakumani@6point6.co.uk)
2023-10-11 12:04:24
Hi, I am trying to make an API call to get column-lineage information. Could you please let me know the URL construct to retrieve the same? As per the API documentation I am passing the following URL to GET column-lineage: http://localhost:5000/api/v1/column-lineage but getting error code 400. Thanks

Willy Lulciuc (willy@datakin.com)
2023-10-12 13:55:26
*Thread Reply:* Make sure to provide a dataset field nodeId as a query param in your request. If you’ve seeded Marquez with test metadata, you can use:
curl -XGET "http://localhost:5002/api/v1/column-lineage?nodeId=datasetField%3Afood_delivery%3Apublic.delivery_7_days%3Acustomer_email"
You can view the API docs for column lineage here!

Madhav Kakumani (madhav.kakumani@6point6.co.uk)
2023-10-17 05:57:36
*Thread Reply:* Thanks Willy. The documentation says 'name space', so I constructed the API like this:
'http://marquez-web:3000/api/v1/column-lineage/nodeId=datasetField:file:/home/jovyan/Downloads/event_attribute.csv:eventType'
but it is still not working 😞

Madhav Kakumani (madhav.kakumani@6point6.co.uk)
2023-10-17 06:07:06
*Thread Reply:* nodeId is constructed like this: datasetField:&lt;namespace&gt;:&lt;dataset&gt;:&lt;field name&gt;

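(A sketch of the same call from Python with proper URL encoding; the host/port and node values come from the seeded-metadata example above. Note it is a query parameter against the API port, not a path segment on the web UI port.)

```
import requests

namespace, dataset, field = "food_delivery", "public.delivery_7_days", "customer_email"
node_id = f"datasetField:{namespace}:{dataset}:{field}"

resp = requests.get(
    "http://localhost:5002/api/v1/column-lineage",
    params={"nodeId": node_id},  # requests percent-encodes the colons for us
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```
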
Michael Robinson (michael.robinson@astronomer.io)
2023-10-11 13:00:01
@channel
Friendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1696531454431629

Michael Robinson (michael.robinson@astronomer.io)
2023-10-11 14:26:45
*Thread Reply:* Newly added discussion topics:
• a proposal to add a Registry of Consumers and Producers
• a dbt issue to add OpenLineage Dataset names to the Manifest
• a proposal to add Dataset support in Spark LogicalPlan Nodes
• a proposal to institute a certification process for new integrations

Jason Yip (jasonyip@gmail.com)
2023-10-12 15:08:34
This might be a dumb question: I guess I need to set up local Spark in order for the Spark tests to run successfully?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-13 01:56:19
*Thread Reply:* just follow these instructions: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#build

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-13 06:41:56
*Thread Reply:* when trying to install openlineage-java locally via this command --> cd ../../client/java/ && ./gradlew publishToMavenLocal, I am receiving this error:

```
> Task :signMavenJavaPublication FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':signMavenJavaPublication'.
> Cannot perform signing task ':signMavenJavaPublication' because it has no configured signatory
```

Jason Yip (jasonyip@gmail.com)
2023-10-13 13:35:06
*Thread Reply:* @Paweł Leszczyński this is what I am getting

Jason Yip (jasonyip@gmail.com)
2023-10-13 13:36:00
*Thread Reply:* attaching the html

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-16 03:02:13
*Thread Reply:* which java are you using? what is your operating system (is it windows?)?

Jason Yip (jasonyip@gmail.com)
2023-10-16 03:35:18
*Thread Reply:* yes it is Windows. I downloaded Java 8, but I can try to build it with the Linux subsystem or Mac

Guntaka Jeevan Paul (jeevan@acceldata.io)
2023-10-16 03:35:51
*Thread Reply:* In my case it is Mac

Jason Yip (jasonyip@gmail.com)
2023-10-16 03:56:09
*Thread Reply:*
* Where:
Build file '/mnt/c/Users/jason/Downloads/github/OpenLineage/integration/spark/build.gradle' line: 9

* What went wrong:
An exception occurred applying plugin request [id: 'com.adarshr.test-logger', version: '3.2.0']
> Failed to apply plugin [id 'com.adarshr.test-logger']
   > Could not generate a proxy class for class com.adarshr.gradle.testlogger.TestLoggerExtension.

* Try:

Jason Yip (jasonyip@gmail.com)
2023-10-16 03:56:23
*Thread Reply:* tried with Linux subsystem

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-16 04:04:29
*Thread Reply:* we don't have any restrictions for windows builds, however it is something we don't test regularly. 2h ago we did have a successful build on Circle CI: https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8271/workflows/0ec521ae-cd21-444a-bfec-554d101770ea

Jason Yip (jasonyip@gmail.com)
2023-10-16 04:13:04
*Thread Reply:*
... 111 more
Caused by: java.lang.ClassNotFoundException: org.gradle.api.provider.HasMultipleValues
  ... 117 more

Jason Yip (jasonyip@gmail.com)
2023-10-17 00:26:07
*Thread Reply:* @Paweł Leszczyński now I am doing gradlew instead of gradle on Windows because the Linux one doesn't work. The doc didn't mention anything about setting up Spark / Hadoop, and that's my original question -- do I need to set up local Spark? Now it's throwing an error on Hadoop: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

Jason Yip (jasonyip@gmail.com)
2023-10-21 23:33:48
*Thread Reply:* Got it working with Mac, couldn't get it working with Windows / Linux subsystem

Jason Yip (jasonyip@gmail.com)
2023-10-22 13:08:40
*Thread Reply:* Now getting class not found despite build and test succeeded

Jason Yip (jasonyip@gmail.com)
2023-10-22 21:46:23
*Thread Reply:* I uploaded the wrong jar.. there are so many jars; only the jar in the spark folder works, not the ones in the subfolders

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-10-13 02:48:40
Hi team, I am running the following pyspark code in a cell:

```
print("SELECTING 100 RECORDS FROM METADATA TABLE")
df = spark.sql("""select * from
 limit 100""")

print("WRITING (1) 100 RECORDS FROM METADATA TABLE")
df.write.mode("overwrite").format('delta').save("")
df.createOrReplaceTempView("temp_metadata")

print("WRITING (2) 100 RECORDS FROM METADATA TABLE")
df.write.mode("overwrite").format("delta").save("")

print("READING (1) 100 RECORDS FROM METADATA TABLE")
df_read = spark.read.format('delta').load("")
df_read.createOrReplaceTempView("metadata_1")

print("DOING THE MERGE INTO SQL STEP!")
df_new = spark.sql("""
    MERGE INTO metadata_1
    USING temp_metadata
    ON metadata_1.id = temp_metadata.id
    WHEN MATCHED THEN UPDATE SET
      metadata_1.id = temp_metadata.id,
      metadata_1.aspect = temp_metadata.aspect
    WHEN NOT MATCHED THEN INSERT (id, aspect)
    VALUES (temp_metadata.id, temp_metadata.aspect)
""")
```
(table name and storage paths redacted)

I am running with debug log levels. I actually don't see any of the events being logged for SaveIntoDataSourceCommand or the MergeIntoCommand, but OL is in fact emitting events to the backend. It seems like the events are just not being logged... I actually observe this for all delta-table-related spark sql queries...

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-10-16 00:01:42
*Thread Reply:* Hi @Paweł Leszczyński is this expected? CMIIW, but we should expect to see the events being logged when running with debug log level, right?

Damien Hawes (damien.hawes@booking.com)
2023-10-16 04:17:30
*Thread Reply:* It's impossible to know without seeing how you've configured the listener.

Can you show this configuration?

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-10-17 03:15:20
*Thread Reply:*
spark.openlineage.transport.url &lt;url&gt;
spark.openlineage.transport.endpoint /&lt;endpoint&gt;
spark.openlineage.transport.type http
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.facets.custom_environment_variables [BUNCH_OF_VARIABLES;]
spark.openlineage.facets.disabled [spark_unknown\;spark.logicalPlan]
These are my spark configs... I'm setting the log level to debug with sc.setLogLevel("DEBUG")

Damien Hawes (damien.hawes@booking.com)
2023-10-17 04:40:03
*Thread Reply:* Two things:
1. If you want debug logs, you're going to have to provide a log4j.properties or log4j2.properties file, depending on the version of Spark you're running. In that file, you will need to configure the logging levels. If I am not mistaken, sc.setLogLevel controls ONLY the log levels of Spark-namespaced components (i.e., org.apache.spark).
2. You're telling the listener to emit to a URL. If you want to see the events emitted to the console, then set spark.openlineage.transport.type=console, and remove the other spark.openlineage.transport.* configurations.
Do either (1) or (2). A sketch of option (2) follows below.

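(Putting the configs from this thread together into a console-transport debugging setup; the artifact version is an assumption, pick the openlineage-spark version you actually use.)

```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ol-console-debug")
    # assumed coordinates for the listener jar
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.4.1")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")  # print events, no HTTP POST
    .config("spark.openlineage.facets.disabled", "[spark_unknown;spark.logicalPlan]")
    .getOrCreate()
)

# any read -> write action should now log OL events to the driver console
spark.range(10).write.mode("overwrite").parquet("/tmp/ol_debug_out")
```
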
Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-10-20 00:49:45
*Thread Reply:* @Damien Hawes Hi, sorry for the late reply.
1. Enabling sc.setLogLevel does actually enable debug logs from OpenLineage. I can see the events and everything being logged if I save in parquet format instead of delta.
2. I do want to emit events to the url. But I would like to just see what exactly are the events being emitted for some specific jobs, since I see that the lineage is incorrect for some MergeInto cases.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-10-26 04:56:50
*Thread Reply:* Hi @Damien Hawes would like to check again on whether you'd have any thoughts about this... Thanks! 🙂

Rodrigo Maia (rodrigo.maia@manta.io)
2023-10-17 03:17:57
Hello All 👋!
We are currently trying to get the Spark integration for OpenLineage working in our Databricks instance. The general setup is done and working, with a few hiccups here and there.
But one thing we are still struggling with is how to link all Spark job events with a Databricks job or a notebook run.
We've recently noticed that some of the events produced by OL have the "environment-properties" attribute with information (for our context) regarding the notebook path (if it is a notebook run), or the job run ID (if it's a Databricks job run). But the thing is that these attributes are not always present.
I ran some samples yesterday for a job with 4 notebook tasks. Of all 20 JSON payloads sent by the OL listener, only 3 presented the "environment-properties" attribute. It's not only happening with Databricks jobs. When I run single notebooks and each cell has its own set of Spark jobs, not all JSON events presented that property either.

So my question is: what is the criteria for having these attributes present or not in the event JSON? Or maybe this is an issue? @Jason Yip did you find out anything about this?

⚙️ Spark 3.4 / OL-Spark 1.4.1

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-17 06:55:47
*Thread Reply:* In general, we assume that OL events per run are cumulative. So, if you have 20 events with the same runId, then even if only a single event contains some facet, we consider this OK and let the backend combine them together. That's what we do in the Marquez project (a reference backend architecture for OL), and that's why it is worth using Marquez as a REST API backend. A sketch of this merging idea follows below.

Are you able to use the job namespace to aggregate all the Spark actions run within the databricks notebook? This is something that should serve this purpose.

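(A rough sketch of that "cumulative events" consumption model: merge run facets across all events sharing a runId. Real backends like Marquez do considerably more; this is illustrative only.)

```
from collections import defaultdict

def combine_by_run_id(events):
    """Merge run facets from OL event dicts sharing a runId (simplified)."""
    merged = defaultdict(dict)
    for event in events:
        run = event.get("run", {})
        merged[run.get("runId")].update(run.get("facets", {}))  # later events win
    return dict(merged)

# usage: combine_by_run_id(list_of_ol_event_dicts)
```
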
Jason Yip (jasonyip@gmail.com)
2023-10-17 12:48:33
*Thread Reply:* @Rodrigo Maia for Spark 3.4 I don't see the environment-properties showing up at all, but if you run the code as is, register a listener on SparkListenerJobStart and get the properties, all of those properties will show up. There's an event filter that filters out the SparkListenerJobStart; I suspect that filtered out the "unnecessary" events.. I was trying to do a custom build to do that, but am still trying to set up Hadoop and Spark on my local.

Rodrigo Maia (rodrigo.maia@manta.io)
2023-10-18 05:23:16
*Thread Reply:* @Paweł Leszczyński you are right. This is what we are doing as well, combining events with the same runId to process the information on our backend. But even so, there are several runIds without this information. I went through these events to get a better view of what was happening. As you can see, of 7 runIds, only 3 were showing the "environment-properties" attribute. Some condition is not being met here, or maybe it is what @Jason Yip suspects and there's some sort of filtering of unnecessary events.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-19 02:28:03
*Thread Reply:* @Rodrigo Maia, if you are able to provide a small Spark script such that none of the OL events contain the environment-properties, but at least one should, please raise an issue for this.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-19 02:29:11
*Thread Reply:* It's extremely helpful when the community opens issues that are not only described well, but also contain a small piece of code needed to reproduce this.

Rodrigo Maia (rodrigo.maia@manta.io)
2023-10-19 02:59:39
*Thread Reply:* I know, that's the goal. That is why I wanted to understand in the first place if there was any condition preventing this from happening, but now I get that this is not expected behaviour.
👍 Paweł Leszczyński

Jason Yip (jasonyip@gmail.com)
2023-10-19 13:44:00
*Thread Reply:* @Paweł Leszczyński @Rodrigo Maia I am referring to this: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/filters/DeltaEventFilter.java#L51

Jason Yip (jasonyip@gmail.com)
2023-10-19 14:49:03
*Thread Reply:* Please note that I am getting the same behavior; no code is needed, Spark 3.4 and above won't generate it no matter what. I have been testing the same code for 2 months from this issue: https://github.com/OpenLineage/OpenLineage/issues/2124

I tried the code without OL and it worked perfectly, so it is OL filtering out the event for sure. I will try posting the code I use to collect the properties.
Labels: integration/spark, integration/databricks · Comments: 3

Jason Yip (jasonyip@gmail.com)
2023-10-19 23:46:17
*Thread Reply:* this code proves that the properties are still there; somehow they get filtered out by OL:

```
%scala
import org.apache.spark.scheduler._

class JobStartListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Extract properties here
    val jobId = jobStart.jobId
    val stageInfos = jobStart.stageInfos
    val properties = jobStart.properties

    // You can print properties or save them somewhere
    println(s"JobId: $jobId, Stages: ${stageInfos.size}, Properties: $properties")
  }
}

val listener = new JobStartListener()
spark.sparkContext.addSparkListener(listener)

val df = spark.range(1000).repartition(10)
df.count()
```

-
2023-10-19 23:55:05
-
-

*Thread Reply:* of course feel free to test this logic as well, it still works -- if not the filtering:

- -

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java

-
- - - - - - - - - - - - - - - - -
- - - -
-
-
-
- - - - - -
-
- - - - -
- -
Rodrigo Maia (rodrigo.maia@manta.io)
2023-10-30 04:46:16
*Thread Reply:* Any ideas on how I could test it?

ankit jain (ankit.goods10@gmail.com)
2023-10-17 22:57:03
Hello All, I am completely new to OpenLineage. I have to set up a lab to conduct a POC on various aspects like lineage, metadata management, etc. As per the openlineage site, I tried downloading Ubuntu, docker and the binary files for Marquez, but I am lost somewhere and unable to configure the whole setup. Can someone please assist with the steps to start from scratch so that I can delve into the OpenLineage capabilities? Many thanks

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-18 01:32:01
*Thread Reply:* hey, did you try to follow one of these guides?
https://openlineage.io/docs/guides/about

Michael Robinson (michael.robinson@astronomer.io)
2023-10-18 09:14:08
*Thread Reply:* Which guide were you using, and what errors/issues are you encountering?

ankit jain (ankit.goods10@gmail.com)
2023-10-21 15:43:14
*Thread Reply:* Thanks Jakub for the response.

ankit jain (ankit.goods10@gmail.com)
2023-10-21 15:45:42
*Thread Reply:* In docker, the marquez-api image is not running and is exiting with exit code 127.

Michael Robinson (michael.robinson@astronomer.io)
2023-10-22 09:34:53
*Thread Reply:* @ankit jain thanks. I don't recognize 127, but 9 times out of 10 if the API or DB container fails the reason is a port conflict. Have you checked if port 5000 is available?

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-22 09:54:10
*Thread Reply:* could you please check what’s the output of
git config --get core.autocrlf
or
git config --global --get core.autocrlf
?

ankit jain (ankit.goods10@gmail.com)
2023-10-24 08:09:14
*Thread Reply:* @Michael Robinson thanks, I checked and port 5000 is not available. I tried deleting the docker images and recreating them, but still the same issue persists, stating: /usr/bin/env bash\r not found. The Gradle build is successful.

ankit jain (ankit.goods10@gmail.com)
2023-10-24 08:09:54
*Thread Reply:* @Jakub Dardziński thanks, the first command resulted in true and the second command gave no response

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-24 08:15:57
*Thread Reply:* are you running docker and git in Windows or Mac OS before 10.0?

Matthew Paras (matthewparas2020@u.northwestern.edu)
2023-10-19 15:00:42
Hey all - we've been noticing that some events go unreported by OpenLineage (Spark) when the AsyncEventQueue fills up and starts dropping events. Wondering if anyone has experienced this before, and knows why it is happening? We've expanded the event queue capacity and thrown more hardware at the problem, but no dice.

Also, as a note, the query plans from this job are pretty big - could the listener just be choking up? Happy to open a GitHub issue as well if we suspect that it could be the listener itself having issues.

Anirudh Shrinivason (anirudh.shrinivason@grabtaxi.com)
2023-10-20 02:57:50
*Thread Reply:* Hi, just checking, are you excluding the sparkPlan from the events? Or is it sending the spark plan too

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-10-23 11:59:40
*Thread Reply:* yeah - setting spark.openlineage.facets.disabled to [spark_unknown;spark.logicalPlan] should help

Matthew Paras (matthewparas2020@u.northwestern.edu)
2023-10-24 17:50:26
*Thread Reply:* sorry for the late reply - turns out this job is just whack 😄 we were going in circles trying to figure it out; we end up dropping events without OpenLineage enabled at all. But good to know that disabling the logical plan should speed us up if we run into this again

praveen kanamarlapudi (kpraveen420@gmail.com)
2023-10-20 18:18:37
Hi,

We are using the OpenLineage Spark connector. We have used Spark 3.2 and Scala 2.12 so far. We have triggered a new job with Spark 3.4 and Scala 2.13 and faced the below exception.

java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.map(scala.Function1)'
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildInputDatasets$6(OpenLineageRunEventBuilder.java:341)
  at java.base/java.util.Optional.map(Optional.java:265)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildInputDatasets(OpenLineageRunEventBuilder.java:339)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:295)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)
  at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)
  at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:72)
  at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-23 04:56:25
*Thread Reply:* Hmm, that is interesting. Did it occur on the databricks runtime? Could you give it a try with Scala 2.12? I think we don't test Scala 2.13.

praveen kanamarlapudi (kpraveen420@gmail.com)
2023-10-23 12:02:13
*Thread Reply:* I believe our Scala 2.12 jobs are working fine. It's not databricks runtime. We run Spark on Kube.

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-24 06:47:14
*Thread Reply:* Ok. I think you can raise an issue to support Scala 2.13 for the latest Spark versions.

priya narayana (n.priya88@gmail.com)
2023-10-26 06:13:40
Hi, I want to customise the events which come from OpenLineage Spark. Can someone give some information?

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-10-26 07:45:41
*Thread Reply:* Hi @priya narayana, please get familiar with the Extending section in our docs: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending

priya narayana (n.priya88@gmail.com)
2023-10-26 09:53:07
*Thread Reply:* Okay, thank you. Just checking if there are any other docs or git code which could also help me

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:11:17
Hello Team

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:12:38
I'm upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1, but I'm seeing the following error; any help is appreciated

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:14:02
*Thread Reply:* @Jakub Dardziński any thoughts?

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-26 13:14:24
*Thread Reply:* what version of Airflow are you using?

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:14:52
*Thread Reply:* 2.6.3, which satisfies the requirement

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-26 13:16:38
*Thread Reply:* is it possible you have some custom operator?

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:17:15
*Thread Reply:* I think it's the BaseOperator causing the issue

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:17:36
*Thread Reply:* so no, I believe

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-26 13:18:43
*Thread Reply:* BaseOperator is the parent class for any other operators; it defines how to do a deepcopy

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:19:11
*Thread Reply:* yeah, so it's controlled by Airflow itself, I didn't customize it

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-26 13:19:49
*Thread Reply:* uhm, maybe it's possible you could share the dag code? you may hide sensitive data

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:21:23
*Thread Reply:* let me try with lower versions of openlineage, what say?

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:21:39
*Thread Reply:* it's a big jump from 0.24.0 to 1.4.1

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:22:25
*Thread Reply:* but I will help here to investigate this issue

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-26 13:24:03
*Thread Reply:* for me it seems that within the dag or task you're defining some object that is not easy to copy

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:26:05
*Thread Reply:* possible, but with 0.24.0 that issue is not occurring, so the worry is that the version upgrade could potentially break things

Jakub Dardziński (jakub.dardzinski@getindata.com)
2023-10-26 13:39:34
*Thread Reply:* 0.24.0 is not that old 🤔

harsh loomba (hloomba@upgrade.com)
2023-10-26 13:45:07
*Thread Reply:* i see the issue with 0.24.0 too, I see it as a warning:
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -
File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
  self.run()
File "/usr/lib64/python3.8/threading.py", line 870, in run
  self._target(*self._args, **self._kwargs)
File "/home/upgrade/.local/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 89, in on_running
  task_instance_copy = copy.deepcopy(task_instance)
File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
  y = _reconstruct(x, memo, *rv)
File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
  state = deepcopy(state, memo)
File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
  y = copier(x, memo)
File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
  y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
  y = _reconstruct(x, memo, *rv)
File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
  state = deepcopy(state, memo)
File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
  y = copier(x, memo)
File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
  y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
  y = copier(memo)
File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/dag.py", line 2162, in __deepcopy__
  setattr(result, k, copy.deepcopy(v, memo))
File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
  y = copier(x, memo)
File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
  y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
  y = copier(memo)
File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 1224, in __deepcopy__
  setattr(result, k, copy.deepcopy(v, memo))
File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
  y = _reconstruct(x, memo, *rv)
File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
  state = deepcopy(state, memo)
File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
  y = copier(x, memo)
File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
  y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
  y = copier(x, memo)
File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
  y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
WARNING - y = copier(memo) -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 1224, in __deepcopy__ -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo)) -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo) -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo) -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File "/usr/lib64/python3.8/copy.py", line 161, in deepcopy -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - rv = reductor(4) -[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - TypeError: cannot pickle 'module' object -but with 1.4.1 its stopped processing any further and threw error

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:18:08
*Thread Reply:* I see the difference in how these are called in the two versions: the current version checks if Airflow is >2.6 and then runs on_running directly, but the earlier version ran it on a separate thread. Is this what's raising this exception?
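A minimal sketch of what's going on (illustrative only; MyTask and its attribute are hypothetical, not the actual DAG code): any task attribute that holds a module makes copy.deepcopy fail exactly as in the traceback above.

```python
import copy
import json  # stands in for any module a DAG author attaches to a task


class MyTask:
    def __init__(self):
        self.helper = json  # module-typed attribute, the problematic pattern


try:
    copy.deepcopy(MyTask())  # mirrors listener.py line 89: copy.deepcopy(task_instance)
except TypeError as err:
    print(err)  # -> cannot pickle 'module' object
```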

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:24:49
*Thread Reply:* this is the issue - https://github.com/OpenLineage/OpenLineage/blob/c343835c1664eda94d5c315897ae6702854c81bd/integration/airflow/openlineage/airflow/listener.py#L89 - it happens while copying the task

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:25:21
*Thread Reply:* since we run it directly if the version is >2.6.0, it's throwing the error in the main processing

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:28:02
*Thread Reply:* may I know which Airflow versions this process was tested on?

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:28:39
*Thread Reply:* I'm on 2.6.3

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 14:30:53
*Thread Reply:* 2.1.4, 2.2.4, 2.3.4, 2.4.3, 2.5.2, 2.6.1
usually there are not too many changes between minor versions

I still believe it might be some code you could improve on your side; it's probably also an antipattern in Airflow

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:34:26
*Thread Reply:* hmm, that's a valid observation, but I don't write the DAGs; other teams do. If many people wrote DAGs like this, I can't ask everyone to change their patterns, right? If something runs on the current OpenLineage version with only a warning, it should still run on the upgraded version, shouldn't it?

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:38:04
*Thread Reply:* however I see your point

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:49:52
*Thread Reply:* So that specific task has a 570-line, pretty bulky query; let me split it into smaller units

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:50:15
*Thread Reply:* that should help right? @Jakub Dardziński

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 14:51:27
*Thread Reply:* query length shouldn’t be the issue, rather any python code

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 14:51:50
*Thread Reply:* I get your point too, we might figure out some mechanism to skip irrelevant parts of task instance so that it doesn’t fail then
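In the meantime, a hedged sketch of what a DAG author could do on their side (MyHelper and its module attribute are hypothetical): defining __deepcopy__ keeps the module-typed attribute out of the copy so the listener's deepcopy no longer fails.

```python
import copy
import json


class MyHelper:
    def __init__(self):
        self.json = json  # module attribute that would break deepcopy

    def __deepcopy__(self, memo):
        # return a fresh instance instead of copying the module-typed attribute
        return MyHelper()


print(copy.deepcopy(MyHelper()))  # no longer raises
```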

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:52:12
*Thread Reply:* actually it's failing on that task itself

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:52:33
*Thread Reply:* let me try, it will be pretty quick

harsh loomba - (hloomba@upgrade.com)
2023-10-26 14:58:58
*Thread Reply:* @Jakub Dardziński but you're right, we have to fix this on the OpenLineage side as well, because ideally OpenLineage shouldn't cause any issues for the main DAG processing

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-10-26 17:51:05
*Thread Reply:* it doesn't break any Airflow functionality; execution is wrapped into a try/except block, and only the exception traceback is logged, as you can see

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-27 05:25:54
*Thread Reply:* Can you migrate to Airflow 2.7 and use apache-airflow-providers-openlineage? Ideally we wouldn't make meaningful changes to openlineage-airflow
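A sketch of what that migration could look like, assuming a pip-style requirements file; the version pins and URL below are illustrative, not prescribed (the AIRFLOW__OPENLINEAGE__TRANSPORT env var is the same one discussed further down this channel):

```
apache-airflow==2.7.2
apache-airflow-providers-openlineage

# transport config then goes through Airflow config / env vars, e.g.:
# AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "http://localhost:5000"}'
```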

harsh loomba - (hloomba@upgrade.com)
2023-10-27 11:35:44
*Thread Reply:* yup, that's what I'm planning to do

harsh loomba - (hloomba@upgrade.com)
2023-10-27 13:59:03
*Thread Reply:* referencing https://openlineage.slack.com/archives/C01CK9T7HKR/p1698398754823079?thread_ts=1698340358.557159&cid=C01CK9T7HKR|this conversation - what does it take to move to the OpenLineage provider package from openlineage-airflow? I'm updating Airflow to 2.7.2 and moving off openlineage-airflow to the provider package. I'm trying to estimate the amount of work it takes, any thoughts? Reading the changelogs I don't think it's too much of a change, but please share your thoughts, and if it's drafted somewhere please share that as well

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-30 08:21:10
*Thread Reply:* Generally not much - I would maybe think of operator coverage. For example, for BigQuery the old openlineage-airflow supports BigQueryExecuteQueryOperator. However, the new apache-airflow-providers-openlineage supports BigQueryInsertJobOperator, because it's the intended replacement for BigQueryExecuteQueryOperator and the Airflow community does not want to accept contributions to deprecated operators.

🙏 harsh loomba

harsh loomba - (hloomba@upgrade.com)
2023-10-31 15:00:38
*Thread Reply:* one question if someone is around - when I keep both openlineage-airflow and apache-airflow-providers-openlineage in my requirements file, I see the following error:
from openlineage.airflow.extractors import Extractors
ModuleNotFoundError: No module named 'openlineage.airflow'
any thoughts?

John Lukenoff - (john@jlukenoff.com)
2023-10-31 15:37:07
*Thread Reply:* I would usually do a pip freeze | grep openlineage as a sanity check to validate that the module is actually installed. Not sure how the provider and the module play together though

harsh loomba - (hloomba@upgrade.com)
2023-10-31 17:07:41
*Thread Reply:* yeah, so @John Lukenoff, I'm not getting how I can use a specific extractor when I run my operator. Say, for example, I have a custom DataWarehouseOperator and I want to override get_openlineage_facets_on_start and get_openlineage_facets_on_complete using the Redshift extractor; how would I do that?
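A hedged sketch of the provider-package pattern (DataWarehouseOperator, the namespaces, and the table names are hypothetical): with apache-airflow-providers-openlineage, lineage is returned from methods on the operator itself rather than wired through a separate extractor class.

```python
from airflow.models.baseoperator import BaseOperator
from airflow.providers.openlineage.extractors import OperatorLineage
from openlineage.client.run import Dataset


class DataWarehouseOperator(BaseOperator):
    def execute(self, context):
        ...  # run the warehouse query

    def get_openlineage_facets_on_complete(self, task_instance):
        # return the datasets this task read and wrote; facets could be added here too
        return OperatorLineage(
            inputs=[Dataset(namespace="redshift://cluster:5439", name="schema.source_table")],
            outputs=[Dataset(namespace="redshift://cluster:5439", name="schema.target_table")],
        )
```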

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-10-27 05:49:25
Spark Integration Logs
Hey There
Are these events skipped because it's not supported or it's configured somewhere?
23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd

Hitesh - (splicer9904@gmail.com)
2023-10-27 08:12:32
Hi People, actually I want to intercept the OpenLineage Spark events right after the job ends and before they are emitted, so that I can add some extra information to the events or remove some information that I don't want.
Is there any way of doing this? Can someone please help me

Michael Robinson - (michael.robinson@astronomer.io)
2023-10-30 09:03:57
*Thread Reply:* In general, I think this kind of use case is probably best served by facets, but what do you think @Paweł Leszczyński?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:01:12
Hello, has anyone run into an error similar to the one posted in this GitHub open issue [https://github.com/MarquezProject/marquez/issues/2468] while setting up Marquez on an EC2 instance? Would appreciate any help getting past the errors

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:04:30
*Thread Reply:* Hmm, have you looked over our Running on AWS docs?

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:06:08
*Thread Reply:* More specifically, the AWS RDS section. How are you deploying Marquez on Ec2?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:08:05
*Thread Reply:* we were primarily referencing this document on git - https://github.com/MarquezProject/marquez

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:09:05
*Thread Reply:* leveraged docker and docker-compose

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:13:10
*Thread Reply:* hmm so you’re running docker-compose up on an Ec2 instance you’ve ssh’d into? (just trying to understand your setup better)

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:13:26
*Thread Reply:* yes, that's correct

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:16:39
*Thread Reply:* I've only used docker compose for local dev or integration tests. But, ok, you're probably in the PoC phase. Can you run the docker cmd on your local machine successfully? What OS is installed on the Ec2 instance?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:18:00
*Thread Reply:* yes, I can run it, and the OS is Ubuntu 20.04.6 LTS

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:19:27
*Thread Reply:* we initially ran into a permission-denied error related to the postgresql.conf file, and we had to update the file permissions to 777, after which we started to see the errors below

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:19:36
*Thread Reply:* marquez-db | 2023-10-27 20:35:52.512 GMT [35] FATAL: no pg_hba.conf entry for host "172.18.0.5", user "marquez", database "marquez", no encryption
marquez-db | 2023-10-27 20:35:52.529 GMT [36] FATAL: no pg_hba.conf entry for host "172.18.0.5", user "marquez", database "marquez", no encryption

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:20:12
*Thread Reply:* we then manually updated the pg_hba.conf file to include the host, user, and db details
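For reference, the kind of pg_hba.conf entry implied here, with the host, user, and database taken from the error message above; the auth method (md5) is an assumption:

```
# TYPE  DATABASE  USER     ADDRESS        METHOD
host    marquez   marquez  172.18.0.5/32  md5
```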

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:20:42
*Thread Reply:* Did you also update the marquez.yml with the db user / password?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:20:48
*Thread Reply:* after which we started to see the errors posted in the github open issues page

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:21:33
*Thread Reply:* hmm are you using an external database or are you spinning up the entire Marquez stack with docker compose?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:21:56
*Thread Reply:* we are spinning up the entire Marquez stack with docker compose

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:23:24
*Thread Reply:* we did not change anything in the marquez.yml; I think we did not find that file in the GitHub repo that we cloned onto our local instance

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:26:31
*Thread Reply:* It’s important that the init-db.sh script runs, but I don’t think it is

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:26:56
*Thread Reply:* can you grab all the docker compose logs and share them? it’s hard to debug otherwise

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:29:59
*Thread Reply:*

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:33:15
*Thread Reply:* I would first suggest removing the --build flag, since you are specifying a version of Marquez to use via --tag

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:33:49
*Thread Reply:* not the issue per se, but it will help clear up some of the logs

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:35:06
*Thread Reply:* for sure thanks. we could get the logs without the --build portion, we tried with that option just once

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:35:40
*Thread Reply:* the errors were the same with/without --build option

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 17:36:02
*Thread Reply:* marquez-api | ERROR [2023-10-27 21:34:58,019] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.
marquez-api | ! org.postgresql.util.PSQLException: FATAL: password authentication failed for user "marquez"
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
marquez-api | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
marquez-api | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:253)
marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:434)
marquez-api | ! at org.postgresql.Driver.connect(Driver.java:291)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)
marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)
marquez-api | ! at org.flywaydb.core.Flyway.info(Flyway.java:190)
marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:73)
marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:27)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:105)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:48)
marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)
marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)
marquez-api | ! at io.dropwizard.Application.run(Application.java:94)
marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:60)
marquez-api | INFO [2023-10-27 21:34:58,024] marquez.MarquezApp: Stopping app...

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:38:52
*Thread Reply:* debugging docker issues like this is so difficult

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:40:44
*Thread Reply:* it could be a number of things, but you are connected to the database; it's just that the marquez user hasn't been created

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:41:59
*Thread Reply:* the /init-db.sh is what manages user creation

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:42:17
*Thread Reply:* so it’s possible that the script isn’t running for whatever reason on your Ec2 instance

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:44:20
*Thread Reply:* do you have other services running on that Ec2 instance? Like, other than Marquez

Willy Lulciuc - (willy@datakin.com)
2023-10-27 17:44:52
*Thread Reply:* is there a postgres process running outside of docker?

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 20:34:50
*Thread Reply:* no other services except marquez on this EC2 instance

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 20:35:49
*Thread Reply:* this was a new Ec2 instance that was spun up to install and use marquez

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-27 20:36:09
*Thread Reply:* and we can confirm that no postgres process runs outside of docker

Jason Yip - (jasonyip@gmail.com)
2023-10-29 03:06:28
I realize in Spark 3.4+, some job ids don't have a start event. What part of the code is responsible for triggering the START and COMPLETE events?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-10-30 09:59:53
*Thread Reply:* hi @Jason Yip could you provide an example of such a job?

Jason Yip - (jasonyip@gmail.com)
2023-10-30 16:51:55
*Thread Reply:* @Paweł Leszczyński same old:

# delete the old table if needed
_ = spark.sql('DROP TABLE IF EXISTS transactions')

# expected structure of the file
transactions_schema = StructType([
    StructField('household_id', IntegerType()),
    StructField('basket_id', LongType()),
    StructField('day', IntegerType()),
    StructField('product_id', IntegerType()),
    StructField('quantity', IntegerType()),
    StructField('sales_amount', FloatType()),
    StructField('store_id', IntegerType()),
    StructField('discount_amount', FloatType()),
    StructField('transaction_time', IntegerType()),
    StructField('week_no', IntegerType()),
    StructField('coupon_discount', FloatType()),
    StructField('coupon_discount_match', FloatType())
    ])

# read data to dataframe
df = (spark
    .read
    .csv(
        adlsRootPath + '/examples/data/csv/completejourney/transaction_data.csv',
        header=True,
        schema=transactions_schema))

df.write\
    .format('delta')\
    .mode('overwrite')\
    .option('overwriteSchema', 'true')\
    .option('path', adlsRootPath + '/examples/data/csv/completejourney/silver/transactions')\
    .saveAsTable('transactions')

df.count()

# create table object to make delta lake queryable
_ = spark.sql(f'''
    CREATE TABLE transactions
    USING DELTA
    LOCATION '{adlsRootPath}/examples/data/csv/completejourney/silver/transactions'
    ''')

# show data
display(
    spark.table('transactions')
    )

John Lukenoff - (john@jlukenoff.com)
2023-10-30 18:51:43
👋 Hi team, cross-posting from the Marquez Channel in case anyone here has a better idea of the spec

> For most of our lineage extractors in airflow, we are using the rust sql parser from openlineage-sql to extract table lineage via sql statements. When errors occur we are adding an extractionError run facet similar to what is being done here. I'm finding in the case that multiple statements were extracted but one failed to parse while many others were successful, the lineage for these runs doesn't appear as expected in Marquez. Is there any logic around the extractionError run facet that could be causing this? It seems reasonable to assume that we might take this to mean the entire run event is invalid if we have any extraction errors.
>
> I would still expect to see the other lineage we sent for the run but am instead just seeing the extractionError in the marquez UI; in the database, runs with an extractionError facet don't seem to make it to the job_versions_io_mapping table

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-31 06:34:05
*Thread Reply:* Can you show the actual event? Should be in the events tab in Marquez

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 11:59:07
*Thread Reply:* @John Lukenoff, would you mind posting the link to the Marquez team's Slack channel?

John Lukenoff - (john@jlukenoff.com)
2023-10-31 12:15:37
*Thread Reply:* yep here is the link: https://marquezproject.slack.com/archives/C01E8MQGJP7/p1698702140709439

This is the full event, sanitized of internal info:
```
{
  "job": {
    "name": "some_dag.some_task",
    "facets": {},
    "namespace": "default"
  },
  "run": {
    "runId": "a9565df2-f1a1-3ee3-b202-7626f8c4b92d",
    "facets": {
      "extractionError": {
        "errors": [
          {
            "task": "ALTER SESSION UNSET QUERY_TAG;",
            "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python",
            "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet",
            "taskNumber": 0,
            "errorMessage": "Expected one of TABLE or INDEX, found: SESSION"
          }
        ],
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python",
        "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ExtractionErrorRunFacet",
        "totalTasks": 1,
        "failedTasks": 1
      }
    }
  },
  "inputs": [
    { "name": "foo.bar", "facets": {}, "namespace": "snowflake" },
    { "name": "fizz.buzz", "facets": {}, "namespace": "snowflake" }
  ],
  "outputs": [
    { "name": "foo1.bar2", "facets": {}, "namespace": "snowflake" },
    { "name": "fizz1.buzz2", "facets": {}, "namespace": "snowflake" }
  ],
  "producer": "https://github.com/MyCompany/repo/blob/next-master/company/data/pipelines/airflow_utils/openlineage_utils/client.py",
  "eventTime": "2023-10-30T02:46:13.367274Z",
  "eventType": "COMPLETE"
}
```

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 12:43:07
*Thread Reply:* thank you!

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 13:14:29
*Thread Reply:* @John Lukenoff, sorry to trouble you again: is the Slack channel still active? For whatever reason I can't get to this workspace

John Lukenoff - (john@jlukenoff.com)
2023-10-31 13:15:26
*Thread Reply:* yep it’s still active, maybe you need to join the workspace first? https://join.slack.com/t/marquezproject/shared_invite/zt-266fdhg9g-TE7e0p~EHK50GJMMqNH4tg

Kavitha - (kkandaswamy@cardinalcommerce.com)
2023-10-31 13:25:51
*Thread Reply:* that was a good call. the link you just shared worked! thank you!

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-31 13:27:55
*Thread Reply:* yeah from OL perspective this looks good - the inputs and outputs are there, the extraction error facet looks like it should

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-10-31 13:28:05
*Thread Reply:* must be some Marquez hiccup 🙂

👍 John Lukenoff

John Lukenoff - (john@jlukenoff.com)
2023-10-31 13:28:45
*Thread Reply:* Makes sense, I’ll tail my marquez logs today to see if I can find anything

John Lukenoff - (john@jlukenoff.com)
2023-11-01 19:37:06
*Thread Reply:* Somehow this started working after we switched from our beta to prod infrastructure. I suspect something was failing due to constraints on the size of our db and the load of poor quality data it was under after months of testing against it

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-01 11:34:43
@channel
I'm opening a vote to release OpenLineage 1.5.0, including:
• support for Cassandra Connectors lineage in the Flink integration
• support for Databricks Runtime 13.3 in the Spark integration
• support for rdd and toDF operations from the Spark Scala API in Spark
• lowered requirements for attrs and requests packages in the Airflow integration
• lazy rendering of yaml configs in the dbt integration
• bug fixes, tests, infra fixes, doc changes, and more.
Three +1s from committers will authorize an immediate release.

➕ Jakub Dardziński, William Angel, Abdallah, Willy Lulciuc, Paweł Leszczyński, Julien Le Dem
👍 Jason Yip
🚀 Luca Soato, tati

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-02 05:11:58
*Thread Reply:* Thanks, all. The release is authorized and will be initiated within 2 business days.

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-01 13:29:09
@channel
The October 2023 issue of OpenLineage News is available now! Subscribe to get it directly in your inbox each month.

👍 Mars Lan, harsh loomba
🎉 tati

John Lukenoff - (john@jlukenoff.com)
2023-11-01 19:40:39
Hi team 👋 , we’re finding that for our Spark jobs we are almost always getting some junk characters in our dataset names. We’ve pushed the regex filter to its limits and would like to extend the logic of deriving the dataset name in openlineage-spark (currently on 1.4.1). I seem to recall hearing we could do this by implementing our own LogicalPlanVisitor or something along those lines? Is that still the recommended approach and if so would this be possible to implement in Scala vs. Java (scala noob here 🙂)

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-02 03:34:15
*Thread Reply:* Hi John, we're always happy to help with the contribution.

One of the possible solutions to this would be to do that just in the openlineage-java client:
• introduce a config entry like normalizeDatasetNameToAscii: enabled/disabled
• modify the DatasetIdentifier class to contain a static member boolean normalizeDatasetNameToAscii and normalize the dataset name according to this setting
• additionally, you would need to add a config entry in io.openlineage.client.OpenLineageYaml and make sure both loadOpenLineageYaml methods set DatasetIdentifier.normalizeDatasetNameToAscii based on the config
• document this in the doc
So, no Scala nor custom logical plan visitors required.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-02 03:34:47
*Thread Reply:* https://github.com/OpenLineage/OpenLineage/blob/main/client/java/src/main/java/io/openlineage/client/utils/DatasetIdentifier.java

🙌 John Lukenoff

Mike Fang - (fangmik@amazon.com)
2023-11-01 20:30:38
I am looking to send OpenLineage events to an AWS API Gateway endpoint from an AWS MWAA instance. The problem is that all requests to AWS services need to be signed with SigV4, and using API Gateway with IAM authentication would require requests to API Gateway be signed with SigV4. Would the best way to do so be to just modify the python client HTTP transport to include a new config option for signing emitted OpenLineage events with SigV4? Are there any alternatives?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 02:41:50
*Thread Reply:* there's actually an issue for that:
https://github.com/OpenLineage/OpenLineage/issues/2189

but the way to do this is imho to create a new custom transport (it might inherit from the HTTP transport) and register it in the transport factory

Mike Fang - (fangmik@amazon.com)
2023-11-02 13:05:05
*Thread Reply:* I am thinking of just modifying the HTTP transport and using requests.auth.AuthBase to create different auth methods instead of a TokenProvider class


Classes which subclass requests.auth.AuthBase can also just directly be given to the requests call in the auth parameter
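A rough sketch of that approach, assuming the third-party requests-aws4auth package is acceptable; the endpoint URL and credentials are placeholders, and this is not the OpenLineage client's built-in API:

```python
import requests
from requests_aws4auth import AWS4Auth

# AWS4Auth is a requests.auth.AuthBase subclass that SigV4-signs each request
auth = AWS4Auth("ACCESS_KEY", "SECRET_KEY", "us-east-1", "execute-api")

requests.post(
    "https://example.execute-api.us-east-1.amazonaws.com/prod/api/v1/lineage",
    json={"eventType": "START"},  # an OpenLineage run event would go here
    auth=auth,  # any AuthBase subclass plugs in via the auth parameter
    timeout=5,
)
```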

👍 Jakub Dardziński

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 14:40:24
*Thread Reply:* would you like to contribute? 🙂

Mike Fang - (fangmik@amazon.com)
2023-11-02 14:43:05
*Thread Reply:* I was about to contribute, but I actually just realized that there is an existing way to provide a custom transport that would solve for my use case. My only question is how do I register this custom transport in my MWAA environment? Can I provide the custom transport as an Airflow plugin and then specify the class in the openlineage.yml config? Will it automatically pick it up?

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 15:45:56
*Thread Reply:* although I did not test this in MWAA but locally only: I've created an Airflow plugin that in __init__.py has defined (or imported) the following code:
```python
from openlineage.client.transport import register_transport, Transport, Config


@register_transport
class FakeTransport(Transport):
    kind = "fake"
    config = Config

    def __init__(self, config: Config) -> None:
        print(config)

    def emit(self, event) -> None:
        print(event)
```
setting AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "fake"}' does take effect and I can see output in Airflow logs

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-02 15:47:45
*Thread Reply:* in setup.py it's:
```python
...,
    entry_points={
        'airflow.plugins': [
            'custom_transport = custom_transport:CustomTransportPlugin',
        ],
    },
    install_requires=["openlineage-python"]
)
```

Mike Fang - (fangmik@amazon.com)
2023-11-03 12:52:55
*Thread Reply:* ok great thanks for following up on this, super helpful

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-02 12:00:00
@channel
We released OpenLineage 1.5.0, including:
• support for Cassandra Connectors lineage in the Flink integration by @Peter Huang
• support for Databricks Runtime 13.3 in the Spark integration by @Paweł Leszczyński
• support for rdd and toDF operations from the Spark Scala API in Spark by @Paweł Leszczyński
• lowered requirements for attrs and requests packages in the Airflow integration by @Jakub Dardziński
• lazy rendering of yaml configs in the dbt integration by @Jakub Dardziński
• bug fixes, tests, infra fixes, doc changes, and more.
Thanks to all the contributors, including new contributor @Sophie LY!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.5.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.4.1...1.5.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/

👍 Jason Yip, Sophie LY, Tristan GUEZENNEC -CROIX-, Mars Lan, Sangeeta Mishra
🚀 tati

Jason Yip - (jasonyip@gmail.com)
2023-11-02 14:49:18
@Paweł Leszczyński I tested 1.5.0; it works great now, but the environment facet is gone in START... which I very much want. Any thoughts?

Jason Yip - (jasonyip@gmail.com)
2023-11-03 04:18:11
actually, it shows up in one of the RUNNING now... behavior is consistent between 11.3 and 13.3, thanks for fixing this issue

👍 Paweł Leszczyński

Jason Yip - (jasonyip@gmail.com)
2023-11-04 15:44:22
*Thread Reply:* @Paweł Leszczyński looks like I need to bring bad news.. 13.3 is fixed for specific scenarios, but 11.3 is still reading output as dbfs.. there are scenarios where it's not producing input and output, like:

create table table using delta as
location 'abfss://....'
select * from parquet.`abfss://....`

Jason Yip - (jasonyip@gmail.com)
2023-11-04 15:44:31
*Thread Reply:* Will test more and open issues

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-06 05:34:33
*Thread Reply:* @Jason Yip how did you manage to get the environment attribute? It's not showing up for me at all. I've tried Databricks but also tried a local instance of Spark.

Jason Yip - (jasonyip@gmail.com)
2023-11-07 18:32:02
*Thread Reply:* @Rodrigo Maia it's showing up in one of the RUNNING events, not in the START event anymore

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-08 03:04:32
*Thread Reply:* I never had a running event 🫠 Am I filtering something?

Jason Yip - (jasonyip@gmail.com)
2023-11-08 13:03:26
*Thread Reply:* Umm.. ok show me your code, will try on my end

Jason Yip - (jasonyip@gmail.com)
2023-11-08 14:26:06
*Thread Reply:* @Paweł Leszczyński @Rodrigo Maia actually, if you are using a UC-enabled cluster, you won't get any RUNNING events

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-03 12:00:07
@channel
This month's TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:
• announcements
• recent releases
• recent additions to the Flink integration by @Peter Huang
• recent additions to the Spark integration by @Paweł Leszczyński
• updates on proposals by @Julien Le Dem
• discussion topics
• open discussion
More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you'd like to demo? DM me to be added to the agenda.

👍 harsh loomba

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:08:10
Hi Team, we are trying to customize the events by writing a custom lineage listener extending OpenLineageSparkListener, but we need some direction on how to capture the events

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-04 07:11:46
*Thread Reply:* https://openlineage.slack.com/archives/C01CK9T7HKR/p1698315220142929
Do you need some more guidance than that?

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:13:47
*Thread Reply:* yes

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-04 07:15:21
*Thread Reply:* It seems pretty extensively described, what kind of help do you need?

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:16:13
*Thread Reply:* io.openlineage.spark.api.OpenLineageEventHandlerFactory - if I use this, how will I pass the custom listener to my spark-submit?

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:17:25
*Thread Reply:* I would like to know how I can customize my events using this. For example: in the "input" facet I want only the symlinks name; I am not interested in anything else

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:17:32
*Thread Reply:* can you please provide some guidance

priya narayana - (n.priya88@gmail.com)
2023-11-04 07:18:36
*Thread Reply:* @Jakub Dardziński this is the doubt I have

priya narayana - (n.priya88@gmail.com)
2023-11-04 08:17:25
*Thread Reply:* Someone who has done the Spark integration, please throw some light

Jakub Dardziński - (jakub.dardzinski@getindata.com)
2023-11-04 08:21:22
*Thread Reply:* it's the weekend for most of us, so you probably need to wait until Monday for precise answers

David Goss - (david.goss@matillion.com)
2023-11-06 04:03:42
👋 I raised a PR https://github.com/OpenLineage/OpenLineage/pull/2223 off the back of some Marquez conversations a while back to try and clarify how names of Snowflake objects should be expressed in OL events. I used Snowflake’s OL view as a guide, but also I appreciate there are other OL producers that involve Snowflake too (Airflow? dbt?). Any feedback on this would be appreciated!

David Goss - (david.goss@matillion.com)
2023-11-08 10:42:35
*Thread Reply:* Thanks for merging this @Maciej Obuchowski!

👍 Maciej Obuchowski

Athitya Kumar - (athityakumar@gmail.com)
2023-11-06 05:22:03
Hey team! 👋

We're trying to use openlineage-flink, and would like to provide the openlineage.transport.type=http and configure other transport configs, but we're not able to find sufficient docs (tried this doc) on where/how these configs can be provided.

For example, in spark, the changes mostly were delegated to the spark-submit command like:
```
spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
  --packages "io.openlineage:openlineage_spark:<spark-openlineage-version>" \
  --conf "spark.openlineage.transport.url=http://{openlineage.client.host}/api/v1/namespaces/spark_integration/" \
  --class com.mycompany.MySparkApp my_application.jar
```
And the OpenLineageSparkListener has a method to retrieve the provided spark confs as an object in the ArgumentParser. Similarly, looking for some pointers on how the openlineage.transport configs can be provided to OpenLineageFlinkJobListener & how the flink listener parses/uses these configs


TIA! 😄

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-07 05:56:09
*Thread Reply:* similarly to spark config, you can use flink config
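A sketch of what that could look like, assuming the transport keys from the question above go into flink-conf.yaml; the exact key names and URL are assumptions and may differ between openlineage-flink versions:

```
# flink-conf.yaml (illustrative)
openlineage.transport.type: http
openlineage.transport.url: http://localhost:5000/api/v1/lineage
```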

Athitya Kumar - (athityakumar@gmail.com)
2023-11-07 22:36:53
*Thread Reply:* @Maciej Obuchowski - Got it. Our use-case is that we're trying to build a wrapper on top of openlineage-flink for productionising for our flink jobs.


We're trying to have a wrapper class that extends OpenLineageFlinkJobListener class, and overwrites the HTTP transport endpoint/url to a constant value (say, example.com and /api/v1/flink). But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private. If it was just a default scope, can we contribute a PR to make it public, to make it friendly for teams trying to adopt & extend openlineage?


And also, we wanted to understand better on where we're reading the HTTP transport endpoint/url configs in OpenLineageFlinkJobListener and what'd be the best place to override it to the constant endpoint/url for our use-case

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-08 05:55:43
*Thread Reply:* We parse flink conf to get that information: https://github.com/OpenLineage/OpenLineage/blob/26494b596e9669d2ada164066a73c44e04[…]ink/src/main/java/io/openlineage/flink/client/EventEmitter.java

> But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private.
The way to construct it is a public builder in the same class

I think an easier way than a wrapper class would be to use the existing flink configuration, or to set up the OPENLINEAGE_URL env variable, or have an openlineage.yml config file - not sure why this is the way you've chosen?

Athitya Kumar - (athityakumar@gmail.com)
2023-11-09 12:41:02
*Thread Reply:* > I think easier way than wrapper class would be use existing flink configuration, or to set up OPENLINEAGE_URL env variable, or have openlineage.yml config file - not sure why this is the way you've chosen?
@Maciej Obuchowski - The reasoning behind going with a wrapper class is that we can abstract out the nitty-gritty like how/where we're publishing openlineage events etc - especially for companies that have a lot of teams that may be adopting openlineage.


For example, if we wanna move away from http transport to kafka transport - we'd be changing only this wrapper class and ask folks to update their wrapper class dependency version. If we went without the wrapper class, then the exact config changes would need to be synced and done by many different teams, who may not have enough context.


Similarly, if we wanna enable some other default best-practise configs, or inject any company-specific configs etc, the wrapper would be useful in abstracting out the details and be the 1 place that handles all openlineage related integrations for any future changes.


That's why we wanna extend openlineage's listener class & leverage most of the OSS code as-is; and at the same time, have the ability to extend & inject customisations. I think that's where some things like having getters for the class object attributes, or having public constructors would be really helpful 😄

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-09 13:03:56
*Thread Reply:* @Athitya Kumar that makes sense. Feel free to provide PR adding getters and stuff.

🎉 Athitya Kumar

Yannick Libert - (yannick.libert.partner@decathlon.com)
2023-11-07 06:03:49
Hi all, we (I work with @Sophie LY and @Abdallah) have a quick question regarding the spark integration:
if a spark app contains several jobs, they will be named "my_spark_app_name.job1" and "my_spark_app_name.job2"
eg:
spark_job.collect_limit
spark_job.map_partitions_parallel_collection


If I understood correctly, the spark integration maps one Spark job to a single OpenLineage Job, and the application itself should be assigned a Run id at startup and each job that executes will report the application's Run id as its parent job run (taken from: https://openlineage.io/docs/integrations/spark/).

In our case, the app Run Id is never created, and the job runs don't contain any parent facets. We tested it with a recent integration version, 1.4.1, and also an older one (0.26.0).
Did we miss something in the OL spark integration config?

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-07 06:07:51
*Thread Reply:* hey, the name of the output dataset should be put at the end of the job name. This was introduced to help with jobs that call multiple spark actions

Yannick Libert - (yannick.libert.partner@decathlon.com)
2023-11-07 07:05:52
*Thread Reply:* Hi Paweł,
Thanks for your answer. Yes, indeed, with the newer version of OL we automatically have the name of the output dataset at the end of the job name, but no app run id, nor any parent run facet.

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-07 08:16:44
*Thread Reply:* yes, you're right. I mean you can set spark.openlineage.parentJobName in the config, which will be shared through the whole app run, but this needs to be set manually
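For example, a minimal sketch extending the spark-submit pattern shown earlier in this channel (the job name value itself is illustrative):

```
spark-submit \
  --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
  --conf "spark.openlineage.parentJobName=my_spark_app" \
  --class com.mycompany.MySparkApp my_application.jar
```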

Yannick Libert - (yannick.libert.partner@decathlon.com)
2023-11-07 08:36:58
*Thread Reply:* I see, thanks a lot for your reply we'll try that

ldacey - (lance.dacey2@sutherlandglobal.com)
2023-11-07 10:49:25
if I have a dataset on adls gen2 which synapse connects to as an external delta table, is that the use case of a symlink dataset? the delta table is connected to by PBI and by Synapse, but the underlying data is exactly the same

Maciej Obuchowski - (maciej.obuchowski@getindata.com)
2023-11-08 10:49:04
*Thread Reply:* Sounds like it, yes - if the logical dataset names are different but the physical one is the same

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-08 12:38:52
Has anyone here tried OpenLineage with Spark on Amazon EMR?

Jason Yip - (jasonyip@gmail.com)
2023-11-08 13:01:16
*Thread Reply:* No, but it should work the same; I tried on AWS, Google Colab, and Azure

👍 Jakub Dardziński

Tristan GUEZENNEC -CROIX- - (tristan.guezennec@decathlon.com)
2023-11-09 03:10:54
*Thread Reply:* Yes. @Abdallah could provide some details if needed.

👍 Abdallah
🔥 Maciej Obuchowski

Rodrigo Maia - (rodrigo.maia@manta.io)
2023-11-20 11:29:26
*Thread Reply:* Thanks @Tristan GUEZENNEC -CROIX-
Hi @Abdallah, I was able to set up a Spark cluster on AWS EMR but I'm struggling to configure the OL listener. I've tried with steps and bootstrap actions for the jar and it didn't work out. How did you manage to include the jar? Besides, what about the Spark configuration? Could you send me a sample of these configs?

Michael Robinson - (michael.robinson@astronomer.io)
2023-11-08 12:44:54
@channel
Friendly reminder: this month's TSC meeting, open to all, is tomorrow at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1699027207361229

👍 Jakub Dardziński

Jason Yip - (jasonyip@gmail.com)
2023-11-10 15:25:45
@Paweł Leszczyński regarding https://github.com/OpenLineage/OpenLineage/issues/2124: OL is parsing out the table location in the Hive metastore; it is the location of the table in the catalog and not the physical location of the data. It is both right and wrong, because it is a table, it's just an external table.


https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html

Jason Yip - (jasonyip@gmail.com)
2023-11-10 15:32:28
*Thread Reply:* Here's more for reference: https://dilorom.medium.com/finding-the-path-to-a-table-in-databricks-2c74c6009dbb

Jason Yip - (jasonyip@gmail.com)
2023-11-11 03:29:33
@Paweł Leszczyński this is why, if you create a table with an ADLS location, it won't show input and output:


https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/spark35/src[…]k35/agent/lifecycle/plan/CreateReplaceOutputDatasetBuilder.java


Because the catalog object is not there.

Jason Yip - (jasonyip@gmail.com)
2023-11-11 03:30:44
The Databricks integration needs to be re-written in a way that actually supports Databricks, it seems like

Jason Yip - (jasonyip@gmail.com)
2023-11-13 03:00:42
@Paweł Leszczyński I went back to 1.4.1; the output does show the ADLS location. But the environment facet is gone in 1.4.1. It shows up in 1.5.0, but the namespace is back to dbfs....

Jason Yip - (jasonyip@gmail.com)
2023-11-13 03:18:37
@Paweł Leszczyński I diffed CreateReplaceDatasetBuilder.java and CreateReplaceOutputDatasetBuilder.java and they are the same except for the class name, so I am not sure what is causing the change. I also realized you don't have a test case for ADLS

Paweł Leszczyński - (pawel.leszczynski@getindata.com)
2023-11-13 04:52:07
*Thread Reply:* Thanks @Jason Yip for your engagement in finding the cause and solution to this issue.


Among the technical problems, another problem here is that our databricks integration tests are run on AWS and the issue you describe occurs in Azure. I would consider this a primary issue as it is difficult for me to verify the behaviour you describe and fix it with a failing integration test at the start.

Are you able to reproduce the issue on an AWS Databricks environment, so that we could include it in our integration tests and make sure the behaviour will not change later on in the future?

Jason Yip - (jasonyip@gmail.com)
2023-11-13 18:06:44
*Thread Reply:* I didn't know Azure and AWS Databricks are different. Let me try it on AWS as well

Naresh reddy - (naresh.naresh36@gmail.com)
2023-11-15 07:17:24
Hi
Can anyone point me to the deck on how Airflow can be integrated using OpenLineage?

Naresh reddy (naresh.naresh36@gmail.com)
2023-11-15 07:27:55
*Thread Reply:* thank you @Maciej Obuchowski
Naresh reddy (naresh.naresh36@gmail.com)
2023-11-15 11:09:24
Can anyone tell me why OL is better than its competitors? If you can provide an analysis, that would be great.
Harel Shein (harel.shein@gmail.com)
2023-11-16 11:46:16
*Thread Reply:* Hey @Naresh reddy, can you help me understand what you mean by competitors?
OL is a specification that can be used to solve various problems, so if you have a clear problem statement, maybe I can help with pros/cons for that problem.
Naresh reddy (naresh.naresh36@gmail.com)
2023-11-15 11:10:58
What are the pros and cons of OL? We often talk about the positives to market it, but what are the pain points of using OL, and how is it addressing user issues?
Michael Robinson (michael.robinson@astronomer.io)
2023-11-16 13:38:42
*Thread Reply:* Hi @Naresh reddy, thanks for your question. We’ve heard that OpenLineage is attractive because of its desirable integrations, including a best-in-class Spark integration, its extensibility, the fact that it’s not destructive, and the fact that it’s open source. I’m not aware of pain points per se, but there are certainly features and integrations that we wish we could focus on but can’t at the moment, like the Dagster integration, which needs a new maintainer. OpenLineage is like any other open standard in that ecosystem coverage is a constant process rather than a journey, and it requires contributions in order to get close to 100%. Thankfully, we are gaining users and contributors all the time, and integrations are being added or improved upon daily. See the Ecosystem page on the website for a list of consumers and producers and links to more resources, and check out the GitHub repo for the codebase, commit history, contributors, governance procedures, and more. We’re quick to respond to messages here and issues on GitHub, usually within one day.
GitHub: OpenLineage/OpenLineage - An Open Standard for lineage metadata collection (Website: http://openlineage.io; Stars: 1449)
karthik nandagiri (karthik.nandagiri@gmail.com)
2023-11-19 23:57:38
Hi, so we can use OpenLineage to identify column-level lineage with Airflow and Spark? Will it also allow us to connect to Power BI and derive downstream column lineage?
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-11-20 06:07:36
*Thread Reply:* Yes, it works with Airflow and Spark - with the caveat that the number of Airflow operators that support it is fairly small, and generally limited to the most popular SQL operators.
> will it also allow to connect to Power BI and derive the downstream column lineage ?
No, there is no such feature yet 🙂
However, there's nothing preventing this - if you wish to work on such an implementation, we'd be happy to help.
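To make the operator caveat concrete, here is a minimal Airflow DAG sketch, assuming the apache-airflow-providers-openlineage package is installed and configured. A popular SQL operator such as SQLExecuteQueryOperator emits lineage automatically; the connection id and query below are hypothetical.
```
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

# Minimal sketch: with the OpenLineage provider installed, this SQL operator
# emits lineage events without any task-level changes. conn_id and the
# table names are hypothetical.
with DAG(dag_id="lineage_example", start_date=datetime(2023, 11, 1), schedule=None) as dag:
    load_orders = SQLExecuteQueryOperator(
        task_id="load_orders",
        conn_id="postgres_default",
        sql="INSERT INTO reporting.orders SELECT * FROM staging.orders",
    )
```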

karthik nandagiri (karthik.nandagiri@gmail.com)
2023-11-21 00:20:11
*Thread Reply:* Thank you, Maciej Obuchowski, for the update. Currently we are looking for a tool that can connect to Power BI and pull column-level lineage information for reports and dashboards. How can this be achieved with OL? Can you give some idea?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-11-21 07:59:10
*Thread Reply:* I don't think I can help you with that now, unless you want to work on your own integration with Power BI 🙁

Rafał Wójcik (rwojcik@griddynamics.com)
2023-11-21 07:02:08
Hi everyone, first of all - a big shout-out to all contributors - you do an amazing job here.
I want to use OpenLineage in our project. To do so, I want to set up a POC and experiment with the possibilities the library provides. I started with the sample from the conference talk, https://github.com/getindata/openlineage-bbuzz2023-column-lineage, but when I get to the Spark transformation, after starting the context with OpenLineage, I run into issues with SessionHiveMetaStoreClient in section 3. Does anyone have another plain sample to play with, so as not to set everything up from scratch?

GitHub: getindata/openlineage-bbuzz2023-column-lineage (Language: Jupyter Notebook; Last updated: 5 months ago)
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-11-21 07:37:00
*Thread Reply:* Can you provide details about those issues? Exceptions, logs, details of the jobs, and how you run them?

Rafał Wójcik (rwojcik@griddynamics.com)
2023-11-21 07:45:37
*Thread Reply:* Hi @Maciej Obuchowski - I reran the docker container after deleting the metadata_db folder, possibly created by another local test, and fixed that one, but got a problem with the OpenLineageListener during Spark initialization.
While I execute:
```
spark = (SparkSession.builder.master('local')
    .appName('Food Delivery')
    .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
    .config('spark.jars', '<local-path>/openlineage-spark-0.27.2.jar,<local-path>/postgresql-42.6.0.jar')
    .config('spark.openlineage.transport.type', 'http')
    .config('spark.openlineage.transport.url', 'http://api:5000')
    .config('spark.openlineage.facets.disabled', '[spark_unknown;spark.logicalPlan]')
    .config('spark.openlineage.namespace', 'food-delivery')
    .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')
    .config("spark.sql.repl.eagerEval.enabled", True)
    .enableHiveSupport()
    .getOrCreate())
```
I got:
```
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2563)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:643)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:467)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:218)
	at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2921)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2919)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1(SparkContext.scala:2552)
	at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1$adapted(SparkContext.scala:2551)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2551)
	... 15 more
```
It looks like for some reason the jars are not loaded - need to look into it.
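A ClassNotFoundException for the listener at session startup usually means the jar path handed to spark.jars does not resolve inside the container; a plain-Python sanity check, independent of any OpenLineage API, is sketched below using the paths from this thread.
```
import os

# Sanity check before building the session: confirm the jar files actually
# exist at the paths passed to spark.jars inside the container.
jars = [
    "/home/jovyan/jars/openlineage-spark-0.27.2.jar",
    "/home/jovyan/jars/postgresql-42.6.0.jar",
]
for jar in jars:
    print(jar, "exists" if os.path.exists(jar) else "MISSING")
```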

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-11-21 07:58:09
*Thread Reply:* 🤔 Jars are added during image building: https://github.com/getindata/openlineage-bbuzz2023-column-lineage/blob/main/Dockerfile#L12C1-L12C29

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-11-21 07:58:28
*Thread Reply:* are you sure `<local-path>` is right?

Rafał Wójcik (rwojcik@griddynamics.com)
2023-11-21 08:00:49
*Thread Reply:* yes, it's the same as in the sample - wondering why it's not getting added:
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
    .appName('Food Delivery')
    .config('spark.jars', '/home/jovyan/jars/openlineage-spark-0.27.2.jar,/home/jovyan/jars/postgresql-42.6.0.jar')
    .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')
    .config("spark.sql.repl.eagerEval.enabled", True)
    .enableHiveSupport()
    .getOrCreate())

print(spark.sparkContext._jsc.sc().listJars())
```
which prints:
```
Vector()
```

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-11-21 08:04:31
*Thread Reply:* can you make sure the jars are in this directory? Just by `docker run --entrypoint /usr/local/bin/bash IMAGE_NAME "ls /home/jovyan/jars"`

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2023-11-21 08:06:27
*Thread Reply:* another option to try is to replace `spark.jars` with `spark.jars.packages` `io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0`
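Put together, the suggestion amounts to letting Spark resolve the artifacts from Maven Central instead of pointing at local files; a sketch of the rewritten builder follows, reusing the configuration from earlier in this thread (internet access is required at session start, which is why the demo bundled the jars instead).
```
from pyspark.sql import SparkSession

# Same session as in the thread, but with Maven coordinates instead of
# local jar paths, so Spark downloads the artifacts itself.
spark = (SparkSession.builder.master("local")
    .appName("Food Delivery")
    .config("spark.jars.packages",
            "io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://api:5000")
    .config("spark.openlineage.namespace", "food-delivery")
    .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse/")
    .enableHiveSupport()
    .getOrCreate())
```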

Paweł Leszczyński (pawel.leszczynski@getindata.com)
2023-11-21 08:16:54
*Thread Reply:* I think this was done for the purposes of the presentation, to make sure the demo would work without internet access. That can be the reason to add the jar manually to the docker image. `openlineage-spark` can be added to Spark via `spark.jars.packages`, as we do here: https://openlineage.io/docs/integrations/spark/quickstart_local

openlineage.io: Quickstart with Jupyter - Trying out the Spark integration is super easy if you already have Docker Desktop and git installed.
Rafał Wójcik (rwojcik@griddynamics.com)
2023-11-21 09:21:59
*Thread Reply:* got it guys - thanks a lot for the help - it turns out that the Spark contexts from notebooks 2 and 3 had some kind of metadata conflict - when I combined those two and recreated the image to clean up the old metadata, it worked.
One more note: sometimes the kernels return weird results, but that may be caused by some local nuances - anyway, thanks!

- - - - - - - - - - - \ No newline at end of file diff --git a/html_output/static/viewer.css b/html_output/static/viewer.css deleted file mode 100644 index 96be753..0000000 --- a/html_output/static/viewer.css +++ /dev/null @@ -1,241 +0,0 @@ -@import url('https://fonts.googleapis.com/css?family=Lato:400,900'); - -html { - font-family: 'Lato', sans-serif; -} - -body { - padding: 0; - margin: 0; -} - -#slack-archive-viewer { - padding: 0; - margin: 0; - height: 100vh; - overflow: hidden; -} - -#sidebar { - display: inline-block; - width: 280px; - color: white; - text-align: left; - background-color: #4D394B; - z-index: 10; - overflow-y: scroll; - overflow-x: auto; - height: 100vh; - user-select: none; -} - -#sidebar a { - color: white; - font-size: 14px; -} - -#sidebar h3 { - margin: 20px 20px; - color: white; - font-weight: 900; -} - -#sidebar h3:hover { - cursor: pointer; -} - -#sidebar h3::after { - content: '❯ '; - display: inline-block; - -webkit-transform: rotate(90deg); - transform: rotate(90deg); - margin-left: 15px; -} - -#sidebar h3.arrow::after { - margin-left: 10px; - -webkit-transform: none; - transform: none; -} - -.messages { - width: calc(100vw - 325px); - height: 100vh; - text-align: left; - display: inline-block; - padding-left: 20px; - padding-right: 20px; - overflow-y: scroll; -} - -.message-container { - clear: left; - min-height: 56px; -} - -.message-container:first-child { - margin-top: 20px; -} - -.message-container:last-child { - margin-bottom: 20px; -} - -.message-container .user_icon { - background-color: rgb(248, 244, 240); - width: 36px; - height: 36px; - border-radius: 0.2em; - display: inline-block; - vertical-align: top; - margin-right: 0.65em; - float: left; -} - -.message-container .user_icon_reply { - background-color: rgb(248, 244, 240); - width: 36px; - height: 36px; - border-radius: 0.2em; - display: inline-block; - vertical-align: top; - margin-right: 0.65em; - margin-left: 40px; - float: left; -} - -.message-container .time { - display: inline-block; - color: rgb(200, 200, 200); - margin-left: 0.5em; -} - -.message-container .username { - display: inline-block; - font-weight: 600; - line-height: 1; -} - -.message-container .user-email { - font-weight: normal; - font-style: italic; -} - -.message-container .message { - display: inline-block; - vertical-align: top; - line-height: 1; - width: calc(100% - 3em); -} - -.message-container .reply { - vertical-align: top; - line-height: 1; - width: calc(100% - 3em); - margin-left: 80px; -} - -.message-container .msg p { - white-space: pre-wrap; -} - -.message-container .msg pre { - background-color: #E6E5DF; - white-space: pre-wrap; -} - -.message-container .message .msg { - line-height: 1.5; -} - -.message-container .reply .msg { - line-height: 1.5; -} - -.message-container .message .msg a { - overflow-wrap: anywhere; -} - -.message-container .reply .msg a { - overflow-wrap: anywhere; -} - -.message-container .message-attachment { - padding-left: 5px; - border-left: 2px gray solid; - overflow-wrap: anywhere; -} - -.message-container .message-attachment .service-name { - color: #999999; -} - -.message-container .icon { - max-width: 10px; -} - -.channel_join .msg, .channel_topic .msg, -.bot_add .msg, .app_conversation_join .msg { - font-style: italic; -} - -.attachment-footer { - font-size: small; -} - -.list { - margin: 0; - padding: 0; - list-style-type: none; -} - -.list li { - padding: 4px 20px; -} - -.list li a { - width: 100%; - padding: 10px 20px; -} - -.list li.active { - background-color: #4C9689; -} - 
-.list li.active:hover { - background-color: #4C9689; -} - -.list li:hover { - text-decoration: none; - background: #3E313C; -} - -.list li a:hover { - text-decoration: none; -} - -a:link, -a:visited, -a:active { - color: #2a80b9; - text-decoration: none; -} - -a:hover { - color: #439fe0; - text-decoration: underline; -} - -.close { - display: none; -} - -@media screen { - .print-only { display: none } -} - -img.preview { - max-width: 100%; - height: auto; -} \ No newline at end of file diff --git a/slack-archive/.last-successful-run b/slack-archive/.last-successful-run deleted file mode 100644 index b4ae3a1..0000000 --- a/slack-archive/.last-successful-run +++ /dev/null @@ -1 +0,0 @@ -2023-11-21T14:26:35.411Z \ No newline at end of file diff --git a/slack-archive/data/C01CK9T7HKR.json b/slack-archive/data/C01CK9T7HKR.json deleted file mode 100644 index 1b23810..0000000 --- a/slack-archive/data/C01CK9T7HKR.json +++ /dev/null @@ -1,32963 +0,0 @@ -[ - { - "client_msg_id": "c8703ac5-44a3-41fb-9fe7-b2ada2736ef9", - "type": "message", - "text": "Hi Everyone, first of all - big shout to all contributors - You do amazing job here.\nI want to use OpenLineage in our project - to do so I want to setup some POC and experiment with possibilities library provides - I start working on sample from the conference talk: but when I go into spark transformation after staring context with openlineage I have issues with _SessionHiveMetaStoreClient on section 3_- does anyone has other plain sample to play with, to not setup everything from scratch?", - "user": "U066S97A90C", - "ts": "1700568128.192669", - "blocks": [ - { - "type": "rich_text", - "block_id": "LZMJn", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi Everyone, first of all - big shout to all contributors - You do amazing job here.\nI want to use OpenLineage in our project - to do so I want to setup some POC and experiment with possibilities library provides - I start working on sample from the conference talk: " - }, - { - "type": "link", - "url": "https://github.com/getindata/openlineage-bbuzz2023-column-lineage" - }, - { - "type": "text", - "text": " but when I go into spark transformation after staring context with openlineage I have issues with " - }, - { - "type": "text", - "text": "SessionHiveMetaStoreClient on section 3", - "style": { - "italic": true - } - }, - { - "type": "text", - "text": "- does anyone has other plain sample to play with, to not setup everything from scratch?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/getindata/openlineage-bbuzz2023-column-lineage", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "getindata/openlineage-bbuzz2023-column-lineage", - "title": "getindata/openlineage-bbuzz2023-column-lineage", - "fields": [ - { - "value": "Jupyter Notebook", - "title": "Language", - "short": true - }, - { - "value": "5 months ago", - "title": "Last updated", - "short": true - } - ] - } - ], - "thread_ts": "1700568128.192669", - "reply_count": 9, - "reply_users_count": 3, - "latest_reply": "1700576519.927839", - "reply_users": [ - "U01RA9B5GG2", - "U066S97A90C", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "78925e07-2703-4759-90dd-9202405c52cf", - "type": "message", - "text": "Can you provide details about those issues? 
Like exceptions, logs, details of the jobs and how do you run them?", - "user": "U01RA9B5GG2", - "ts": "1700570220.696829", - "blocks": [ - { - "type": "rich_text", - "block_id": "Cvx2D", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can you provide details about those issues? Like exceptions, logs, details of the jobs and how do you run them?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "59b35b8a-9b98-4517-aa83-de41a9855e29", - "type": "message", - "text": "Hi <@U01RA9B5GG2> - I rerun docker container after deleting metadata_db folder possibly created by other local test, and fix this one but got problem with OpenLineageListener - during initialization of spark:\nwhile I execute:\n```spark = (SparkSession.builder.master('local')\n .appName('Food Delivery')\n .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')\n .config('spark.jars', '<local-path>/openlineage-spark-0.27.2.jar,<local-path>/postgresql-42.6.0.jar')\n .config('spark.openlineage.transport.type', 'http')\n .config('spark.openlineage.transport.url', '')\n .config('spark.openlineage.facets.disabled', '[spark_unknown;spark.logicalPlan]')\n .config('spark.openlineage.namespace', 'food-delivery')\n .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')\n .config(\"spark.sql.repl.eagerEval.enabled\", True)\n .enableHiveSupport()\n .getOrCreate())```\nI got\n```Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.\n: org.apache.spark.SparkException: Exception when registering SparkListener\n\tat org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2563)\n\tat org.apache.spark.SparkContext.<init>(SparkContext.scala:643)\n\tat org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)\n\tat java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)\n\tat java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)\n\tat java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:238)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\nCaused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener\n\tat java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)\n\tat java.base/java.lang.Class.forName0(Native Method)\n\tat java.base/java.lang.Class.forName(Class.java:467)\n\tat org.apache.spark.util.Utils$.classForName(Utils.scala:218)\n\tat 
org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2921)\n\tat scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)\n\tat scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n\tat scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)\n\tat scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)\n\tat scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)\n\tat scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)\n\tat scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)\n\tat org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2919)\n\tat org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1(SparkContext.scala:2552)\n\tat org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1$adapted(SparkContext.scala:2551)\n\tat scala.Option.foreach(Option.scala:407)\n\tat org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2551)\n\t... 15 more```\nlooks like by some reasons jars are not loaded - need to look into it", - "user": "U066S97A90C", - "ts": "1700570737.279069", - "blocks": [ - { - "type": "rich_text", - "block_id": "modz/", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " - I rerun docker container after deleting metadata_db folder possibly created by other local test, and fix this one but got problem with OpenLineageListener - during initialization of spark:\nwhile I execute:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "spark = (SparkSession.builder.master('local')\n .appName('Food Delivery')\n .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')\n .config('spark.jars', '/openlineage-spark-0.27.2.jar,/postgresql-42.6.0.jar')\n .config('spark.openlineage.transport.type', 'http')\n .config('spark.openlineage.transport.url', '" - }, - { - "type": "link", - "url": "http://api:5000" - }, - { - "type": "text", - "text": "')\n .config('spark.openlineage.facets.disabled', '[spark_unknown;spark.logicalPlan]')\n .config('spark.openlineage.namespace', 'food-delivery')\n .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')\n .config(\"spark.sql.repl.eagerEval.enabled\", True)\n .enableHiveSupport()\n .getOrCreate())" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I got\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.\n: org.apache.spark.SparkException: Exception when registering SparkListener\n\tat org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2563)\n\tat org.apache.spark.SparkContext.(SparkContext.scala:643)\n\tat org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)\n\tat java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)\n\tat java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n\tat java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)\n\tat 
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:238)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\nCaused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener\n\tat java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)\n\tat java.base/java.lang.Class.forName0(Native Method)\n\tat java.base/java.lang.Class.forName(Class.java:467)\n\tat org.apache.spark.util.Utils$.classForName(Utils.scala:218)\n\tat org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2921)\n\tat scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)\n\tat scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n\tat scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)\n\tat scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)\n\tat scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)\n\tat scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)\n\tat scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)\n\tat org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2919)\n\tat org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1(SparkContext.scala:2552)\n\tat org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1$adapted(SparkContext.scala:2551)\n\tat scala.Option.foreach(Option.scala:407)\n\tat org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2551)\n\t... 
15 more" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "looks like by some reasons jars are not loaded - need to look into it" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "c75639c8-300b-41e0-8548-bc6a0bf8fcd5", - "type": "message", - "text": ":thinking_face: Jars are added during image building: ", - "user": "U01RA9B5GG2", - "ts": "1700571489.504839", - "blocks": [ - { - "type": "rich_text", - "block_id": "ek23V", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "thinking_face", - "unicode": "1f914" - }, - { - "type": "text", - "text": " Jars are added during image building: " - }, - { - "type": "link", - "url": "https://github.com/getindata/openlineage-bbuzz2023-column-lineage/blob/main/Dockerfile#L12C1-L12C29" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "eb7d3d59-bb15-4c16-a7fc-872af3828c7e", - "type": "message", - "text": "are you sure `<local-path>` is right?", - "user": "U01RA9B5GG2", - "ts": "1700571508.332809", - "blocks": [ - { - "type": "rich_text", - "block_id": "6yFIl", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "are you sure " - }, - { - "type": "text", - "text": "", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " is right?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "afa11cb3-a6d6-4766-aaf1-37d25390cab2", - "type": "message", - "text": "yes, it's same as in sample - wondering why it's not get added:\n```from pyspark.sql import SparkSession\n\nspark = (SparkSession.builder.master('local')\n .appName('Food Delivery')\n .config('spark.jars', '/home/jovyan/jars/openlineage-spark-0.27.2.jar,/home/jovyan/jars/postgresql-42.6.0.jar')\n .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')\n .config(\"spark.sql.repl.eagerEval.enabled\", True)\n .enableHiveSupport()\n .getOrCreate())\n\nprint(().listJars())\n\nVector()```", - "user": "U066S97A90C", - "ts": "1700571649.351379", - "blocks": [ - { - "type": "rich_text", - "block_id": "Pjc3w", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, it's same as in sample - wondering why it's not get added:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "from pyspark.sql import SparkSession\n\nspark = (SparkSession.builder.master('local')\n .appName('Food Delivery')\n .config('spark.jars', '/home/jovyan/jars/openlineage-spark-0.27.2.jar,/home/jovyan/jars/postgresql-42.6.0.jar')\n .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')\n .config(\"spark.sql.repl.eagerEval.enabled\", True)\n .enableHiveSupport()\n .getOrCreate())\n\nprint(spark.sparkContext._jsc.sc().listJars())\n\nVector()" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "5a0d71ab-9451-4418-bfd1-ee136218e71d", - "type": "message", - "text": "can you make sure jars are in this directory? 
just by `docker run --entrypoint /usr/local/bin/bash IMAGE_NAME \"ls /home/jovyan/jars\"`", - "user": "U01RA9B5GG2", - "ts": "1700571871.112339", - "blocks": [ - { - "type": "rich_text", - "block_id": "RLxB1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "can you make sure jars are in this directory? just by " - }, - { - "type": "text", - "text": "docker run --entrypoint /usr/local/bin/bash IMAGE_NAME \"ls /home/jovyan/jars\"", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "913d9441-d9ac-4616-8eda-3d38d643e491", - "type": "message", - "text": "another option to try is to replace `spark.jars` with `spark.jars.packages` `io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0`", - "user": "U01RA9B5GG2", - "ts": "1700571987.624519", - "blocks": [ - { - "type": "rich_text", - "block_id": "hfgFo", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "another option to try is to replace " - }, - { - "type": "text", - "text": "spark.jars", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " with " - }, - { - "type": "text", - "text": "spark.jars.packages", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "text", - "text": "io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "bdfc78ad-1de4-4ecc-b731-91b2d2c7d9fd", - "type": "message", - "text": "I think this was done for the purpose of presentation to make sure the demo will work without internet access. This can be the reason to add jar manually to a docker. `openlineage-spark` can be added to Spark via `spark.jar.packages` , like we do here ", - "user": "U02MK6YNAQ5", - "ts": "1700572614.636549", - "blocks": [ - { - "type": "rich_text", - "block_id": "4fM4X", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I think this was done for the purpose of presentation to make sure the demo will work without internet access. This can be the reason to add jar manually to a docker. 
" - }, - { - "type": "text", - "text": "openlineage-spark", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " can be added to Spark via " - }, - { - "type": "text", - "text": "spark.jar.packages", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " , like we do here " - }, - { - "type": "link", - "url": "https://openlineage.io/docs/integrations/spark/quickstart_local" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/docs/integrations/spark/quickstart_local", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/docs/integrations/spark/quickstart_local", - "fallback": "Quickstart with Jupyter | OpenLineage", - "text": "Trying out the Spark integration is super easy if you already have Docker Desktop and git installed.", - "title": "Quickstart with Jupyter | OpenLineage", - "title_link": "https://openlineage.io/docs/integrations/spark/quickstart_local", - "service_name": "openlineage.io" - } - ], - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - }, - { - "client_msg_id": "b80b16bc-2f21-4f43-b205-a057ddaa19fe", - "type": "message", - "text": "got it guys - thanks a lot for help - it turns out that spark context from notebook 2 and 3 has come kind of metadata conflict - when I combine those 2 and recreate image to clean up old metadata it works.\nOne more note is that sometimes kernels return weird results but it may be caused by some local nuances - anyways thx !", - "user": "U066S97A90C", - "ts": "1700576519.927839", - "blocks": [ - { - "type": "rich_text", - "block_id": "vjzch", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "got it guys - thanks a lot for help - it turns out that spark context from notebook 2 and 3 has come kind of metadata conflict - when I combine those 2 and recreate image to clean up old metadata it works.\nOne more note is that sometimes kernels return weird results but it may be caused by some local nuances - anyways thx !" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700568128.192669", - "parent_user_id": "U066S97A90C" - } - ] - }, - { - "client_msg_id": "25c27b97-4479-4f92-aaaa-bed3ba0eba8c", - "type": "message", - "text": "Hi So we can use openlineage to identify column level lineage with Airflow , Spark? will it also allow to connect to Power BI and derive the downstream column lineage ?", - "user": "U066CNW85D3", - "ts": "1700456258.614309", - "blocks": [ - { - "type": "rich_text", - "block_id": "maShE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi So we can use openlineage to identify column level lineage with Airflow , Spark? will it also allow to connect to Power BI and derive the downstream column lineage ?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700456258.614309", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1700571550.031889", - "reply_users": [ - "U01RA9B5GG2", - "U066CNW85D3" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "f6f8055d-1939-45f1-a446-43f5652da88b", - "type": "message", - "text": "Yes, it works with Airflow and Spark - there is caveat that amount of operators that support it on Airflow side is fairly small and limited generally to most popular SQL operators.\n> will it also allow to connect to Power BI and derive the downstream column lineage ?\nNo, there is no such feature _yet_ :slightly_smiling_face:\nHowever, there's nothing preventing this - if you wish to work on such implementation, we'd be happy to help.", - "user": "U01RA9B5GG2", - "ts": "1700478456.679879", - "blocks": [ - { - "type": "rich_text", - "block_id": "W+RUb", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Yes, it works with Airflow and Spark - there is caveat that amount of operators that support it on Airflow side is fairly small and limited generally to most popular SQL operators.\n" - } - ] - }, - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "will it also allow to connect to Power BI and derive the downstream column lineage ?" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "No, there is no such feature " - }, - { - "type": "text", - "text": "yet ", - "style": { - "italic": true - } - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - }, - { - "type": "text", - "text": "\nHowever, there's nothing preventing this - if you wish to work on such implementation, we'd be happy to help." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700456258.614309", - "parent_user_id": "U066CNW85D3" - }, - { - "client_msg_id": "3b2d2349-9b67-4745-a2f2-217d82315108", - "type": "message", - "text": "Thank you Maciej Obuchowski for the update. Currently we are looking out for a tool which can support connecting to Power Bi and pull column level lineage information for reports and dashboards. How this can be achieved with OL ? Can you give some idea?", - "user": "U066CNW85D3", - "ts": "1700544011.513999", - "blocks": [ - { - "type": "rich_text", - "block_id": "+u0dj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thank you Maciej Obuchowski for the update. Currently we are looking out for a tool which can support connecting to Power Bi and pull column level lineage information for reports and dashboards. How this can be achieved with OL ? Can you give some idea?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U066CNW85D3", - "ts": "1700544159.000000" - }, - "thread_ts": "1700456258.614309", - "parent_user_id": "U066CNW85D3" - }, - { - "client_msg_id": "3c666be0-c6ff-48f8-bc2c-af26eb4dfb0e", - "type": "message", - "text": "I don't think I can help you with that now, unless you want to work on your own integration with PowerBI :slightly_frowning_face:", - "user": "U01RA9B5GG2", - "ts": "1700571550.031889", - "blocks": [ - { - "type": "rich_text", - "block_id": "XtCzB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I don't think I can help you with that now, unless you want to work on your own integration with PowerBI " - }, - { - "type": "emoji", - "name": "slightly_frowning_face", - "unicode": "1f641" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700456258.614309", - "parent_user_id": "U066CNW85D3" - } - ] - }, - { - "client_msg_id": "aff93a22-bd07-470a-92b8-b4aafd17b114", - "type": "message", - "text": "what are the pros and cons of OL. we often talk about positives to market it but what are the pain points using OL,how it's addressing user issues?", - "user": "U066HKFCHUG", - "ts": "1700064658.956769", - "blocks": [ - { - "type": "rich_text", - "block_id": "af/Ig", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "what are the pros and cons of OL. we often talk about positives to market it but what are the pain points using OL,how it's addressing user issues?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700064658.956769", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1700159922.652539", - "reply_users": [ - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1700159922.652539", - "replies": [ - { - "client_msg_id": "00835fbb-223f-4679-b436-23ffc9aafffe", - "type": "message", - "text": "Hi <@U066HKFCHUG>, thanks for your question. We’ve heard that OpenLineage is attractive because of its desirable integrations, including a best-in-class Spark integration, its extensibility, the fact that it’s not destructive, and the fact that it’s open source. I’m not aware of pain points per se, but there are certainly features and integrations that we wish we could focus on but can’t at the moment — like the Dagster integration, which needs a new maintainer. OpenLineage is like any other open standard in that ecosystem coverage is a constant process rather than a journey, and it requires contributions in order to get close to 100%. Thankfully, we are gaining users and contributors all the time, and integrations are being added or improved upon daily. See the Ecosystem page on the website for a list of consumers and producers and links to more resources, and check out the for the codebase, commit history, contributors, governance procedures, and more. We’re quick to respond to messages here and issues on GitHub — usually within one day.", - "user": "U02LXF3HUN7", - "ts": "1700159922.652539", - "blocks": [ - { - "type": "rich_text", - "block_id": "b4e3K", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U066HKFCHUG" - }, - { - "type": "text", - "text": ", thanks for your question. 
We’ve heard that OpenLineage is attractive because of its desirable integrations, including a best-in-class Spark integration, its extensibility, the fact that it’s not destructive, and the fact that it’s open source. I’m not aware of pain points per se, but there are certainly features and integrations that we wish we could focus on but can’t at the moment — like the Dagster integration, which needs a new maintainer. OpenLineage is like any other open standard in that ecosystem coverage is a constant process rather than a journey, and it requires contributions in order to get close to 100%. Thankfully, we are gaining users and contributors all the time, and integrations are being added or improved upon daily. See the Ecosystem page on the website for a list of consumers and producers and links to more resources, and check out the " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage", - "text": "GitHub repo" - }, - { - "type": "text", - "text": " for the codebase, commit history, contributors, governance procedures, and more. We’re quick to respond to messages here and issues on GitHub — usually within one day." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "OpenLineage/OpenLineage", - "text": "An Open Standard for lineage metadata collection", - "title": "OpenLineage/OpenLineage", - "fields": [ - { - "value": "", - "title": "Website", - "short": true - }, - { - "value": "1449", - "title": "Stars", - "short": true - } - ] - } - ], - "thread_ts": "1700064658.956769", - "parent_user_id": "U066HKFCHUG" - } - ] - }, - { - "client_msg_id": "f21b5e0b-4e53-407a-9f3d-97c6f3d4a986", - "type": "message", - "text": "Can anyone tell me why OL is better than other competitors if you can provide an analysis that would be great", - "user": "U066HKFCHUG", - "ts": "1700064564.825909", - "blocks": [ - { - "type": "rich_text", - "block_id": "V52kz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can anyone tell me why OL is better than other competitors if you can provide an analysis that would be great" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700064564.825909", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1700153176.479259", - "reply_users": [ - "U01HNKK4XAM" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "b6825ecd-c475-4cff-a701-842af7c6d68e", - "type": "message", - "text": "Hey <@U066HKFCHUG> can you help me understand what you mean by competitors?\nOL is a specification that can be used to solve various problems, so if you have a clear problem statement, maybe I can help with pros/cons for that problem", - "user": "U01HNKK4XAM", - "ts": "1700153176.479259", - "blocks": [ - { - "type": "rich_text", - "block_id": "7oHLi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey " - }, - { - "type": "user", - "user_id": "U066HKFCHUG" - }, - { - "type": "text", - "text": " can you help me understand what you mean by competitors?\nOL is a specification that can be used to solve various problems, so if you have a clear problem statement, maybe I can help with pros/cons for that problem" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700064564.825909", - 
"parent_user_id": "U066HKFCHUG" - } - ] - }, - { - "client_msg_id": "f2964054-7781-41c2-85ec-e58345c88058", - "type": "message", - "text": "Hi\nCan anyone point me to the deck on how Airflow can be integrated using Openlineage?", - "user": "U066HKFCHUG", - "ts": "1700050644.509419", - "blocks": [ - { - "type": "rich_text", - "block_id": "WgWd5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi\nCan anyone point me to the deck on how Airflow can be integrated using Openlineage?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700050644.509419", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1700051275.164259", - "reply_users": [ - "U01RA9B5GG2", - "U066HKFCHUG" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "ed2ee679-f2cc-4cdc-bbef-c163d66f2850", - "type": "message", - "text": "\n\n", - "user": "U01RA9B5GG2", - "ts": "1700051254.482469", - "blocks": [ - { - "type": "rich_text", - "block_id": "zzeIE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/index.html" - }, - { - "type": "text", - "text": "\n\n" - }, - { - "type": "link", - "url": "https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700050644.509419", - "parent_user_id": "U066HKFCHUG" - }, - { - "client_msg_id": "b63627c5-ff37-4c5c-9ff0-1a909e3048f4", - "type": "message", - "text": "thank you <@U01RA9B5GG2>", - "user": "U066HKFCHUG", - "ts": "1700051275.164259", - "blocks": [ - { - "type": "rich_text", - "block_id": "GzOOw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "thank you " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700050644.509419", - "parent_user_id": "U066HKFCHUG" - } - ] - }, - { - "client_msg_id": "efc4a680-9991-48df-9816-fe75bb00d4bb", - "type": "message", - "text": "<@U02MK6YNAQ5> I diff CreateReplaceDatasetBuilder.java and CreateReplaceOutputDatasetBuilder.java and they are the same except for the class name, so I am not sure what is causing the change. I also realize you don't have a test case for ADLS", - "user": "U05T8BJD4DU", - "ts": "1699863517.394909", - "blocks": [ - { - "type": "rich_text", - "block_id": "ejWHB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " I diff CreateReplaceDatasetBuilder.java and CreateReplaceOutputDatasetBuilder.java and they are the same except for the class name, so I am not sure what is causing the change. I also realize you don't have a test case for ADLS" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699863517.394909", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1699916804.305259", - "reply_users": [ - "U02MK6YNAQ5", - "U05T8BJD4DU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "14a63719-a5a6-4cf4-aba3-9fcc8f228db4", - "type": "message", - "text": "Thanks <@U05T8BJD4DU> for your engagement in finding the cause and solution to this issue.\n\nAmong the technical problems, another problem here is that our databricks integration tests are run on AWS and the issue you describe occurs in Azure. 
I would consider this a primary issue as it is difficult for me to verify the behaviour you describe and fix it with a failing integration test at the start.\n\nAre you able to reproduce the issue on AWS Databricks environment so that we could include it in our integration tests and make sure the behvaiour will not change later on in future?", - "user": "U02MK6YNAQ5", - "ts": "1699869127.461919", - "blocks": [ - { - "type": "rich_text", - "block_id": "LrFjw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks " - }, - { - "type": "user", - "user_id": "U05T8BJD4DU" - }, - { - "type": "text", - "text": " for your engagement in finding the cause and solution to this issue.\n\nAmong the technical problems, another problem here is that our databricks integration tests are run on AWS and the issue you describe occurs in Azure. I would consider this a primary issue as it is difficult for me to verify the behaviour you describe and fix it with a failing integration test at the start.\n\nAre you able to reproduce the issue on AWS Databricks environment so that we could include it in our integration tests and make sure the behvaiour will not change later on in future?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699863517.394909", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "2f6a95e7-dd6d-45f4-8c79-119c1d004c1c", - "type": "message", - "text": "I didn't know Azure and AWS Databricks are different. Let me try it on AWS as well", - "user": "U05T8BJD4DU", - "ts": "1699916804.305259", - "blocks": [ - { - "type": "rich_text", - "block_id": "SAyd2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I didn't know Azure and AWS Databricks are different. Let me try it on AWS as well" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699863517.394909", - "parent_user_id": "U05T8BJD4DU" - } - ] - }, - { - "type": "message", - "text": "<@U02MK6YNAQ5> I went back to 1.4.1, output does show adls location. But environment facet is gone in 1.4.1. It shows up in 1.5.0 but namespace is back to dbfs....", - "files": [ - { - "id": "F0663EAEL0Y", - "created": 1699862379, - "timestamp": 1699862379, - "name": "data (11).json", - "title": "data (11).json", - "mimetype": "text/plain", - "filetype": "json", - "pretty_type": "JSON", - "user": "U05T8BJD4DU", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 52897, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F0663EAEL0Y/data__11_.json", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F0663EAEL0Y/download/data__11_.json", - "permalink": "https://openlineage.slack.com/files/U05T8BJD4DU/F0663EAEL0Y/data__11_.json", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F0663EAEL0Y-829953777d", - "edit_link": "https://openlineage.slack.com/files/U05T8BJD4DU/F0663EAEL0Y/data__11_.json/edit", - "preview": "{\r\n \"eventTime\":\"2023-11-13T07:49:59.575Z\",\r\n \"producer\":\"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark\",\r\n \"schemaURL\":\"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent\",\r\n \"eventType\":\"COMPLETE\",\r", - "preview_highlight": "
{\n \"eventTime\":\"2023-11-13T07:49:59.575Z\",\n \"producer\":\"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark\",\n \"schemaURL\":\"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent\",\n \"eventType\":\"COMPLETE\",
\n", - "lines": 1137, - "lines_more": 1132, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05T8BJD4DU", - "display_as_bot": false, - "ts": "1699862442.876219", - "blocks": [ - { - "type": "rich_text", - "block_id": "RX5T+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " I went back to 1.4.1, output does show adls location. But environment facet is gone in 1.4.1. It shows up in 1.5.0 but namespace is back to dbfs...." - } - ] - } - ] - } - ], - "client_msg_id": "cdc89194-cce0-4480-8613-66db99259d74" - }, - { - "client_msg_id": "bc67ec9b-3910-4540-bcc0-eda4cf7452f1", - "type": "message", - "text": "Databricks needs to be re-written in a way that supports Databricks it seems like", - "user": "U05T8BJD4DU", - "ts": "1699691444.226989", - "blocks": [ - { - "type": "rich_text", - "block_id": "hkcRK", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Databricks needs to be re-written in a way that supports Databricks it seems like" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "d06c50d8-7cdc-40c2-b20a-f3b375abbffe", - "type": "message", - "text": "<@U02MK6YNAQ5> this is why if create a table with adls location it won't show input and output:\n\n\n\nBecause the catalog object is not there.", - "user": "U05T8BJD4DU", - "ts": "1699691373.531469", - "blocks": [ - { - "type": "rich_text", - "block_id": "+2vNS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " this is why if create a table with adls location it won't show input and output:\n\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/spark35/src/main/java/io/openlineage/spark35/agent/lifecycle/plan/CreateReplaceOutputDatasetBuilder.java#L146-L148", - "text": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/spark35/src[…]k35/agent/lifecycle/plan/CreateReplaceOutputDatasetBuilder.java" - }, - { - "type": "text", - "text": "\n\nBecause the catalog object is not there." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05T8BJD4DU", - "ts": "1699691398.000000" - }, - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/spark35/src/main/java/io/openlineage/spark35/agent/lifecycle/plan/CreateReplaceOutputDatasetBuilder.java#L146-L148", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n if (!di.isPresent()) {\n return Collections.emptyList();\n }\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ] - }, - { - "client_msg_id": "bf43c6e5-d1ea-4f72-a555-937229c6e5f5", - "type": "message", - "text": "<@U02MK6YNAQ5> regarding to , OL is parsing out the table location in Hive metastore, it is the location of the table in the catalog and not the physical location of the data. 
It is both right and wrong because it is a table, just it is an external table.\n\n", - "user": "U05T8BJD4DU", - "ts": "1699647945.224489", - "blocks": [ - { - "type": "rich_text", - "block_id": "uqacZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " regarding to " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2124" - }, - { - "type": "text", - "text": ", OL is parsing out the table location in Hive metastore, it is the location of the table in the catalog and not the physical location of the data. It is both right and wrong because it is a table, just it is an external table.\n\n" - }, - { - "type": "link", - "url": "https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05T8BJD4DU", - "ts": "1699647973.000000" - }, - "attachments": [ - { - "from_url": "https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html", - "image_url": "https://www.databricks.com/wp-content/uploads/2020/04/og-databricks.png", - "image_width": 1200, - "image_height": 630, - "image_bytes": 12097, - "service_icon": "https://docs.databricks.com/en/_static/favicon.ico", - "id": 1, - "original_url": "https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html", - "fallback": "External tables", - "text": "Learn about Unity Catalog external tables in Databricks SQL and Databricks Runtime.", - "title": "External tables", - "title_link": "https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html", - "service_name": "docs.databricks.com" - }, - { - "id": 2, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695498902, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2124 Same Delta Table not catching the location on write", - "text": "*What is the target system?*\n\nSpark / Databricks\n\n*What kind of integration is this?*\n\n☐ Produces OpenLineage metadata\n☐ Consumes OpenLineage metadata\n☐ Something else\n\n*How should this integration be implemented?*\n\nI am using OL 1.2.2, Azure Databricks Runtime 11.3 LTS. When creating a table writing into a ADLS location, OL won't be able to catch the location of the output. But when I read the same object it will be able to read the location as INPUT.\n\nPlease note I have also tested Databricks Runtime 13.3 LTS, Spark 3.4.1 - it will give correct ADLS location in INPUT but the input will only show up once in a blue moon. 
Most of the time the inputs and outputs are blank.\n\n```\n \"inputs\": [],\n \"outputs\": []\n```\n\n```\nCREATE OR REPLACE TABLE transactions_adj\nUSING DELTA LOCATION ''\nAS\n SELECT\n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n amount_list,\n campaign_coupon_discount,\n manuf_coupon_discount,\n manuf_coupon_match_discount,\n total_coupon_discount,\n instore_discount,\n amount_paid,\n units\n FROM (\n SELECT \n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n COALESCE(sales_amount - discount_amount - coupon_discount - coupon_discount_match,0.0) as amount_list,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) = 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as campaign_coupon_discount,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) != 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as manuf_coupon_discount,\n -1 * COALESCE(coupon_discount_match,0.0) as manuf_coupon_match_discount,\n -1 * COALESCE(coupon_discount - coupon_discount_match,0.0) as total_coupon_discount,\n COALESCE(-1 * discount_amount,0.0) as instore_discount,\n COALESCE(sales_amount,0.0) as `amount_paid,`\n quantity as units\n FROM transactions\n );\n```\n\nHere's the COMPLETE event:\n\n```\n\n \"outputs\":[\n {\n \"namespace\":\"dbfs\",\n \"name\":\"/user/hive/warehouse/journey.db/transactions_adj\",\n \"facets\":{\n \"dataSource\":{\n \"_producer\":\"\",\n \"_schemaURL\":\"\",\n \"name\":\"dbfs\",\n \"uri\":\"dbfs\"\n },\n\n```\n\nBelow logical plan shows the path:\n\n```\n== Analyzed Logical Plan ==\nnum_affected_rows: bigint, num_inserted_rows: bigint\nReplaceTableAsSelect TableSpec(Map(),Some(DELTA),Map(),Some(),None,None,false,Set()), true\n:- ResolvedIdentifier com.databricks.sql.managedcatalog.UnityCatalogV2Proxy@6251a8df, default.transactions_adj\n+- Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, amount_list#147, campaign_coupon_discount#148, manuf_coupon_discount#149, manuf_coupon_match_discount#150, total_coupon_discount#151, instore_discount#152, amount_paid#153, units#154]\n +- SubqueryAlias __auto_generated_subquery_name\n +- Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, coalesce(cast((((sales_amount#189 - discount_amount#191) - coupon_discount#194) - coupon_discount_match#195) as double), cast(0.0 as double)) AS amount_list#147, CASE WHEN (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS campaign_coupon_discount#148, CASE WHEN NOT (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS manuf_coupon_discount#149, (cast(-1 as double) * coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double))) AS manuf_coupon_match_discount#150, (cast(-1 as double) * coalesce(cast((coupon_discount#194 - coupon_discount_match#195) as double), cast(0.0 as double))) AS total_coupon_discount#151, coalesce(cast((cast(-1 as float) * discount_amount#191) as double), cast(0.0 as double)) AS instore_discount#152, coalesce(cast(sales_amount#189 as double), cast(0.0 as double)) AS amount_paid#153, quantity#188 AS 
units#154]\n +- SubqueryAlias spark_catalog.default.transactions\n +- Relation spark_catalog.default.transactions[household_id#184,basket_id#185L,day#186,product_id#187,quantity#188,sales_amount#189,store_id#190,discount_amount#191,transaction_time#192,week_no#193,coupon_discount#194,coupon_discount_match#195] parquet\n```\n\n*Where should this integration be implemented?*\n\n☐ In the target system\n☐ In the OpenLineage repo\n☐ Somewhere else\n\n*Do you plan to make this contribution yourself?*\n\n☐ I am interested in doing this work", - "title": "#2124 Same Delta Table not catching the location on write", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "footer": "", - "fields": [ - { - "value": "integration/spark, integration/databricks", - "title": "Labels", - "short": true - }, - { - "value": "5", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1699647945.224489", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1699648348.285049", - "reply_users": [ - "U05T8BJD4DU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "34c0c0bc-49dd-48b6-9070-10348752259b", - "type": "message", - "text": "Here's for more reference: ", - "user": "U05T8BJD4DU", - "ts": "1699648348.285049", - "blocks": [ - { - "type": "rich_text", - "block_id": "XWYlR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Here's for more reference: " - }, - { - "type": "link", - "url": "https://dilorom.medium.com/finding-the-path-to-a-table-in-databricks-2c74c6009dbb" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://dilorom.medium.com/finding-the-path-to-a-table-in-databricks-2c74c6009dbb", - "ts": 1676841924, - "image_url": "https://miro.medium.com/v2/resize:fit:1200/0*mynRvuX_x2XOkqmx", - "image_width": 1200, - "image_height": 750, - "image_bytes": 196986, - "service_icon": "https://miro.medium.com/v2/resize:fill:152:152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png", - "id": 1, - "original_url": "https://dilorom.medium.com/finding-the-path-to-a-table-in-databricks-2c74c6009dbb", - "fallback": "Medium: Finding the Path to a Managed Table in Databricks", - "text": "This article shows how to find a path for a managed Databricks table.", - "title": "Finding the Path to a Managed Table in Databricks", - "title_link": "https://dilorom.medium.com/finding-the-path-to-a-table-in-databricks-2c74c6009dbb", - "service_name": "Medium", - "fields": [ - { - "value": "2 min read", - "title": "Reading time", - "short": true - } - ] - } - ], - "thread_ts": "1699647945.224489", - "parent_user_id": "U05T8BJD4DU" - } - ] - }, - { - "client_msg_id": "7210d352-df88-4ba2-bebd-a47cbd15b2d4", - "type": "message", - "text": "\nFriendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: ", - "user": "U02LXF3HUN7", - "ts": "1699465494.687309", - "blocks": [ - { - "type": "rich_text", - "block_id": "OMGKh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nFriendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1699027207361229" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1699027207361229", - "ts": 
"1699027207.361229", - "author_id": "U02LXF3HUN7", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1699027207.361229", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "VnOMq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThis month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "announcements" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent releases" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent additions to the Flink integration by " - }, - { - "type": "user", - "user_id": "U05QA2D1XNV" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent additions to the Spark integration by " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "updates on proposals by " - }, - { - "type": "user", - "user_id": "U01DCLP0GU9" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "discussion topics" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "open discussion" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "More info and the meeting link can be found on the " - }, - { - "type": "link", - "url": "https://openlineage.io/meetings/", - "text": "website" - }, - { - "type": "text", - "text": ". All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda." - } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1699027207361229", - "fallback": "[November 3rd, 2023 9:00 AM] michael282: \nThis month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:\n• announcements\n• recent releases\n• recent additions to the Flink integration by <@U05QA2D1XNV> \n• recent additions to the Spark integration by <@U02MK6YNAQ5> \n• updates on proposals by <@U01DCLP0GU9> \n• discussion topics\n• open discussion\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.", - "text": "\nThis month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:\n• announcements\n• recent releases\n• recent additions to the Flink integration by <@U05QA2D1XNV> \n• recent additions to the Spark integration by <@U02MK6YNAQ5> \n• updates on proposals by <@U01DCLP0GU9> \n• discussion topics\n• open discussion\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? 
DM me to be added to the agenda.", - "author_name": "Michael Robinson", - "author_link": "https://openlineage.slack.com/team/U02LXF3HUN7", - "author_icon": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_48.jpg", - "author_subname": "Michael Robinson", - "mrkdwn_in": [ - "text" - ], - "footer": "Slack Conversation" - } - ], - "reactions": [ - { - "name": "+1", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "425424cf-69e6-4b05-9699-130951f09ab6", - "type": "message", - "text": "Has anyone here tried OpenLineage with Spark on Amazon EMR?", - "user": "U05TU0U224A", - "ts": "1699465132.534889", - "blocks": [ - { - "type": "rich_text", - "block_id": "AHU5F", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Has anyone here tried OpenLineage with Spark on Amazon EMR?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699465132.534889", - "reply_count": 3, - "reply_users_count": 3, - "latest_reply": "1700497766.698579", - "reply_users": [ - "U05T8BJD4DU", - "U053LCT71BQ", - "U05TU0U224A" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "dcefe260-742d-4735-ad72-7e19d7303e07", - "type": "message", - "text": "No but it should work the same I tried on AWS and Google Colab and Azure", - "user": "U05T8BJD4DU", - "ts": "1699466476.239829", - "blocks": [ - { - "type": "rich_text", - "block_id": "DhY38", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "No but it should work the same I tried on AWS and Google Colab and Azure" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699465132.534889", - "parent_user_id": "U05TU0U224A", - "reactions": [ - { - "name": "+1", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "2bc40618-6fa1-4d3f-b5ce-37a92d166c2b", - "type": "message", - "text": "Yes. <@U05HBLE7YPL> could provide some details if needed.", - "user": "U053LCT71BQ", - "ts": "1699517454.058999", - "blocks": [ - { - "type": "rich_text", - "block_id": "/ny7q", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Yes. " - }, - { - "type": "user", - "user_id": "U05HBLE7YPL" - }, - { - "type": "text", - "text": " could provide some details if needed." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699465132.534889", - "parent_user_id": "U05TU0U224A", - "reactions": [ - { - "name": "+1", - "users": [ - "U05HBLE7YPL" - ], - "count": 1 - }, - { - "name": "fire", - "users": [ - "U01RA9B5GG2" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "73fc95f8-a897-4a16-ae48-f494d98970eb", - "type": "message", - "text": "Thanks <@U053LCT71BQ>\nHI <@U05HBLE7YPL> i was able to set up a spark cluster on AWS EMR but im struggling to configure the OL Listener. Ive tried with steps and bootstrap actions for the jar and it didn't work out. How did you manage to include the jar? Besides, what about the spark configuration? 
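A sketch of the kind of Spark configuration being asked about here, assuming the standard OpenLineage Spark settings (`spark.extraListeners`, `spark.openlineage.transport.*`); the package version, endpoint, and namespace are placeholders. On EMR the same keys can also be supplied through the cluster's spark-defaults configuration classification instead of the session builder.

```python
# Minimal PySpark session wiring for the OpenLineage listener; all
# endpoint/version values below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ol_emr_smoke_test")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.5.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://lineage-backend:5000")  # placeholder
    .config("spark.openlineage.namespace", "emr_demo")  # placeholder
    .getOrCreate()
)
```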
Could you send me a sample of these configs?", - "user": "U05TU0U224A", - "ts": "1700497766.698579", - "blocks": [ - { - "type": "rich_text", - "block_id": "NuX5f", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks " - }, - { - "type": "user", - "user_id": "U053LCT71BQ" - }, - { - "type": "text", - "text": "\nHI " - }, - { - "type": "user", - "user_id": "U05HBLE7YPL" - }, - { - "type": "text", - "text": " i was able to set up a spark cluster on AWS EMR but im struggling to configure the OL Listener. Ive tried with steps and bootstrap actions for the jar and it didn't work out. How did you manage to include the jar? Besides, what about the spark configuration? Could you send me a sample of these configs?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699465132.534889", - "parent_user_id": "U05TU0U224A" - } - ] - }, - { - "client_msg_id": "1cf56ca5-8527-4399-9b26-877012f4d625", - "type": "message", - "text": "if I have a dataset on adls gen2 which synapse connects to as an external delta table, is that the use case of a symlink dataset? the delta table is connected to by PBI and by Synapse, but the underlying data is exactly the same", - "user": "U05NMJ0NBUK", - "ts": "1699372165.804069", - "blocks": [ - { - "type": "rich_text", - "block_id": "rqeWx", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "if I have a dataset on adls gen2 which synapse connects to as an external delta table, is that the use case of a symlink dataset? the delta table is connected to by PBI and by Synapse, but the underlying data is exactly the same" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699372165.804069", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1699458544.315609", - "reply_users": [ - "U01RA9B5GG2" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "3d4517cc-ae51-40f5-a2de-5721ba9585be", - "type": "message", - "text": "Sounds like it, yes - if the logical dataset names are different but physical one is the same", - "user": "U01RA9B5GG2", - "ts": "1699458544.315609", - "blocks": [ - { - "type": "rich_text", - "block_id": "bEjsY", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Sounds like it, yes - if the logical dataset names are different but physical one is the same" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699372165.804069", - "parent_user_id": "U05NMJ0NBUK" - } - ] - }, - { - "client_msg_id": "b227b661-6908-4ebb-9e0b-34e904a9dcdc", - "type": "message", - "text": "Hi all, we (I work with <@U05VDHJJ9T7> and <@U05HBLE7YPL>) have a quick question regarding the spark integration:\nif a spark app contains several jobs, they will be named \"my_spark_app_name.job1\" and \"my_spark_app_name.job2\"\neg:\nspark_job.collect_limit\nspark_job.map_partitions_parallel_collection\n\nIf I understood correctly, the spark integration maps one Spark job to a single OpenLineage Job, and the application itself should be assigned a Run id at startup and each job that executes will report the application's Run id as its parent job run (taken from: ).\n\nIn our case, the app Run Id is never created, and the jobs runs don't contain any parent facets. 
We tested it with a recent integration version in 1.4.1 and also an older one (0.26.0).\nDid we miss something in the OL spark integration config?", - "user": "U05J9LZ355L", - "ts": "1699355029.839029", - "blocks": [ - { - "type": "rich_text", - "block_id": "qYUfo", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi all, we (I work with " - }, - { - "type": "user", - "user_id": "U05VDHJJ9T7" - }, - { - "type": "text", - "text": " and " - }, - { - "type": "user", - "user_id": "U05HBLE7YPL" - }, - { - "type": "text", - "text": ") have a quick question regarding the spark integration:\nif a spark app contains several jobs, they will be named \"my_spark_app_name.job1\" and \"my_spark_app_name.job2\"\neg:\nspark_job.collect_limit\nspark_job.map_partitions_parallel_collection\n\nIf I understood correctly, the spark integration maps one Spark job to a single OpenLineage Job, and the application itself should be assigned a Run id at startup and each job that executes will report the application's Run id as its parent job run (taken from: " - }, - { - "type": "link", - "url": "https://openlineage.io/docs/integrations/spark/" - }, - { - "type": "text", - "text": ").\n\nIn our case, the app Run Id is never created, and the jobs runs don't contain any parent facets. We tested it with a recent integration version in 1.4.1 and also an older one (0.26.0).\nDid we miss something in the OL spark integration config?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699355029.839029", - "reply_count": 4, - "reply_users_count": 2, - "latest_reply": "1699364218.932689", - "reply_users": [ - "U02MK6YNAQ5", - "U05J9LZ355L" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "4bb63de7-60e0-4443-bc53-395529786d26", - "type": "message", - "text": "hey, a name of the output dataset should be put at the end of the job name. This was introduced to help with jobs that call multiple spark actions", - "user": "U02MK6YNAQ5", - "ts": "1699355271.048109", - "blocks": [ - { - "type": "rich_text", - "block_id": "11PhC", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hey, a name of the output dataset should be put at the end of the job name. This was introduced to help with jobs that call multiple spark actions" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699355029.839029", - "parent_user_id": "U05J9LZ355L" - }, - { - "client_msg_id": "93865d43-03ba-4aed-bc8e-d134ea79539d", - "type": "message", - "text": "Hi Paweł,\nThanks for your answer, yes indeed with the newer version of OL, we automatically have the name of the output dataset at the end of the job name, but no App run id, nor any parent run facet.", - "user": "U05J9LZ355L", - "ts": "1699358752.643889", - "blocks": [ - { - "type": "rich_text", - "block_id": "Saf+l", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi Paweł,\nThanks for your answer, yes indeed with the newer version of OL, we automatically have the name of the output dataset at the end of the job name, but no App run id, nor any parent run facet." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699355029.839029", - "parent_user_id": "U05J9LZ355L" - }, - { - "client_msg_id": "f77ea12e-ea22-4c07-a1a9-8e7a2b45c9a3", - "type": "message", - "text": "yes, you're right. 
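The reply that follows names `spark.openlineage.parentJobName`; here is a sketch of that manual wiring, with `spark.openlineage.parentRunId` and `spark.openlineage.namespace` assumed from the OpenLineage Spark configuration docs and all values placeholders.

```python
# Manually pin one parent run id shared by every Spark job in the application.
import uuid

from pyspark.sql import SparkSession

app_run_id = str(uuid.uuid4())  # shared by all jobs this app triggers

spark = (
    SparkSession.builder.appName("my_spark_app_name")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.parentJobName", "my_spark_app_name")
    .config("spark.openlineage.parentRunId", app_run_id)
    .config("spark.openlineage.namespace", "my_namespace")  # placeholder
    .getOrCreate()
)
```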
I mean you can set in config `spark.openlineage.parentJobName` which will be shared through whole app run, but this needs to be set manually", - "user": "U02MK6YNAQ5", - "ts": "1699363004.077549", - "blocks": [ - { - "type": "rich_text", - "block_id": "L5WWY", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, you're right. I mean you can set in config " - }, - { - "type": "text", - "text": "spark.openlineage.parentJobName", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " which will be shared through whole app run, but this needs to be set manually" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699355029.839029", - "parent_user_id": "U05J9LZ355L" - }, - { - "client_msg_id": "24dc4817-ff81-4505-b814-0c6090f0301d", - "type": "message", - "text": "I see, thanks a lot for your reply we'll try that", - "user": "U05J9LZ355L", - "ts": "1699364218.932689", - "blocks": [ - { - "type": "rich_text", - "block_id": "b8b1A", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I see, thanks a lot for your reply we'll try that" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699355029.839029", - "parent_user_id": "U05J9LZ355L" - } - ] - }, - { - "client_msg_id": "4f10979d-6d7d-4a68-8a53-00a158a9c222", - "type": "message", - "text": "Hey team! :wave:\n\nWe're trying to use openlineage-flink, and would like provide the `openlineage.transport.type=http` and configure other transport configs, but we're not able to find sufficient docs (tried ) on where/how these configs can be provided.\n\nFor example, in spark, the changes mostly were delegated to the spark-submit command like\n```spark-submit --conf \"spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener\" \\\n --packages \"io.openlineage:openlineage-spark:<spark-openlineage-version>\" \\\n --conf \"spark.openlineage.transport.url=http://{openlineage.client.host}/api/v1/namespaces/spark_integration/\" \\\n --class com.mycompany.MySparkApp my_application.jar```\nAnd the `OpenLineageSparkListener` has a method to retrieve the provided spark confs as an object in the ArgumentParser. Similarly, looking for some pointers on how the openlineage.transport configs can be provided to `OpenLineageFlinkJobListener` & how the flink listener parses/uses these configs\n\nTIA! :smile:", - "user": "U05JBHLPY8K", - "ts": "1699266123.453379", - "blocks": [ - { - "type": "rich_text", - "block_id": "77vAn", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey team! 
" - }, - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - }, - { - "type": "text", - "text": "\n\nWe're trying to use openlineage-flink, and would like provide the " - }, - { - "type": "text", - "text": "openlineage.transport.type=http", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and configure other transport configs, but we're not able to find sufficient docs (tried " - }, - { - "type": "link", - "url": "https://openlineage.io/docs/integrations/flink", - "text": "this doc" - }, - { - "type": "text", - "text": ") on where/how these configs can be provided.\n\nFor example, in spark, the changes mostly were delegated to the spark-submit command like\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "spark-submit --conf \"spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener\" \\\n --packages \"io.openlineage:openlineage-spark:\" \\\n --conf \"spark.openlineage.transport.url=http://{openlineage.client.host}/api/v1/namespaces/spark_integration/\" \\\n --class com.mycompany.MySparkApp my_application.jar" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nAnd the " - }, - { - "type": "text", - "text": "OpenLineageSparkListener", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " has a method to retrieve the provided spark confs as an object in the ArgumentParser. Similarly, looking for some pointers on how the openlineage.transport configs can be provided to " - }, - { - "type": "text", - "text": "OpenLineageFlinkJobListener", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " & how the flink listener parses/uses these configs\n\nTIA! " - }, - { - "type": "emoji", - "name": "smile", - "unicode": "1f604" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/docs/integrations/flink", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/docs/integrations/flink", - "fallback": "Apache Flink | OpenLineage", - "text": "This integration is considered experimental: only specific workflows and use cases are supported.", - "title": "Apache Flink | OpenLineage", - "title_link": "https://openlineage.io/docs/integrations/flink", - "service_name": "openlineage.io" - } - ], - "thread_ts": "1699266123.453379", - "reply_count": 5, - "reply_users_count": 2, - "latest_reply": "1699553036.057049", - "reply_users": [ - "U01RA9B5GG2", - "U05JBHLPY8K" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "b718b130-01f6-4c72-97a8-2145978818f4", - "type": "message", - "text": "similarly to spark config, you can use flink config", - "user": "U01RA9B5GG2", - "ts": "1699354569.864879", - "blocks": [ - { - "type": "rich_text", - "block_id": "3gH65", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "similarly to spark config, you can use flink config" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699266123.453379", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "767B5029-9253-418F-91D8-1B355C1CB159", - "type": "message", - "text": "<@U01RA9B5GG2> - Got it. 
Our use-case is that we're trying to build a wrapper on top of openlineage-flink for productionising for our flink jobs.\n\nWe're trying to have a wrapper class that extends OpenLineageFlinkJobListener class, and overwrites the HTTP transport endpoint/url to a constant value (say, and /api/v1/flink). But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private. If it was just a default scope, can we contribute a PR to make it public, to make it friendly for teams trying to adopt & extend openlineage?\n\nAnd also, we wanted to understand better on where we're reading the HTTP transport endpoint/url configs in OpenLineageFlinkJobListener and what'd be the best place to override it to the constant endpoint/url for our use-case", - "user": "U05JBHLPY8K", - "ts": "1699414613.261979", - "blocks": [ - { - "type": "rich_text", - "block_id": "U0Vuf", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " - Got it. Our use-case is that we're trying to build a wrapper on top of openlineage-flink for productionising for our flink jobs.\n\nWe're trying to have a wrapper class that extends OpenLineageFlinkJobListener class, and overwrites the HTTP transport endpoint/url to a constant value (say, " - }, - { - "type": "link", - "url": "http://example.com", - "text": "example.com" - }, - { - "type": "text", - "text": " and /api/v1/flink). But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private. If it was just a default scope, can we contribute a PR to make it public, to make it friendly for teams trying to adopt & extend openlineage?\n\nAnd also, we wanted to understand better on where we're reading the HTTP transport endpoint/url configs in OpenLineageFlinkJobListener and what'd be the best place to override it to the constant endpoint/url for our use-case" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05JBHLPY8K", - "ts": "1699415789.000000" - }, - "thread_ts": "1699266123.453379", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "5d48005e-a7c3-43ad-8f94-26e47ceddba5", - "type": "message", - "text": "We parse flink conf to get that information: \n\n> But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private.\nThe way to construct is is a public builder in the same class\n\nI think easier way than wrapper class would be use existing flink configuration, or to set up `OPENLINEAGE_URL` env variable, or have `openlineage.yml` config file - not sure why this is the way you've chosen?", - "user": "U01RA9B5GG2", - "ts": "1699440943.234349", - "blocks": [ - { - "type": "rich_text", - "block_id": "mBhxR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "We parse flink conf to get that information: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/26494b596e9669d2ada164066a73c44e04e884ba/integration/flink/src/main/java/io/openlineage/flink/client/EventEmitter.java#L35", - "text": 
"https://github.com/OpenLineage/OpenLineage/blob/26494b596e9669d2ada164066a73c44e04[…]ink/src/main/java/io/openlineage/flink/client/EventEmitter.java" - }, - { - "type": "text", - "text": "\n\n" - } - ] - }, - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private." - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nThe way to construct is is a public builder in the same class\n\nI think easier way than wrapper class would be use existing flink configuration, or to set up " - }, - { - "type": "text", - "text": "OPENLINEAGE_URL", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " env variable, or have " - }, - { - "type": "text", - "text": "openlineage.yml", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " config file - not sure why this is the way you've chosen?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/26494b596e9669d2ada164066a73c44e04e884ba/integration/flink/src/main/java/io/openlineage/flink/client/EventEmitter.java#L35", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n public EventEmitter(Configuration configuration) {\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1699266123.453379", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "665487f6-7eb8-4b61-b441-b1f00cbf90b9", - "type": "message", - "text": "> I think easier way than wrapper class would be use existing flink configuration, or to set up `OPENLINEAGE_URL` env variable, or have `openlineage.yml` config file - not sure why this is the way you've chosen?\n<@U01RA9B5GG2> - The reasoning behind going with a wrapper class is that we can abstract out the nitty-gritty like how/where we're publishing openlineage events etc - especially for companies that have a lot of teams that may be adopting openlineage.\n\nFor example, if we wanna move away from http transport to kafka transport - we'd be changing only this wrapper class and ask folks to update their wrapper class dependency version. If we went without the wrapper class, then the exact config changes would need to be synced and done by many different teams, who may not have enough context.\n\nSimilarly, if we wanna enable some other default best-practise configs, or inject any company-specific configs etc, the wrapper would be useful in abstracting out the details and be the 1 place that handles all openlineage related integrations for any future changes.\n\nThat's why we wanna extend openlineage's listener class & leverage most of the OSS code as-is; and at the same time, have the ability to extend & inject customisations. 
I think that's where some things like having getters for the class object attributes, or having public constructors would be really helpful :smile:", - "user": "U05JBHLPY8K", - "ts": "1699551662.975039", - "blocks": [ - { - "type": "rich_text", - "block_id": "eIuSR", - "elements": [ - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "I think easier way than wrapper class would be use existing flink configuration, or to set up " - }, - { - "type": "text", - "text": "OPENLINEAGE_URL", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " env variable, or have " - }, - { - "type": "text", - "text": "openlineage.yml", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " config file - not sure why this is the way you've chosen?" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\n" - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " - The reasoning behind going with a wrapper class is that we can abstract out the nitty-gritty like how/where we're publishing openlineage events etc - especially for companies that have a lot of teams that may be adopting openlineage.\n\nFor example, if we wanna move away from http transport to kafka transport - we'd be changing only this wrapper class and ask folks to update their wrapper class dependency version. If we went without the wrapper class, then the exact config changes would need to be synced and done by many different teams, who may not have enough context.\n\nSimilarly, if we wanna enable some other default best-practise configs, or inject any company-specific configs etc, the wrapper would be useful in abstracting out the details and be the 1 place that handles all openlineage related integrations for any future changes.\n\nThat's why we wanna extend openlineage's listener class & leverage most of the OSS code as-is; and at the same time, have the ability to extend & inject customisations. I think that's where some things like having getters for the class object attributes, or having public constructors would be really helpful " - }, - { - "type": "emoji", - "name": "smile", - "unicode": "1f604" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05JBHLPY8K", - "ts": "1699555439.000000" - }, - "thread_ts": "1699266123.453379", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "e0063ce9-8e38-4649-8a66-60d769bee3fa", - "type": "message", - "text": "<@U05JBHLPY8K> that makes sense. Feel free to provide PR adding getters and stuff.", - "user": "U01RA9B5GG2", - "ts": "1699553036.057049", - "blocks": [ - { - "type": "rich_text", - "block_id": "xscjy", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05JBHLPY8K" - }, - { - "type": "text", - "text": " that makes sense. Feel free to provide PR adding getters and stuff." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699266123.453379", - "parent_user_id": "U05JBHLPY8K", - "reactions": [ - { - "name": "tada", - "users": [ - "U05JBHLPY8K" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "9ddf3aa3-95fb-4922-94da-538fcfe0aeae", - "type": "message", - "text": ":wave: I raised a PR off the back of some Marquez conversations a while back to try and clarify how names of Snowflake objects should be expressed in OL events. I used as a guide, but also I appreciate there are other OL producers that involve Snowflake too (Airflow? dbt?). 
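For context on what the naming PR clarifies: under the OpenLineage dataset naming guidance, a Snowflake dataset is typically expressed with a `snowflake://{account}` namespace and a `database.schema.table` name. A sketch with placeholder identifiers:

```python
# Placeholder Snowflake dataset name in OpenLineage terms.
from openlineage.client.run import Dataset

snowflake_orders = Dataset(
    namespace="snowflake://xy12345.us-east-1",  # placeholder account locator
    name="ANALYTICS.PUBLIC.ORDERS",  # database.schema.table, placeholders
)
```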
Any feedback on this would be appreciated!", - "user": "U0635GK8Y14", - "ts": "1699261422.618719", - "blocks": [ - { - "type": "rich_text", - "block_id": "PCfWI", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - }, - { - "type": "text", - "text": " I raised a PR " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2223" - }, - { - "type": "text", - "text": " off the back of some Marquez conversations a while back to try and clarify how names of Snowflake objects should be expressed in OL events. I used " - }, - { - "type": "link", - "url": "https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup", - "text": "Snowflake’s OL view" - }, - { - "type": "text", - "text": " as a guide, but also I appreciate there are other OL producers that involve Snowflake too (Airflow? dbt?). Any feedback on this would be appreciated!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1698847613, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/pull/2223", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2223 spec: add clarity to snowflake naming docs", - "title": "#2223 spec: add clarity to snowflake naming docs", - "title_link": "https://github.com/OpenLineage/OpenLineage/pull/2223", - "footer": "", - "mrkdwn_in": [ - "text" - ] - }, - { - "id": 2, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "Snowflake-Labs/OpenLineage-AccessHistory-Setup", - "title": "Snowflake-Labs/OpenLineage-AccessHistory-Setup", - "fields": [ - { - "value": "11", - "title": "Stars", - "short": true - }, - { - "value": "3 months ago", - "title": "Last updated", - "short": true - } - ] - } - ], - "thread_ts": "1699261422.618719", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1699458155.637699", - "reply_users": [ - "U0635GK8Y14" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "d6c8048f-0074-4940-9c8d-428c7f108399", - "type": "message", - "text": "Thanks for merging this <@U01RA9B5GG2>!", - "user": "U0635GK8Y14", - "ts": "1699458155.637699", - "blocks": [ - { - "type": "rich_text", - "block_id": "QoDq+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks for merging this " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": "!" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699261422.618719", - "parent_user_id": "U0635GK8Y14", - "reactions": [ - { - "name": "+1", - "users": [ - "U01RA9B5GG2" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "57c0c5ad-870f-4ea8-9d65-e86039204baa", - "type": "message", - "text": "Hi Team , we are trying to customize the events by writing custom lineage listener extending OpenLineageSparkListener, but would need some direction how to capture the events", - "user": "U062Q95A1FG", - "ts": "1699096090.087359", - "blocks": [ - { - "type": "rich_text", - "block_id": "rx6pv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi Team , we are trying to customize the events by writing custom lineage listener extending OpenLineageSparkListener, but would need some direction how to capture the events" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "reply_count": 9, - "reply_users_count": 2, - "latest_reply": "1699100482.031569", - "reply_users": [ - "U02S6F54MAB", - "U062Q95A1FG" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "2e6f8167-50bf-49cf-85d5-24ae87ae8e42", - "type": "message", - "text": "\nDo you need some more guidance than that?", - "user": "U02S6F54MAB", - "ts": "1699096306.006439", - "blocks": [ - { - "type": "rich_text", - "block_id": "FuLGB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1698315220142929" - }, - { - "type": "text", - "text": "\nDo you need some more guidance than that?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02S6F54MAB", - "ts": "1699096309.000000" - }, - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1698315220142929", - "ts": "1698315220.142929", - "author_id": "U062Q95A1FG", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "is_thread_root_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1698315220.142929", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "V6ApU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi I want to customise the events which comes from Openlineage spark . Can some one give some information" - } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1698315220142929", - "fallback": "[October 26th, 2023 3:13 AM] n.priya88: Hi I want to customise the events which comes from Openlineage spark . Can some one give some information", - "text": "Hi I want to customise the events which comes from Openlineage spark . 
Can some one give some information", - "author_name": "priya narayana", - "author_link": "https://openlineage.slack.com/team/U062Q95A1FG", - "author_icon": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_48.png", - "author_subname": "priya narayana", - "mrkdwn_in": [ - "text" - ], - "footer": "Thread in Slack Conversation" - } - ], - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "058fb0f2-eaba-45ba-b3a3-3ec93d23ab63", - "type": "message", - "text": "yes", - "user": "U062Q95A1FG", - "ts": "1699096427.399719", - "blocks": [ - { - "type": "rich_text", - "block_id": "IB8ze", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "b32051d2-c4ec-4a36-8a2e-ab7781fc2932", - "type": "message", - "text": "It seems pretty extensively described, what kind of help do you need?", - "user": "U02S6F54MAB", - "ts": "1699096521.602359", - "blocks": [ - { - "type": "rich_text", - "block_id": "0HfAk", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It seems pretty extensively described, what kind of help do you need?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "0bca5058-05bf-4ec9-a103-6e89561627f0", - "type": "message", - "text": "`io.openlineage.spark.api.OpenLineageEventHandlerFactory` if i use this how will i pass custom listener to my spark submit", - "user": "U062Q95A1FG", - "ts": "1699096573.222189", - "blocks": [ - { - "type": "rich_text", - "block_id": "MUpbi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "io.openlineage.spark.api.OpenLineageEventHandlerFactory ", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " if i use this how will i pass custom listener to my spark submit" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "521aa664-fd00-4232-b2b3-27ed5970cd1b", - "type": "message", - "text": "I would like to know how will i customize my events using this . For example: - In \"input\" Facet i want only symlinks name i am not intereseted in anything else", - "user": "U062Q95A1FG", - "ts": "1699096645.101619", - "blocks": [ - { - "type": "rich_text", - "block_id": "o62J5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I would like to know how will i customize my events using this . 
For example: - In \"input\" Facet i want only symlinks name i am not intereseted in anything else" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "27680813-782a-438d-882f-e954856535f3", - "type": "message", - "text": "can you please provide some guidance", - "user": "U062Q95A1FG", - "ts": "1699096652.525109", - "blocks": [ - { - "type": "rich_text", - "block_id": "WuJp0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "can you please provide some guidance" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "4cb5c3ae-a67c-40a1-99c4-466f2c7e9574", - "type": "message", - "text": "<@U02S6F54MAB> this is the doubt i have", - "user": "U062Q95A1FG", - "ts": "1699096716.604929", - "blocks": [ - { - "type": "rich_text", - "block_id": "jY65x", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " this is the doubt i have" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "af8afd49-ca6c-42d7-874e-8ef0125d4899", - "type": "message", - "text": "Some one who did spark integration throw some light", - "user": "U062Q95A1FG", - "ts": "1699100245.653179", - "blocks": [ - { - "type": "rich_text", - "block_id": "ebM8j", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Some one who did spark integration throw some light" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "d244a1d7-3318-4011-9f53-f849f0c3bbbc", - "type": "message", - "text": "it's weekend for most of us so you probably need to wait until Monday for precise answers", - "user": "U02S6F54MAB", - "ts": "1699100482.031569", - "blocks": [ - { - "type": "rich_text", - "block_id": "sIFCY", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "it's weekend for most of us so you probably need to wait until Monday for precise answers" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1699096090.087359", - "parent_user_id": "U062Q95A1FG" - } - ] - }, - { - "client_msg_id": "d300023e-b76f-488e-9f6d-9d1a7e7821c9", - "type": "message", - "text": "\nThis month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:\n• announcements\n• recent releases\n• recent additions to the Flink integration by <@U05QA2D1XNV> \n• recent additions to the Spark integration by <@U02MK6YNAQ5> \n• updates on proposals by <@U01DCLP0GU9> \n• discussion topics\n• open discussion\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.", - "user": "U02LXF3HUN7", - "ts": "1699027207.361229", - "blocks": [ - { - "type": "rich_text", - "block_id": "VnOMq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThis month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. 
On the agenda:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "announcements" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent releases" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent additions to the Flink integration by " - }, - { - "type": "user", - "user_id": "U05QA2D1XNV" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent additions to the Spark integration by " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "updates on proposals by " - }, - { - "type": "user", - "user_id": "U01DCLP0GU9" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "discussion topics" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "open discussion" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "More info and the meeting link can be found on the " - }, - { - "type": "link", - "url": "https://openlineage.io/meetings/", - "text": "website" - }, - { - "type": "text", - "text": ". All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/meetings/", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/meetings/", - "fallback": "TSC Meetings | OpenLineage", - "text": "The OpenLineage Technical Steering Committee meets monthly, and is open to all.", - "title": "TSC Meetings | OpenLineage", - "title_link": "https://openlineage.io/meetings/", - "service_name": "openlineage.io" - } - ], - "reactions": [ - { - "name": "+1", - "users": [ - "U062WLFMRTP" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "240610ee-fc3d-4e71-9f70-731cd88495a6", - "type": "message", - "text": "actually, it shows up in one of the RUNNING now... behavior is consistent between 11.3 and 13.3, thanks for fixing this issue", - "user": "U05T8BJD4DU", - "ts": "1698999491.798599", - "blocks": [ - { - "type": "rich_text", - "block_id": "q6sfC", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "actually, it shows up in one of the RUNNING now... behavior is consistent between 11.3 and 13.3, thanks for fixing this issue" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "reply_count": 7, - "reply_users_count": 2, - "latest_reply": "1699471566.197189", - "reply_users": [ - "U05T8BJD4DU", - "U05TU0U224A" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "+1", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "464bebd8-1178-4639-88a8-8cac9334395b", - "type": "message", - "text": "<@U02MK6YNAQ5> looks like I need to bring bad news.. 13.3 is fixed for specific scenarios, but 11.3 is still reading output as dbfs.. 
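For concreteness, a hypothetical PySpark repro of the CTAS-over-ADLS statement quoted just below; the truncated `abfss://....` locations from the report are kept as placeholders.

```python
# Hypothetical repro: CREATE TABLE ... USING DELTA at an ADLS location,
# selecting from parquet files at another ADLS path. Paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ctas_abfss_repro").getOrCreate()

TARGET = "abfss://container@account.dfs.core.windows.net/delta/repro"  # placeholder
SOURCE = "abfss://container@account.dfs.core.windows.net/parquet/src"  # placeholder

spark.sql(
    f"""
    CREATE TABLE repro_table
    USING DELTA
    LOCATION '{TARGET}'
    AS SELECT * FROM parquet.`{SOURCE}`
    """
)
```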
there are scenarios that it's not producing input and output like:\n\ncreate table table using delta as\nlocation 'abfss://....'\nSelect * from parquet.`abfss://....'", - "user": "U05T8BJD4DU", - "ts": "1699127062.015709", - "blocks": [ - { - "type": "rich_text", - "block_id": "1hVmH", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " looks like I need to bring bad news.. 13.3 is fixed for specific scenarios, but 11.3 is still reading output as dbfs.. there are scenarios that it's not producing input and output like:\n\ncreate table table using delta as\nlocation 'abfss://....'\nSelect * from parquet.`abfss://....'" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "6ba957bb-7e64-43f7-be80-de4d9c568138", - "type": "message", - "text": "Will test more and ope issues", - "user": "U05T8BJD4DU", - "ts": "1699127071.990639", - "blocks": [ - { - "type": "rich_text", - "block_id": "EiQMS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Will test more and ope issues" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "83d639e6-a07c-4463-9797-1f6e6abc8381", - "type": "message", - "text": "<@U05T8BJD4DU>how did you manage the get the environment attribute. it's not showing up to me at all. I've tried databricks abut also tried a local instance of spark.", - "user": "U05TU0U224A", - "ts": "1699266873.918299", - "blocks": [ - { - "type": "rich_text", - "block_id": "0GwWq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05T8BJD4DU" - }, - { - "type": "text", - "text": "how did you manage the get the environment attribute. it's not showing up to me at all. I've tried databricks abut also tried a local instance of spark." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "0c036060-f060-45fb-9e10-84a1d263d415", - "type": "message", - "text": "<@U05TU0U224A> its showing up in one of the RUNNING events, not in the START event anymore", - "user": "U05T8BJD4DU", - "ts": "1699399922.001119", - "blocks": [ - { - "type": "rich_text", - "block_id": "SQy71", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05TU0U224A" - }, - { - "type": "text", - "text": " its showing up in one of the RUNNING events, not in the START event anymore" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "be6884cc-251f-4bed-8e54-cc990b344860", - "type": "message", - "text": "I never had a running event :melting_face: Am I filtering something?", - "user": "U05TU0U224A", - "ts": "1699430672.649399", - "blocks": [ - { - "type": "rich_text", - "block_id": "1HOGl", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I never had a running event " - }, - { - "type": "emoji", - "name": "melting_face", - "unicode": "1fae0" - }, - { - "type": "text", - "text": " Am I filtering something?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "c590f6a7-e029-4118-b6e7-66f1c6a2c4cf", - "type": "message", - "text": "Umm.. ok show me your code, will try on my end", - "user": "U05T8BJD4DU", - "ts": "1699466606.346779", - "blocks": [ - { - "type": "rich_text", - "block_id": "4l7gs", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Umm.. ok show me your code, will try on my end" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "72b08038-c170-4b56-b2dd-1db98d7434e9", - "type": "message", - "text": "<@U02MK6YNAQ5> <@U05TU0U224A> actually if you are using UC-enabled cluster, you won't get any RUNNING events", - "user": "U05T8BJD4DU", - "ts": "1699471566.197189", - "blocks": [ - { - "type": "rich_text", - "block_id": "JozE0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U05TU0U224A" - }, - { - "type": "text", - "text": " actually if you are using UC-enabled cluster, you won't get any RUNNING events" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698999491.798599", - "parent_user_id": "U05T8BJD4DU" - } - ] - }, - { - "type": "message", - "text": "<@U02MK6YNAQ5> I tested 1.5.0, it works great now, but the environment facets is gone in START... which I very much want it.. any thoughts?", - "files": [ - { - "id": "F063M06MWBZ", - "created": 1698950948, - "timestamp": 1698950948, - "name": "data (10).json", - "title": "data (10).json", - "mimetype": "text/plain", - "filetype": "json", - "pretty_type": "JSON", - "user": "U05T8BJD4DU", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 24081, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F063M06MWBZ/data__10_.json", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F063M06MWBZ/download/data__10_.json", - "permalink": "https://openlineage.slack.com/files/U05T8BJD4DU/F063M06MWBZ/data__10_.json", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F063M06MWBZ-6b6cda4177", - "edit_link": "https://openlineage.slack.com/files/U05T8BJD4DU/F063M06MWBZ/data__10_.json/edit", - "preview": "{\r\n \"eventTime\":\"2023-11-02T18:42:00.619Z\",\r\n \"producer\":\"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark\",\r\n \"schemaURL\":\"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent\",\r\n \"eventType\":\"START\",\r", - "preview_highlight": "
{\n   \"eventTime\":\"2023-11-02T18:42:00.619Z\",\n   \"producer\":\"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark\",\n   \"schemaURL\":\"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent\",\n   \"eventType\":\"START\",
\n", - "lines": 531, - "lines_more": 526, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05T8BJD4DU", - "display_as_bot": false, - "ts": "1698950958.157459", - "blocks": [ - { - "type": "rich_text", - "block_id": "v2Gft", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " I tested 1.5.0, it works great now, but the environment facets is gone in START... which I very much want it.. any thoughts?" - } - ] - } - ] - } - ], - "client_msg_id": "9b3348ec-8f79-4822-ae9c-44c87120828f" - }, - { - "client_msg_id": "dd052197-8cbb-4f3c-bf1b-d22cc8d9fb98", - "type": "message", - "text": "\nWe released OpenLineage 1.5.0, including:\n• by <@U05QA2D1XNV> \n• by <@U02MK6YNAQ5> \n• `rdd``toDF` by <@U02MK6YNAQ5> \n• by <@U02S6F54MAB> \n• by <@U02S6F54MAB> \n• bug fixes, tests, infra fixes, doc changes, and more.\nThanks to all the contributors, including new contributor <@U05VDHJJ9T7>!\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ", - "user": "U02LXF3HUN7", - "ts": "1698940800.306129", - "blocks": [ - { - "type": "rich_text", - "block_id": "Me1wK", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nWe released OpenLineage 1.5.0, including:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2175", - "text": "support for Cassandra Connectors lineage in the Flink integration" - }, - { - "type": "text", - "text": " by " - }, - { - "type": "user", - "user_id": "U05QA2D1XNV" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2185", - "text": "support for Databricks Runtime 13.3 in the Spark integration" - }, - { - "type": "text", - "text": " by " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2188", - "text": "support for " - }, - { - "type": "text", - "text": "rdd", - "style": { - "code": true - } - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2188", - "text": " and " - }, - { - "type": "text", - "text": "toDF", - "style": { - "code": true - } - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2188", - "text": " operations from the Spark Scala API in Spark" - }, - { - "type": "text", - "text": " by " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2107", - "text": "lowered requirements for attrs and requests packages in the Airflow integration" - }, - { - "type": "text", - "text": " by " - }, - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2221", - "text": "lazy rendering of yaml 
configs in the dbt integration" - }, - { - "type": "text", - "text": " by " - }, - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "bug fixes, tests, infra fixes, doc changes, and more." - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks to all the contributors, including new contributor " - }, - { - "type": "user", - "user_id": "U05VDHJJ9T7" - }, - { - "type": "text", - "text": "!\n" - }, - { - "type": "text", - "text": "Release:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/releases/tag/1.5.0" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Changelog: ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Commit history:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/compare/1.4.1...1.5.0" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Maven:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://oss.sonatype.org/#nexus-search;quick~openlineage" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "PyPI:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://pypi.org/project/openlineage-python/" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "+1", - "users": [ - "U05T8BJD4DU", - "U05VDHJJ9T7", - "U053LCT71BQ", - "U01HVNU6A4C", - "U05SMTVPPL3" - ], - "count": 5 - }, - { - "name": "rocket", - "users": [ - "U055N2GRT4P" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "d52119da-eee6-494c-a28e-aab20a0f55a5", - "type": "message", - "text": "I am looking to send OpenLineage events to an AWS API Gateway endpoint from an AWS MWAA instance. The problem is that all requests to AWS services need to be signed with SigV4, and using API Gateway with IAM authentication would require requests to API Gateway be signed with SigV4. Would the best way to do so be to just modify the python client HTTP transport to include a new config option for signing emitted OpenLineage events with SigV4? Are there any alternatives?", - "user": "U063YP6UJJ0", - "ts": "1698885038.172079", - "blocks": [ - { - "type": "rich_text", - "block_id": "uukJK", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am looking to send OpenLineage events to an AWS API Gateway endpoint from an AWS MWAA instance. The problem is that all requests to AWS services need to be signed with SigV4, and using API Gateway with IAM authentication would require requests to API Gateway be signed with SigV4. Would the best way to do so be to just modify the python client HTTP transport to include a new config option for signing emitted OpenLineage events with SigV4? Are there any alternatives?" 
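A minimal sketch of one possible approach: a custom transport that signs each request with SigV4 via botocore before posting. The endpoint URL and region below are placeholders, and `register_transport` / `Transport` / `Config` come from the OpenLineage Python client (the same registration pattern appears in the replies below); treat this as an untested outline under those assumptions, not a finished implementation.

```python
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.session import Session

from openlineage.client.serde import Serde
from openlineage.client.transport import Config, Transport, register_transport

# Placeholders, not real values:
OL_ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod/lineage"
AWS_REGION = "us-east-1"


@register_transport
class SigV4Transport(Transport):
    kind = "sigv4"
    config = Config

    def __init__(self, config: Config) -> None:
        # Resolve credentials from the usual AWS chain (env vars, instance role, ...).
        self.credentials = Session().get_credentials()

    def emit(self, event) -> None:
        body = Serde.to_json(event)
        request = AWSRequest(
            method="POST",
            url=OL_ENDPOINT,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        # API Gateway with IAM auth expects SigV4 for the "execute-api" service.
        SigV4Auth(self.credentials, "execute-api", AWS_REGION).add_auth(request)
        requests.post(OL_ENDPOINT, data=body, headers=dict(request.headers), timeout=5)
```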
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U063YP6UJJ0", - "ts": "1698885095.000000" - }, - "thread_ts": "1698885038.172079", - "reply_count": 7, - "reply_users_count": 2, - "latest_reply": "1699030375.287509", - "reply_users": [ - "U02S6F54MAB", - "U063YP6UJJ0" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "cac14e88-5767-4eee-81fc-7978de1c6b8e", - "type": "message", - "text": "there’s actually an issue for that:\n\n\nbut the way to do this is imho to create new custom transport (it might inherit from HTTP transport) and register it in transport factory", - "user": "U02S6F54MAB", - "ts": "1698907310.848799", - "blocks": [ - { - "type": "rich_text", - "block_id": "xfpF9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "there’s actually an issue for that:\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2189" - }, - { - "type": "text", - "text": "\n\nbut the way to do this is imho to create new custom transport (it might inherit from HTTP transport) and register it in transport factory" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698885038.172079", - "parent_user_id": "U063YP6UJJ0" - }, - { - "client_msg_id": "0d988f4f-3290-485d-ae49-627495942fee", - "type": "message", - "text": "I am thinking of just modifying the HTTP transport and using requests.auth.AuthBase to create different auth methods instead of a TokenProvider class\n\nClasses which subclass requests.auth.AuthBase can also just directly be given to the requests call in the auth parameter", - "user": "U063YP6UJJ0", - "ts": "1698944705.537979", - "blocks": [ - { - "type": "rich_text", - "block_id": "IxFOZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am thinking of just modifying the HTTP transport and using requests.auth.AuthBase to create different auth methods instead of a TokenProvider class\n\nClasses which subclass requests.auth.AuthBase can also just directly be given to the requests call in the auth parameter" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U063YP6UJJ0", - "ts": "1698944761.000000" - }, - "thread_ts": "1698885038.172079", - "parent_user_id": "U063YP6UJJ0", - "reactions": [ - { - "name": "+1", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "795deede-6d64-4abb-98e4-0c703f2bbeba", - "type": "message", - "text": "would you like to contribute? :slightly_smiling_face:", - "user": "U02S6F54MAB", - "ts": "1698950424.250389", - "blocks": [ - { - "type": "rich_text", - "block_id": "PVFJT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "would you like to contribute? " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698885038.172079", - "parent_user_id": "U063YP6UJJ0" - }, - { - "client_msg_id": "29e7aa89-d809-4605-a27d-e56951ddbef4", - "type": "message", - "text": "I was about to contribute, but I actually just realized that there is an existing way to provide a custom transport that would solve form y use case. My only question is how do I register this custom transport in my MWAA environment? Can I provide the custom transport as an Airflow plugin and then specify the class in the Openlineage.yml config? 
Will it automatically pick it up?", - "user": "U063YP6UJJ0", - "ts": "1698950585.337849", - "blocks": [ - { - "type": "rich_text", - "block_id": "zxE+U", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I was about to contribute, but I actually just realized that there is an existing way to provide a custom transport that would solve form y use case. My only question is how do I register this custom transport in my MWAA environment? Can I provide the custom transport as an Airflow plugin and then specify the class in the Openlineage.yml config? Will it automatically pick it up?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698885038.172079", - "parent_user_id": "U063YP6UJJ0" - }, - { - "client_msg_id": "67a9dc7f-69c9-4e38-92d6-221ed3e5b652", - "type": "message", - "text": "although I did not test this in MWAA but locally only: I’ve created Airflow plugin that in `__init__.py` has defined (or imported) following code:\n```from openlineage.client.transport import register_transport, Transport, Config\n\n\n@register_transport\nclass FakeTransport(Transport):\n kind = \"fake\"\n config = Config\n\n def __init__(self, config: Config) -> None:\n print(config)\n\n def emit(self, event) -> None:\n print(event)```\nsetting `AIRFLOW__OPENLINEAGE__TRANSPORT='{\"type\": \"fake\"}'` does take effect and I can see output in Airflow logs", - "user": "U02S6F54MAB", - "ts": "1698954356.104239", - "blocks": [ - { - "type": "rich_text", - "block_id": "a3/UA", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "although I did not test this in MWAA but locally only: I’ve created Airflow plugin that in " - }, - { - "type": "text", - "text": "__init__.py", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " has defined (or imported) following code:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "from openlineage.client.transport import register_transport, Transport, Config\n\n\n@register_transport\nclass FakeTransport(Transport):\n kind = \"fake\"\n config = Config\n\n def __init__(self, config: Config) -> None:\n print(config)\n\n def emit(self, event) -> None:\n print(event)" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "setting " - }, - { - "type": "text", - "text": "AIRFLOW__OPENLINEAGE__TRANSPORT='{\"type\": \"fake\"}'", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " does take effect and I can see output in Airflow logs" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02S6F54MAB", - "ts": "1698954364.000000" - }, - "thread_ts": "1698885038.172079", - "parent_user_id": "U063YP6UJJ0" - }, - { - "client_msg_id": "526277f0-8c1d-462a-a361-b66d1e4baa53", - "type": "message", - "text": "in `setup.py` it’s:\n``` ...,\n entry_points={\n 'airflow.plugins': [\n 'custom_transport = custom_transport:CustomTransportPlugin',\n ],\n },\n install_requires=[\"openlineage-python\"]\n)```", - "user": "U02S6F54MAB", - "ts": "1698954465.672369", - "blocks": [ - { - "type": "rich_text", - "block_id": "Bv4qn", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "in " - }, - { - "type": "text", - "text": "setup.py", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " it’s:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": 
"text", - "text": " ...,\n entry_points={\n 'airflow.plugins': [\n 'custom_transport = custom_transport:CustomTransportPlugin',\n ],\n },\n install_requires=[\"openlineage-python\"]\n)" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698885038.172079", - "parent_user_id": "U063YP6UJJ0" - }, - { - "client_msg_id": "d97a0613-2ab0-423c-81ba-429b592d5351", - "type": "message", - "text": "ok great thanks for following up on this, super helpful", - "user": "U063YP6UJJ0", - "ts": "1699030375.287509", - "blocks": [ - { - "type": "rich_text", - "block_id": "Gsemv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "ok great thanks for following up on this, super helpful" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698885038.172079", - "parent_user_id": "U063YP6UJJ0" - } - ] - }, - { - "client_msg_id": "6f5d63b5-c671-48b3-85df-b49bd907adc0", - "type": "message", - "text": "Hi team :wave: , we’re finding that for our Spark jobs we are almost always getting some junk characters in our dataset names. We’ve pushed the regex filter to its limits and would like to extend the logic of deriving the dataset name in openlineage-spark (currently on `1.4.1`). I seem to recall hearing we could do this by implementing our own `LogicalPlanVisitor` or something along those lines? Is that still the recommended approach and if so would this be possible to implement in Scala vs. Java (scala noob here :simple_smile:)", - "user": "U04AZ7992SU", - "ts": "1698882039.335099", - "blocks": [ - { - "type": "rich_text", - "block_id": "pGDVg", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi team " - }, - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - }, - { - "type": "text", - "text": " , we’re finding that for our Spark jobs we are almost always getting some junk characters in our dataset names. We’ve pushed the regex filter to its limits and would like to extend the logic of deriving the dataset name in openlineage-spark (currently on " - }, - { - "type": "text", - "text": "1.4.1", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "). I seem to recall hearing we could do this by implementing our own " - }, - { - "type": "text", - "text": "LogicalPlanVisitor", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " or something along those lines? Is that still the recommended approach and if so would this be possible to implement in Scala vs. 
Java (scala noob here " - }, - { - "type": "emoji", - "name": "simple_smile" - }, - { - "type": "text", - "text": ")" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698882039.335099", - "reply_count": 2, - "reply_users_count": 1, - "latest_reply": "1698910487.005589", - "reply_users": [ - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "a2380d59-d730-4d2c-95c7-f79c66e609b0", - "type": "message", - "text": "Hi John, we're always happy to help with the contribution.\n\nOne of the possible solutions to this would be to do that just in `openlineage-java` client:\n• introduce config entry like `normalizeDatasetNameToAscii` : `enabled/disabled`\n• modify `DatasetIdentifier` class to contain static member `boolean normalizeDatasetNameToAscii` and normalize dataset name according to this setting\n• additionally, you would need to add config entry in `io.openlineage.client.OpenLineageYaml` and make sure both `loadOpenLineageYaml` methods set `DatasetIdentifier.normalizeDatasetNameToAscii` based on the config\n• document this in the doc\nSo, no Scala nor custom logical plan visitors required.", - "user": "U02MK6YNAQ5", - "ts": "1698910455.477209", - "blocks": [ - { - "type": "rich_text", - "block_id": "gWpRf", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi John, we're always happy to help with the contribution.\n\nOne of the possible solutions to this would be to do that just in " - }, - { - "type": "text", - "text": "openlineage-java", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " client:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "introduce config entry like " - }, - { - "type": "text", - "text": "normalizeDatasetNameToAscii", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " : " - }, - { - "type": "text", - "text": "enabled/disabled", - "style": { - "code": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "modify " - }, - { - "type": "text", - "text": "DatasetIdentifier", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " class to contain static member " - }, - { - "type": "text", - "text": "boolean normalizeDatasetNameToAscii", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and normalize dataset name according to this setting" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "additionally, you would need to add config entry in " - }, - { - "type": "text", - "text": "io.openlineage.client.OpenLineageYaml", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and make sure both " - }, - { - "type": "text", - "text": "loadOpenLineageYaml", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " methods set " - }, - { - "type": "text", - "text": "DatasetIdentifier.normalizeDatasetNameToAscii", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " based on the config" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "document this in the doc" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nSo, no Scala nor custom logical plan visitors required." 
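The proposal above targets the Java client, but the normalization step itself is easy to illustrate. A Python sketch of the kind of ASCII folding being discussed (the function name mirrors the proposed config entry and is hypothetical):

```python
import unicodedata

def normalize_dataset_name_to_ascii(name: str) -> str:
    # Decompose accented characters (NFKD), then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

assert normalize_dataset_name_to_ascii("ventes_journalières") == "ventes_journalieres"
```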
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698882039.335099", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "08076597-aebc-44fb-bbaa-bc9d3b22e344", - "type": "message", - "text": "", - "user": "U02MK6YNAQ5", - "ts": "1698910487.005589", - "blocks": [ - { - "type": "rich_text", - "block_id": "Cr6n9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/client/java/src/main/java/io/openlineage/client/utils/DatasetIdentifier.java" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/client/java/src/main/java/io/openlineage/client/utils/DatasetIdentifier.java", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n/*\n/* Copyright 2018-2023 contributors to the OpenLineage project\n/* SPDX-License-Identifier: Apache-2.0\n*/\npackage io.openlineage.client.utils;\n\nimport java.util.LinkedList;\nimport java.util.List;\nimport lombok.Value;\n\n@Value\npublic class DatasetIdentifier {\n String name;\n String namespace;\n List symlinks;\n\n public enum SymlinkType {\n TABLE\n };\n\n public DatasetIdentifier(String name, String namespace) {\n this.name = name;\n this.namespace = namespace;\n this.symlinks = new LinkedList<>();\n }\n\n public DatasetIdentifier withSymlink(String name, String namespace, SymlinkType type) {\n symlinks.add(new Symlink(name, namespace, type));\n return this;\n }\n\n public DatasetIdentifier withSymlink(Symlink symlink) {\n symlinks.add(symlink);\n return this;\n }\n\n @Value\n public static class Symlink {\n String name;\n String namespace;\n SymlinkType type;\n }\n}\n\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1698882039.335099", - "parent_user_id": "U04AZ7992SU", - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U04AZ7992SU" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "fac0c1bd-2166-44fe-92a9-70fb0ae9fb93", - "type": "message", - "text": "\nThe October 2023 issue of is available now! to get in directly in your inbox each month.", - "user": "U02LXF3HUN7", - "ts": "1698859749.531699", - "blocks": [ - { - "type": "rich_text", - "block_id": "n//9G", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThe October 2023 issue of " - }, - { - "type": "link", - "url": "https://mailchi.mp/cea829d27acd/openlineage-news-july-9597657?e=ef0563a7f8", - "text": "OpenLineage News" - }, - { - "type": "text", - "text": " is available now! " - }, - { - "type": "link", - "url": "https://openlineage.us14.list-manage.com/track/click?u=fe7ef7a8dbb32933f30a10466&id=123767f606&e=ef0563a7f8", - "text": "Sign up" - }, - { - "type": "text", - "text": " to get in directly in your inbox each month." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.us14.list-manage.com/track/click?u=fe7ef7a8dbb32933f30a10466&id=123767f606&e=ef0563a7f8", - "id": 1, - "original_url": "https://openlineage.us14.list-manage.com/track/click?u=fe7ef7a8dbb32933f30a10466&id=123767f606&e=ef0563a7f8", - "fallback": "OpenLineage Project", - "text": "OpenLineage Project Email Forms", - "title": "OpenLineage Project", - "title_link": "https://openlineage.us14.list-manage.com/track/click?u=fe7ef7a8dbb32933f30a10466&id=123767f606&e=ef0563a7f8", - "service_name": "apache.us14.list-manage.com" - } - ], - "reactions": [ - { - "name": "+1", - "users": [ - "U01HVNU6A4C", - "U062WLFMRTP" - ], - "count": 2 - }, - { - "name": "tada", - "users": [ - "U055N2GRT4P" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "8f19ced9-362d-47f7-837d-a47df88bbcae", - "type": "message", - "text": "\nI’m opening a vote to release OpenLineage 1.5.0, including:\n• support for Cassandra Connectors lineage in the Flink integration\n• support for Databricks Runtime 13.3 in the Spark integration\n• support for `rdd` and `toDF` operations from the Spark Scala API in Spark\n• lowered requirements for attrs and requests packages in the Airflow integration\n• lazy rendering of yaml configs in the dbt integration\n• bug fixes, tests, infra fixes, doc changes, and more.\nThree +1s from committers will authorize an immediate release.", - "user": "U02LXF3HUN7", - "ts": "1698852883.658009", - "blocks": [ - { - "type": "rich_text", - "block_id": "dgGRj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nI’m opening a vote to release OpenLineage 1.5.0, including:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "support for Cassandra Connectors lineage in the Flink integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "support for Databricks Runtime 13.3 in the Spark integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "support for " - }, - { - "type": "text", - "text": "rdd", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and " - }, - { - "type": "text", - "text": "toDF", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " operations from the Spark Scala API in Spark" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "lowered requirements for attrs and requests packages in the Airflow integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "lazy rendering of yaml configs in the dbt integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "bug fixes, tests, infra fixes, doc changes, and more." - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Three +1s from committers will authorize an immediate release." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698852883.658009", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1698916318.596059", - "reply_users": [ - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1698916318.596059", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U02S6F54MAB", - "U028NM4NR1Q", - "U05HBLE7YPL", - "U01DCMDFHBK", - "U02MK6YNAQ5", - "U01DCLP0GU9" - ], - "count": 6 - }, - { - "name": "+1", - "users": [ - "U05T8BJD4DU" - ], - "count": 1 - }, - { - "name": "rocket", - "users": [ - "U032GGJESD6", - "U055N2GRT4P" - ], - "count": 2 - } - ], - "replies": [ - { - "client_msg_id": "7aaef1f1-63eb-4a4d-8eaf-4e80d905ad4d", - "type": "message", - "text": "Thanks, all. The release is authorized and will be initiated within 2 business days.", - "user": "U02LXF3HUN7", - "ts": "1698916318.596059", - "blocks": [ - { - "type": "rich_text", - "block_id": "mCijO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks, all. The release is authorized and will be initiated within 2 business days." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698852883.658009", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "one question if someone is around - when im keeping both `openlineage-airflow` and `apache-airflow-providers-openlineage` in my requirement file, i see the following error -\n``` from openlineage.airflow.extractors import Extractors\nModuleNotFoundError: No module named 'openlineage.airflow'```\nany thoughts?", - "user": "U062WLFMRTP", - "ts": "1698778838.540239", - "thread_ts": "1698340358.557159", - "root": { - "type": "message", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated", - "files": [ - { - "id": "F062ZFJN2UB", - "created": 1698340299, - "timestamp": 1698340299, - "name": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "title": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U062WLFMRTP", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 356434, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/download/screenshot_2023-10-26_at_10.11.34_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 222, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 297, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_160.png", - "thumb_720": 
"https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 445, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 494, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 593, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 633, - "original_w": 1756, - "original_h": 1085, - "thumb_tiny": "AwAdADC8etKKQ9acKADFLRRQAUUUUAMPWnCm96UUAOopM+1GaAFopM0ZoA//2Q==", - "permalink": "https://openlineage.slack.com/files/U062WLFMRTP/F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F062ZFJN2UB-c8de1a91b2", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U062WLFMRTP", - "display_as_bot": false, - "ts": "1698340358.557159", - "blocks": [ - { - "type": "rich_text", - "block_id": "CRXyh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated" - } - ] - } - ] - } - ], - "client_msg_id": "925bbcea-663c-4480-8809-2e2b3dd06020", - "thread_ts": "1698340358.557159", - "reply_count": 39, - "reply_users_count": 4, - "latest_reply": "1698786461.272129", - "reply_users": [ - "U062WLFMRTP", - "U02S6F54MAB", - "U01RA9B5GG2", - "U04AZ7992SU" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "7zvcN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "one question if someone is around - when im keeping both " - }, - { - "type": "text", - "text": "openlineage-airflow", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and " - }, - { - "type": "text", - "text": "apache-airflow-providers-openlineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in my requirement file, i see the following error -\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": " from openlineage.airflow.extractors import Extractors\nModuleNotFoundError: No module named 'openlineage.airflow'" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "any thoughts?" - } - ] - } - ] - } - ], - "client_msg_id": "84068da9-05c1-43a6-97a3-cb24d71c5832" - }, - { - "client_msg_id": "52f51233-521c-4a5d-a0c1-e6677707987c", - "type": "message", - "text": ":wave: Hi team, cross-posting from the Marquez Channel in case anyone here has a better idea of the spec\n\n> For most of our lineage extractors in airflow, we are using the rust sql parser from openlineage-sql to extract table lineage via sql statements. When errors occur we are adding an `extractionError` run facet similar to what is being done . I’m finding in the case that multiple statements were extracted but one failed to parse while many others were successful, the lineage for these runs doesn’t appear as expected in Marquez. 
Is there any logic around the `extractionError` run facet that could be causing this? It seems reasonable to assume that we might take this to mean the entire run event is invalid if we have any extraction errors. \n> \n> I would still expect to see the other lineage we sent for the run but am instead just seeing the `extractionError` in the marquez UI, in the database, runs with an `extractionError` facet don’t seem to make it to the `job_versions_io_mapping` table", - "user": "U04AZ7992SU", - "ts": "1698706303.956579", - "blocks": [ - { - "type": "rich_text", - "block_id": "532QI", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - }, - { - "type": "text", - "text": " Hi team, cross-posting from the Marquez Channel in case anyone here has a better idea of the spec\n\n" - } - ] - }, - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "For most of our lineage extractors in airflow, we are using the rust sql parser from openlineage-sql to extract table lineage via sql statements. When errors occur we are adding an " - }, - { - "type": "text", - "text": "extractionError", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " run facet similar to what is being done " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/sql_extractor.py#L75-L89", - "text": "here" - }, - { - "type": "text", - "text": ". I’m finding in the case that multiple statements were extracted but one failed to parse while many others were successful, the lineage for these runs doesn’t appear as expected in Marquez. Is there any logic around the " - }, - { - "type": "text", - "text": "extractionError", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " run facet that could be causing this? It seems reasonable to assume that we might take this to mean the entire run event is invalid if we have any extraction errors. 
\n\nI would still expect to see the other lineage we sent for the run but am instead just seeing the " - }, - { - "type": "text", - "text": "extractionError", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in the marquez UI, in the database, runs with an " - }, - { - "type": "text", - "text": "extractionError", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " facet don’t seem to make it to the " - }, - { - "type": "text", - "text": "job_versions_io_mapping", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " table" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/airflow/openlineage/airflow/extractors/sql_extractor.py#L75-L89", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n if sql_meta.errors:\n run_facets['extractionError'] = ExtractionErrorRunFacet(\n totalTasks=len(self.operator.sql) if isinstance(self.operator.sql, list) else 1,\n failedTasks=len(sql_meta.errors),\n errors=[\n ExtractionError(\n errorMessage=error.message,\n stackTrace=None,\n task=error.origin_statement,\n taskNumber=error.index\n )\n for error\n in sql_meta.errors\n ]\n )\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1698706303.956579", - "reply_count": 11, - "reply_users_count": 3, - "latest_reply": "1698881826.588819", - "reply_users": [ - "U01RA9B5GG2", - "U05CAULTYG2", - "U04AZ7992SU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "f364fc7b-1963-4ec3-8356-bd3952479e4f", - "type": "message", - "text": "Can you show the actual event? Should be in the events tab in Marquez", - "user": "U01RA9B5GG2", - "ts": "1698748445.166899", - "blocks": [ - { - "type": "rich_text", - "block_id": "6GDYD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can you show the actual event? Should be in the events tab in Marquez" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "78ff6b9b-79ea-4eb0-9878-e2d3a468bdb2", - "type": "message", - "text": "<@U04AZ7992SU>, would you mind posting the link to Marquez teams slack channel?", - "user": "U05CAULTYG2", - "ts": "1698767947.720639", - "blocks": [ - { - "type": "rich_text", - "block_id": "uFKlN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U04AZ7992SU" - }, - { - "type": "text", - "text": ", would you mind posting the link to Marquez teams slack channel?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "224705df-657a-4c1f-a73a-cdfe83bda101", - "type": "message", - "text": "yep here is the link: \n\nThis is the full event, sanitized of internal info:\n```{\n \"job\": {\n \"name\": \"some_dag.some_task\",\n \"facets\": {},\n \"namespace\": \"default\"\n },\n \"run\": {\n \"runId\": \"a9565df2-f1a1-3ee3-b202-7626f8c4b92d\",\n \"facets\": {\n \"extractionError\": {\n \"errors\": [\n {\n \"task\": \"ALTER SESSION UNSET QUERY_TAG;\",\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"taskNumber\": 0,\n \"errorMessage\": \"Expected one of TABLE or INDEX, found: SESSION\"\n }\n ],\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"totalTasks\": 1,\n \"failedTasks\": 1\n }\n }\n },\n \"inputs\": [\n {\n \"name\": \"foo.bar\",\n \"facets\": {},\n \"namespace\": \"snowflake\"\n },\n {\n \"name\": \"fizz.buzz\",\n \"facets\": {},\n \"namespace\": \"snowflake\"\n }\n ],\n \"outputs\": [\n { \"name\": \"foo1.bar2\", \"facets\": {}, \"namespace\": \"snowflake\" },\n {\n \"name\": \"fizz1.buzz2\",\n \"facets\": {},\n \"namespace\": \"snowflake\"\n }\n ],\n \"producer\": \"\",\n \"eventTime\": \"2023-10-30T02:46:13.367274Z\",\n \"eventType\": \"COMPLETE\"\n}```", - "user": "U04AZ7992SU", - "ts": "1698768937.061429", - "blocks": [ - { - "type": "rich_text", - "block_id": "xChGR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yep here is the link: " - }, - { - "type": "link", - "url": "https://marquezproject.slack.com/archives/C01E8MQGJP7/p1698702140709439" - }, - { - "type": "text", - "text": "\n\nThis is the full event, sanitized of internal info:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "{\n \"job\": {\n \"name\": \"some_dag.some_task\",\n \"facets\": {},\n \"namespace\": \"default\"\n },\n \"run\": {\n \"runId\": \"a9565df2-f1a1-3ee3-b202-7626f8c4b92d\",\n \"facets\": {\n \"extractionError\": {\n \"errors\": [\n {\n \"task\": \"ALTER SESSION UNSET QUERY_TAG;\",\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet" - }, - { - "type": "text", - "text": "\",\n \"taskNumber\": 0,\n \"errorMessage\": \"Expected one of TABLE or INDEX, found: SESSION\"\n }\n ],\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ExtractionErrorRunFacet" - }, - { - "type": "text", - "text": "\",\n \"totalTasks\": 1,\n \"failedTasks\": 1\n }\n }\n },\n \"inputs\": [\n {\n \"name\": \"foo.bar\",\n \"facets\": {},\n \"namespace\": \"snowflake\"\n },\n {\n \"name\": \"fizz.buzz\",\n \"facets\": {},\n \"namespace\": \"snowflake\"\n }\n ],\n \"outputs\": [\n { \"name\": \"foo1.bar2\", \"facets\": {}, \"namespace\": \"snowflake\" },\n {\n \"name\": \"fizz1.buzz2\",\n \"facets\": {},\n \"namespace\": \"snowflake\"\n }\n ],\n \"producer\": \"" - }, - { - "type": "link", - "url": 
"https://github.com/MyCompany/repo/blob/next-master/company/data/pipelines/airflow_utils/openlineage_utils/client.py" - }, - { - "type": "text", - "text": "\",\n \"eventTime\": \"2023-10-30T02:46:13.367274Z\",\n \"eventType\": \"COMPLETE\"\n}" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "dd5e58c1-61aa-43ce-b90b-3677f05f8a77", - "type": "message", - "text": "thank you!", - "user": "U05CAULTYG2", - "ts": "1698770587.611649", - "blocks": [ - { - "type": "rich_text", - "block_id": "afEI5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "thank you!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "dc2ce936-1acb-4730-aeb9-bcc5bb3ae25a", - "type": "message", - "text": "<@U04AZ7992SU>, sorry to trouble again, is the slack channel still active? for whatever reason i cant get to this workspace", - "user": "U05CAULTYG2", - "ts": "1698772469.402859", - "blocks": [ - { - "type": "rich_text", - "block_id": "77tAM", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U04AZ7992SU" - }, - { - "type": "text", - "text": ", sorry to trouble again, is the slack channel still active? for whatever reason i cant get to this workspace" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05CAULTYG2", - "ts": "1698777868.000000" - }, - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "d5e0beb7-981d-470f-8b61-8c64f2ca47dc", - "type": "message", - "text": "yep it’s still active, maybe you need to join the workspace first? ", - "user": "U04AZ7992SU", - "ts": "1698772526.341899", - "blocks": [ - { - "type": "rich_text", - "block_id": "Ckj1E", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yep it’s still active, maybe you need to join the workspace first? " - }, - { - "type": "link", - "url": "https://join.slack.com/t/marquezproject/shared_invite/zt-266fdhg9g-TE7e0p~EHK50GJMMqNH4tg" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "f612b95d-4ddb-4a3d-9a37-b69951d280e6", - "type": "message", - "text": "that was a good call. the link you just shared worked! thank you!", - "user": "U05CAULTYG2", - "ts": "1698773151.879409", - "blocks": [ - { - "type": "rich_text", - "block_id": "mD/jk", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "that was a good call. the link you just shared worked! thank you!" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "2607d391-2a1a-4769-b713-70637f34577c", - "type": "message", - "text": "yeah from OL perspective this looks good - the inputs and outputs are there, the extraction error facet looks like it should", - "user": "U01RA9B5GG2", - "ts": "1698773275.821549", - "blocks": [ - { - "type": "rich_text", - "block_id": "Akb2+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah from OL perspective this looks good - the inputs and outputs are there, the extraction error facet looks like it should" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "147fd829-ef27-4f73-8771-44e60e41ba70", - "type": "message", - "text": "must be some Marquez hiccup :slightly_smiling_face:", - "user": "U01RA9B5GG2", - "ts": "1698773285.952189", - "blocks": [ - { - "type": "rich_text", - "block_id": "qvUWQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "must be some Marquez hiccup " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU", - "reactions": [ - { - "name": "+1", - "users": [ - "U04AZ7992SU" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "a67feb72-4b89-48b2-9e82-067682c3cca1", - "type": "message", - "text": "Makes sense, I’ll tail my marquez logs today to see if I can find anything", - "user": "U04AZ7992SU", - "ts": "1698773325.668029", - "blocks": [ - { - "type": "rich_text", - "block_id": "7PgFA", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Makes sense, I’ll tail my marquez logs today to see if I can find anything" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - }, - { - "client_msg_id": "423e464e-4fab-4f39-ad13-22b8ab8068c0", - "type": "message", - "text": "Somehow this started working after we switched from our beta to prod infrastructure. I suspect something was failing due to constraints on the size of our db and the load of poor quality data it was under after months of testing against it", - "user": "U04AZ7992SU", - "ts": "1698881826.588819", - "blocks": [ - { - "type": "rich_text", - "block_id": "wvBxx", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Somehow this started working after we switched from our beta to prod infrastructure. I suspect something was failing due to constraints on the size of our db and the load of poor quality data it was under after months of testing against it" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698706303.956579", - "parent_user_id": "U04AZ7992SU" - } - ] - }, - { - "client_msg_id": "80b66ffa-7be5-4f30-9bf2-d5c6af7abfde", - "type": "message", - "text": "I realize in Spark 3.4+, some job ids don't have a start event. 
What part of the code is responsible for triggering the START and COMPLETE event", - "user": "U05T8BJD4DU", - "ts": "1698563188.319939", - "blocks": [ - { - "type": "rich_text", - "block_id": "LpAAj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I realize in Spark 3.4+, some job ids don't have a start event. What part of the code is responsible for triggering the START and COMPLETE event" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05T8BJD4DU", - "ts": "1698563291.000000" - }, - "thread_ts": "1698563188.319939", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1698699115.599449", - "reply_users": [ - "U02MK6YNAQ5", - "U05T8BJD4DU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "d78c6581-7c33-43e0-adc2-05ff7e3d131d", - "type": "message", - "text": "hi <@U05T8BJD4DU> could you provide an example of such a job?", - "user": "U02MK6YNAQ5", - "ts": "1698674393.317219", - "blocks": [ - { - "type": "rich_text", - "block_id": "bd1pS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hi " - }, - { - "type": "user", - "user_id": "U05T8BJD4DU" - }, - { - "type": "text", - "text": " could you provide an example of such a job?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698563188.319939", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "1d4ef7b2-bcdc-49f4-ac5a-df4a25d4251a", - "type": "message", - "text": "<@U02MK6YNAQ5> same old:\n\n# delete the old table if needed\n_ = spark.sql('DROP TABLE IF EXISTS transactions')\n\n# expected structure of the file\ntransactions_schema = StructType([\n StructField('household_id', IntegerType()),\n StructField('basket_id', LongType()),\n StructField('day', IntegerType()),\n StructField('product_id', IntegerType()),\n StructField('quantity', IntegerType()),\n StructField('sales_amount', FloatType()),\n StructField('store_id', IntegerType()),\n StructField('discount_amount', FloatType()),\n StructField('transaction_time', IntegerType()),\n StructField('week_no', IntegerType()),\n StructField('coupon_discount', FloatType()),\n StructField('coupon_discount_match', FloatType())\n ])\n\n# read data to dataframe\ndf = (spark\n .read\n .csv(\n adlsRootPath + '/examples/data/csv/completejourney/transaction_data.csv',\n header=True,\n schema=transactions_schema))\n\ndf.write\\\n .format('delta')\\\n .mode('overwrite')\\\n .option('overwriteSchema', 'true')\\\n .option('path', adlsRootPath + '/examples/data/csv/completejourney/silver/transactions')\\\n .saveAsTable('transactions')\n\ndf.count()\n\n# # create table object to make delta lake queryable\n# _ = spark.sql(f'''\n# CREATE TABLE transactions\n# USING DELTA\n# LOCATION '{adlsRootPath}/examples/data/csv/completejourney/silver/transactions'\n# ''')\n\n# show data\ndisplay(\n spark.table('transactions')\n )", - "user": "U05T8BJD4DU", - "ts": "1698699115.599449", - "blocks": [ - { - "type": "rich_text", - "block_id": "2gImX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " same old:\n\n# delete the old table if needed\n_ = spark.sql('DROP TABLE IF EXISTS transactions')\n\n# expected structure of the file\ntransactions_schema = StructType([\n StructField('household_id', IntegerType()),\n StructField('basket_id', LongType()),\n StructField('day', IntegerType()),\n StructField('product_id', 
IntegerType()),\n StructField('quantity', IntegerType()),\n StructField('sales_amount', FloatType()),\n StructField('store_id', IntegerType()),\n StructField('discount_amount', FloatType()),\n StructField('transaction_time', IntegerType()),\n StructField('week_no', IntegerType()),\n StructField('coupon_discount', FloatType()),\n StructField('coupon_discount_match', FloatType())\n ])\n\n# read data to dataframe\ndf = (spark\n .read\n .csv(\n adlsRootPath + '/examples/data/csv/completejourney/transaction_data.csv',\n header=True,\n schema=transactions_schema))\n\ndf.write\\\n .format('delta')\\\n .mode('overwrite')\\\n .option('overwriteSchema', 'true')\\\n .option('path', adlsRootPath + '/examples/data/csv/completejourney/silver/transactions')\\\n .saveAsTable('transactions')\n\ndf.count()\n\n# # create table object to make delta lake queryable\n# _ = spark.sql(f'''\n# CREATE TABLE transactions\n# USING DELTA\n# LOCATION '{adlsRootPath}/examples/data/csv/completejourney/silver/transactions'\n# ''')\n\n# show data\ndisplay(\n spark.table('transactions')\n )" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698563188.319939", - "parent_user_id": "U05T8BJD4DU" - } - ] - }, - { - "client_msg_id": "f5fddb24-8cfd-4510-8b01-8c70061bb054", - "type": "message", - "text": "Hello, has anyone run into similar error as posted in this github open issues[] while setting up marquez on an EC2 Instance, would appreciate any help to get past the errors", - "user": "U05CAULTYG2", - "ts": "1698440472.145489", - "blocks": [ - { - "type": "rich_text", - "block_id": "Ftpj6", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello, has anyone run into similar error as posted in this github open issues[" - }, - { - "type": "link", - "url": "https://github.com/MarquezProject/marquez/issues/2468" - }, - { - "type": "text", - "text": "] while setting up marquez on an EC2 Instance, would appreciate any help to get past the errors" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1680563711, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/MarquezProject/marquez/issues/2468", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2468 Connection error PostgreSQL and Marquez", - "text": "Hi all!\n\nI really appreciate this repository and would love to use it for its data lineage aspect! \nLast few days, I've been trying to get the `marquez api` Docker image up and running but whatever I try, it does not seem to work.\n\nIt returns the following errors in Docker Desktop:\n\n`2023-04-03 18:07:04 marquez-db | 2023-04-03 23:07:04.726 GMT [35] FATAL: password authentication failed for user \"marquez\" 2023-04-03 18:07:04 marquez-db | 2023-04-03 23:07:04.726 GMT [35] DETAIL: Role \"marquez\" does not exist.`\n\nand\n\n`ERROR [2023-04-03 22:44:31,217] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool. 2023-04-03 17:44:31 ! org.postgresql.util.PSQLException: FATAL: password authentication failed for user \"marquez\"`\n\nAll the steps in the README.md file were used to configure the PostgreSQL database for Marquez and to set the correct environment variables. 
\nDuring my trouble shooting practices, I found a relatively similar issue on StackOverflow which refers to the danger in using the same ports for Postgres and Marquez but this did not help me yet in solving the issue ().\n\nCould you please help me out?\n\nKind regards,\n\nTom", - "title": "#2468 Connection error PostgreSQL and Marquez", - "title_link": "https://github.com/MarquezProject/marquez/issues/2468", - "footer": "", - "fields": [ - { - "value": "5", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1698440472.145489", - "reply_count": 33, - "reply_users_count": 2, - "latest_reply": "1698453369.842939", - "reply_users": [ - "U01DCMDFHBK", - "U05CAULTYG2" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "2d34aa83-0a6c-4509-9577-cb6685597919", - "type": "message", - "text": "Hmm, have you looked over our docs?", - "user": "U01DCMDFHBK", - "ts": "1698440670.762299", - "blocks": [ - { - "type": "rich_text", - "block_id": "tYvDg", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hmm, have you looked over our " - }, - { - "type": "link", - "url": "https://marquezproject.ai/docs/deployment/running-on-aws", - "text": "Running on AWS" - }, - { - "type": "text", - "text": " docs?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "cb6742ec-9a22-485f-8542-b87b1d3cc465", - "type": "message", - "text": "More specifically, the AWS RDS section. How are you deploying Marquez on Ec2?", - "user": "U01DCMDFHBK", - "ts": "1698440768.525519", - "blocks": [ - { - "type": "rich_text", - "block_id": "4lIXN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "More specifically, the AWS RDS section. How are you deploying Marquez on Ec2?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "ee6b2c14-106d-4e0c-8978-d098ca03d902", - "type": "message", - "text": "we were primarily referencing this document on git - ", - "user": "U05CAULTYG2", - "ts": "1698440885.315539", - "blocks": [ - { - "type": "rich_text", - "block_id": "jUo+G", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we were primarily referencing this document on git - " - }, - { - "type": "link", - "url": "https://github.com/MarquezProject/marquez" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/MarquezProject/marquez", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "MarquezProject/marquez", - "text": "Collect, aggregate, and visualize a data ecosystem's metadata", - "title": "MarquezProject/marquez", - "fields": [ - { - "value": "", - "title": "Website", - "short": true - }, - { - "value": "1450", - "title": "Stars", - "short": true - } - ] - } - ], - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "f564f3a6-19b6-4e3a-86f1-b289b385bf20", - "type": "message", - "text": "leveraged docker and docker-compose", - "user": "U05CAULTYG2", - "ts": "1698440945.117219", - "blocks": [ - { - "type": "rich_text", - "block_id": "jn0Hq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "leveraged docker and docker-compose" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "640a7548-04df-4d37-9e34-ad60b8e4dbea", - "type": "message", - "text": "hmm so you’re running docker-compose up on an Ec2 instance you’ve ssh’d into? (just trying to understand your setup better)", - "user": "U01DCMDFHBK", - "ts": "1698441190.356069", - "blocks": [ - { - "type": "rich_text", - "block_id": "XDyez", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hmm so you’re running docker-compose up on an Ec2 instance you’ve ssh’d into? (just trying to understand your setup better)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "1872d1ac-0d97-4627-be45-3f719c4bd8cb", - "type": "message", - "text": "yes, thats correct", - "user": "U05CAULTYG2", - "ts": "1698441206.923589", - "blocks": [ - { - "type": "rich_text", - "block_id": "uuZAU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, thats correct" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "8003944a-a989-4a14-9c78-150d02232b97", - "type": "message", - "text": "I’ve only used docker compose for local dev or integration tests. but, ok you’re probably in the PoC phase. Can you run the docker cmd on you local machine successfully? What OS is stalled on the Ec2 instance?", - "user": "U01DCMDFHBK", - "ts": "1698441399.016209", - "blocks": [ - { - "type": "rich_text", - "block_id": "yy+k/", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I’ve only used docker compose for local dev or integration tests. 
but, ok you’re probably in the PoC phase. Can you run the docker cmd on you local machine successfully? What OS is stalled on the Ec2 instance?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCMDFHBK", - "ts": "1698441405.000000" - }, - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "abeb855c-681e-4f12-b3dc-b8037873bffc", - "type": "message", - "text": "yes, i can run and the OS is Ubuntu 20.04.6 LTS", - "user": "U05CAULTYG2", - "ts": "1698441480.358829", - "blocks": [ - { - "type": "rich_text", - "block_id": "VijLp", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, i can run and the OS is Ubuntu 20.04.6 LTS" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "24604801-85ba-45b1-a4dd-89ddbe04bc57", - "type": "message", - "text": "we initiallly ran into a permission denied error related to postgressql.conf file and we had to update file permissions to 777 and after which we started to see below errors", - "user": "U05CAULTYG2", - "ts": "1698441567.903199", - "blocks": [ - { - "type": "rich_text", - "block_id": "ZzdHw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we initiallly ran into a permission denied error related to postgressql.conf file and we had to update file permissions to 777 and after which we started to see below errors" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "dcdd7b78-918e-49e6-9a9f-d36625ee3be7", - "type": "message", - "text": "marquez-db | 2023-10-27 20:35:52.512 GMT [35] FATAL: no pg_hba.conf entry for host \"172.18.0.5\", user \"marquez\", database \"marquez\", no encryption\n marquez-db | 2023-10-27 20:35:52.529 GMT [36] FATAL: no pg_hba.conf entry for host \"172.18.0.5\", user \"marquez\", database \"marquez\", no encryption", - "user": "U05CAULTYG2", - "ts": "1698441576.039939", - "blocks": [ - { - "type": "rich_text", - "block_id": "g0oZR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "marquez-db | 2023-10-27 20:35:52.512 GMT [35] FATAL: no pg_hba.conf entry for host \"172.18.0.5\", user \"marquez\", database \"marquez\", no encryption\n marquez-db | 2023-10-27 20:35:52.529 GMT [36] FATAL: no pg_hba.conf entry for host \"172.18.0.5\", user \"marquez\", database \"marquez\", no encryption" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "edfaac42-b2a9-4041-9d56-75e0714cbfbd", - "type": "message", - "text": "we then manually updated pg_hba.conf file to include host user and db details", - "user": "U05CAULTYG2", - "ts": "1698441612.909429", - "blocks": [ - { - "type": "rich_text", - "block_id": "x3bgz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we then manually updated pg_hba.conf file to include host user and db details" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "d29f12c2-4054-4b75-9fc8-cdfd29b95bbd", - "type": "message", - "text": "Did you also update the `marquez.yml` with the db user / password?", - "user": "U01DCMDFHBK", - "ts": "1698441642.635089", - "blocks": [ 
- { - "type": "rich_text", - "block_id": "KynpH", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Did you also update the " - }, - { - "type": "text", - "text": "marquez.yml", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " with the db user / password?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "e28dc315-eddd-4ef6-ad18-1b8a3ac9a007", - "type": "message", - "text": "after which we started to see the errors posted in the github open issues page", - "user": "U05CAULTYG2", - "ts": "1698441648.000319", - "blocks": [ - { - "type": "rich_text", - "block_id": "B/I8C", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "after which we started to see the errors posted in the github open issues page" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "6352952a-445b-449e-bf05-6e1ed6f45f59", - "type": "message", - "text": "hmm are you using an external database or are you spinning up the entire Marquez stack with docker compose?", - "user": "U01DCMDFHBK", - "ts": "1698441693.285399", - "blocks": [ - { - "type": "rich_text", - "block_id": "jgcBs", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hmm are you using an external database or are you spinning up the entire Marquez stack with docker compose?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "a951c03a-b61d-4544-b053-da995814c476", - "type": "message", - "text": "we are spinning up the entire Marquez stack with docker compose", - "user": "U05CAULTYG2", - "ts": "1698441716.256449", - "blocks": [ - { - "type": "rich_text", - "block_id": "uzxNS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we are spinning up the entire Marquez stack with docker compose" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "38ba244b-7896-416b-ae31-3d521c0c7295", - "type": "message", - "text": "we did not change anything in the marquez.yml, i think we did not find that file in the github repo that we cloned into our local instance", - "user": "U05CAULTYG2", - "ts": "1698441804.724249", - "blocks": [ - { - "type": "rich_text", - "block_id": "VCqMj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we did not change anything in the marquez.yml, i think we did not find that file in the github repo that we cloned into our local instance" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05CAULTYG2", - "ts": "1698441835.000000" - }, - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "e698c011-e929-447f-a352-9ec8429f6969", - "type": "message", - "text": "It’s important that the script runs, but I don’t think it is", - "user": "U01DCMDFHBK", - "ts": "1698441991.492989", - "blocks": [ - { - "type": "rich_text", - "block_id": "GZZrk", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It’s important that the " - }, - { - "type": "link", - "url": 
"https://github.com/MarquezProject/marquez/blob/main/docker/init-db.sh", - "text": "init-db.sh", - "unsafe": true - }, - { - "type": "text", - "text": " script runs, but I don’t think it is" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "65773a50-eaff-475e-9ca7-2dde2df541c9", - "type": "message", - "text": "can you grab all the docker compose logs and share them? it’s hard to debug otherwise", - "user": "U01DCMDFHBK", - "ts": "1698442016.355249", - "blocks": [ - { - "type": "rich_text", - "block_id": "DKq6n", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "can you grab all the docker compose logs and share them? it’s hard to debug otherwise" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCMDFHBK", - "ts": "1698442042.000000" - }, - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "type": "message", - "text": "", - "files": [ - { - "id": "F06375RT0LS", - "created": 1698442196, - "timestamp": 1698442196, - "name": "logs.txt", - "title": "logs.txt", - "mimetype": "text/plain", - "filetype": "text", - "pretty_type": "Plain Text", - "user": "U05CAULTYG2", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 24143, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F06375RT0LS/logs.txt", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F06375RT0LS/download/logs.txt", - "permalink": "https://openlineage.slack.com/files/U05CAULTYG2/F06375RT0LS/logs.txt", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F06375RT0LS-4134397fd8", - "edit_link": "https://openlineage.slack.com/files/U05CAULTYG2/F06375RT0LS/logs.txt/edit", - "preview": "root@ip-172-30-4-153:~/marquez# ./docker/up.sh --tag 0.37.0 -a 5000 -m 5001 -w 3000 --build\r\n...creating volumes: marquez_data, marquez_db-conf, marquez_db-init, marquez_db-backup\r\nSuccessfully copied 7.17kB to volumes-provisioner:/data/wait-for-it.sh\r\nAdded files to volume marquez_data: wait-for-it.sh\r\nSuccessfully copied 2.05kB to volumes-provisioner:/db-conf/postgresql.conf\r", - "preview_highlight": "
root@ip-172-30-4-153:~/marquez# ./docker/up.sh --tag 0.37.0 -a 5000 -m 5001 -w 3000 --build
...creating volumes: marquez_data, marquez_db-conf, marquez_db-init, marquez_db-backup
Successfully copied 7.17kB to volumes-provisioner:/data/wait-for-it.sh
Added files to volume marquez_data: wait-for-it.sh
Successfully copied 2.05kB to volumes-provisioner:/db-conf/postgresql.conf
\n", - "lines": 197, - "lines_more": 192, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05CAULTYG2", - "display_as_bot": false, - "ts": "1698442199.308129", - "client_msg_id": "42d74ed6-31c7-4f4b-b8cd-c61ac213f3d1", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "5c3d2cac-67b0-4867-a5f3-f700f2b81a7d", - "type": "message", - "text": "I would first suggest to remove the `--build` flag since you are specifying a version of Marquez to use via `--tag`", - "user": "U01DCMDFHBK", - "ts": "1698442395.550239", - "blocks": [ - { - "type": "rich_text", - "block_id": "+nm/W", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I would first suggest to remove the " - }, - { - "type": "text", - "text": "--build", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " flag since you are specifying a version of Marquez to use via " - }, - { - "type": "text", - "text": "--tag", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCMDFHBK", - "ts": "1698442406.000000" - }, - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "def512a3-f14a-4498-a6dd-d8f50ad2452c", - "type": "message", - "text": "no the issue per se, but will help clear up some of the logs", - "user": "U01DCMDFHBK", - "ts": "1698442429.280759", - "blocks": [ - { - "type": "rich_text", - "block_id": "JNwY0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "no the issue per se, but will help clear up some of the logs" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "73a94f34-a8b0-4189-966c-6f99563ededd", - "type": "message", - "text": "for sure thanks. we could get the logs without the --build portion, we tried with that option just once", - "user": "U05CAULTYG2", - "ts": "1698442506.399629", - "blocks": [ - { - "type": "rich_text", - "block_id": "0uCEX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "for sure thanks. we could get the logs without the --build portion, we tried with that option just once" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "3aec3540-c8ab-440e-8b06-2a5ab2361c8f", - "type": "message", - "text": "the errors were the same with/without --build option", - "user": "U05CAULTYG2", - "ts": "1698442540.055639", - "blocks": [ - { - "type": "rich_text", - "block_id": "qAAA2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "the errors were the same with/without --build option" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "dd6c5ac8-a49a-48fb-8416-1ed633ef38a3", - "type": "message", - "text": "marquez-api | ERROR [2023-10-27 21:34:58,019] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.\n marquez-api | ! org.postgresql.util.PSQLException: FATAL: password authentication failed for user \"marquez\"\n marquez-api | ! 
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)\n marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)\n marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)\n marquez-api | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)\n marquez-api | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:253)\n marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:434)\n marquez-api | ! at org.postgresql.Driver.connect(Driver.java:291)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)\n marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)\n marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)\n marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)\n marquez-api | ! at (Flyway.java:190)\n marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:73)\n marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:27)\n marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:105)\n marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:48)\n marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)\n marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)\n marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)\n marquez-api | ! at io.dropwizard.Application.run(Application.java:94)\n marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:60)\n marquez-api | INFO [2023-10-27 21:34:58,024] marquez.MarquezApp: Stopping app...", - "user": "U05CAULTYG2", - "ts": "1698442562.316779", - "blocks": [ - { - "type": "rich_text", - "block_id": "dzUFO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "marquez-api | ERROR [2023-10-27 21:34:58,019] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.\n marquez-api | ! org.postgresql.util.PSQLException: FATAL: password authentication failed for user \"marquez\"\n marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)\n marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)\n marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)\n marquez-api | ! 
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)\n marquez-api | ! at org.postgresql.jdbc.PgConnection.(PgConnection.java:253)\n marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:434)\n marquez-api | ! at org.postgresql.Driver.connect(Driver.java:291)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.(ConnectionPool.java:153)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)\n marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)\n marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)\n marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.(JdbcConnectionFactory.java:75)\n marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)\n marquez-api | ! at " - }, - { - "type": "link", - "url": "http://org.flywaydb.core.Flyway.info", - "text": "org.flywaydb.core.Flyway.info" - }, - { - "type": "text", - "text": "(Flyway.java:190)\n marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:73)\n marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:27)\n marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:105)\n marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:48)\n marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)\n marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)\n marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)\n marquez-api | ! at io.dropwizard.Application.run(Application.java:94)\n marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:60)\n marquez-api | INFO [2023-10-27 21:34:58,024] marquez.MarquezApp: Stopping app..." 
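The trace above bottoms out in "password authentication failed for user marquez", and the replies just below point at docker/init-db.sh as the script that creates that role. A quick way to tell "role was never created" apart from "role exists but the password is wrong" is to query pg_roles directly. A minimal sketch, assuming psycopg2 is installed, the marquez-db container publishes Postgres on localhost:5432, and the superuser credentials are postgres/postgres; all of these are assumptions about a typical compose setup, not details from the thread:

```python
import psycopg2  # third-party driver: pip install psycopg2-binary

# Hypothetical superuser credentials; take the real ones from your compose file.
conn = psycopg2.connect(
    host="localhost", port=5432, dbname="postgres",
    user="postgres", password="postgres",
)
with conn, conn.cursor() as cur:
    # init-db.sh is what should have created this role on first startup.
    cur.execute("SELECT 1 FROM pg_roles WHERE rolname = %s", ("marquez",))
    if cur.fetchone() is None:
        print("role 'marquez' is missing: init-db.sh most likely never ran")
    else:
        print("role 'marquez' exists: check the password wired into marquez.yml")
conn.close()
```

If the role is missing, the volume mount that delivers init-db.sh into the Postgres container is the thing to inspect, which matches the diagnosis below.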
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "ce32c470-e688-4743-8a21-9a56adfc16fa", - "type": "message", - "text": "debugging docker issues like this is so difficult", - "user": "U01DCMDFHBK", - "ts": "1698442732.837229", - "blocks": [ - { - "type": "rich_text", - "block_id": "qtDaI", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "debugging docker issues like this is so difficult" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "19293328-ba28-471f-80cf-e9f3fb13c8a2", - "type": "message", - "text": "it could be a number of things, but you are connected to the database it’s just that the `marquez` user hasn’t been created", - "user": "U01DCMDFHBK", - "ts": "1698442844.009939", - "blocks": [ - { - "type": "rich_text", - "block_id": "+gnXP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "it could be a number of things, but you are connected to the database it’s just that the " - }, - { - "type": "text", - "text": "marquez", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " user hasn’t been created" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCMDFHBK", - "ts": "1698442854.000000" - }, - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "84bf8b37-8370-4feb-981d-2ea007a8b564", - "type": "message", - "text": "the is what manages user creation", - "user": "U01DCMDFHBK", - "ts": "1698442919.199779", - "blocks": [ - { - "type": "rich_text", - "block_id": "+hC5M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "the " - }, - { - "type": "link", - "url": "https://github.com/MarquezProject/marquez/blob/main/docker/init-db.sh", - "text": "/init-db.sh", - "unsafe": true - }, - { - "type": "text", - "text": " is what manages user creation" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCMDFHBK", - "ts": "1698443706.000000" - }, - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "0e1f65c6-8467-45ff-ae4d-8de0c394366c", - "type": "message", - "text": "so it’s possible that the script isn’t running for whatever reason on your Ec2 instance", - "user": "U01DCMDFHBK", - "ts": "1698442937.126109", - "blocks": [ - { - "type": "rich_text", - "block_id": "Cu5m8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "so it’s possible that the script isn’t running for whatever reason on your Ec2 instance" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCMDFHBK", - "ts": "1698442952.000000" - }, - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "64cd0bf6-4183-49fe-8660-5125e06b2837", - "type": "message", - "text": "do you have other services running on that Ec2 instance? Like, other than Marquez", - "user": "U01DCMDFHBK", - "ts": "1698443060.353359", - "blocks": [ - { - "type": "rich_text", - "block_id": "vTxrE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "do you have other services running on that Ec2 instance? 
Like, other than Marquez" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "e13ce2ef-93c0-49f4-b98f-88bf718d1bd5", - "type": "message", - "text": "is there a postgres process running outside of docker?", - "user": "U01DCMDFHBK", - "ts": "1698443092.180609", - "blocks": [ - { - "type": "rich_text", - "block_id": "r88uD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "is there a postgres process running outside of docker?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "ae9a1d48-ddce-4ff3-9d47-f51cc6f96299", - "type": "message", - "text": "no other services except marquez on this EC2 instance", - "user": "U05CAULTYG2", - "ts": "1698453290.531319", - "blocks": [ - { - "type": "rich_text", - "block_id": "ynK45", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "no other services except marquez on this EC2 instance" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "99164b6f-b536-49cb-94ff-2812cdd776a5", - "type": "message", - "text": "this was a new Ec2 instance that was spun up to install and use marquez", - "user": "U05CAULTYG2", - "ts": "1698453349.420079", - "blocks": [ - { - "type": "rich_text", - "block_id": "GxrIE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this was a new Ec2 instance that was spun up to install and use marquez" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - }, - { - "client_msg_id": "7643a6c4-a112-4a68-8b8c-84e2e82e757b", - "type": "message", - "text": "n we can confirm that no postgres process runs outside of docker", - "user": "U05CAULTYG2", - "ts": "1698453369.842939", - "blocks": [ - { - "type": "rich_text", - "block_id": "+K3Ga", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "n we can confirm that no postgres process runs outside of docker" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698440472.145489", - "parent_user_id": "U05CAULTYG2" - } - ] - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "referencing to conversation - what it takes to move to openlineage provider package from openlineage-airflow. Im updating Airflow to 2.7.2 but moving off of openlineage-airflow to provider package Im trying to estimate the amount of work it takes, any thoughts? 
reading change_logs I dont think its too much of a change but please share your thoughts and if somewhere its drafted please do share that as well", - "user": "U062WLFMRTP", - "ts": "1698429543.349989", - "thread_ts": "1698340358.557159", - "root": { - "type": "message", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated", - "files": [ - { - "id": "F062ZFJN2UB", - "created": 1698340299, - "timestamp": 1698340299, - "name": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "title": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U062WLFMRTP", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 356434, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/download/screenshot_2023-10-26_at_10.11.34_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 222, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 297, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 445, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 494, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 593, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 633, - "original_w": 1756, - "original_h": 1085, - "thumb_tiny": "AwAdADC8etKKQ9acKADFLRRQAUUUUAMPWnCm96UUAOopM+1GaAFopM0ZoA//2Q==", - "permalink": "https://openlineage.slack.com/files/U062WLFMRTP/F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F062ZFJN2UB-c8de1a91b2", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U062WLFMRTP", - "display_as_bot": false, - "ts": "1698340358.557159", - "blocks": [ - { - "type": "rich_text", - "block_id": "CRXyh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated" - } - ] - } - ] - } 
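The migration asked about above is mostly a packaging and configuration change: openlineage-airflow is replaced by apache-airflow-providers-openlineage, and the OPENLINEAGE_URL / OPENLINEAGE_NAMESPACE environment variables move into Airflow's own [openlineage] config section. A rough sketch of the new-style settings, assuming Airflow 2.7.x with the provider installed; the option names are taken from the provider's documentation, so verify them against the provider version you deploy:

```python
import os

# The provider reads the [openlineage] section of airflow.cfg, which is settable
# through AIRFLOW__OPENLINEAGE__* environment variables. Values are placeholders.
os.environ["AIRFLOW__OPENLINEAGE__NAMESPACE"] = "my_namespace"
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = (
    '{"type": "http", "url": "http://marquez:5000", "endpoint": "api/v1/lineage"}'
)
# The provider registers its own listener, so the old openlineage-airflow plugin
# entry and the openlineage-airflow pin can be dropped from requirements.
```

Custom extractors, if any, also need to be re-registered under the provider's equivalent option rather than OPENLINEAGE_EXTRACTORS.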
- ], - "client_msg_id": "925bbcea-663c-4480-8809-2e2b3dd06020", - "thread_ts": "1698340358.557159", - "reply_count": 39, - "reply_users_count": 4, - "latest_reply": "1698786461.272129", - "reply_users": [ - "U062WLFMRTP", - "U02S6F54MAB", - "U01RA9B5GG2", - "U04AZ7992SU" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "bMuXM", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "referencing to " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1698398754823079?thread_ts=1698340358.557159&cid=C01CK9T7HKR", - "text": "this" - }, - { - "type": "text", - "text": " conversation - what it takes to move to openlineage provider package from openlineage-airflow. Im updating Airflow to 2.7.2 but moving off of openlineage-airflow to provider package Im trying to estimate the amount of work it takes, any thoughts? reading change_logs I dont think its too much of a change but please share your thoughts and if somewhere its drafted please do share that as well" - } - ] - } - ] - } - ], - "client_msg_id": "0135497a-c1fc-4c07-b449-3c9178cfa2a8" - }, - { - "client_msg_id": "28438eb8-f58b-4cb9-8de5-547e06c2bf90", - "type": "message", - "text": "Hi People, actually I want to intercept the OpenLineage spark events right after the job ends and before they are emitted, so that I can add some extra information to the events or remove some information that I don't want.\nIs there any way of doing this? Can someone please help me", - "user": "U06315TMT61", - "ts": "1698408752.647169", - "blocks": [ - { - "type": "rich_text", - "block_id": "tvUP2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi People, actually I want to intercept the OpenLineage spark events right after the job ends and before they are emitted, so that I can add some extra information to the events or remove some information that I don't want.\nIs there any way of doing this? Can someone please help me" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698408752.647169", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1698671037.466989", - "reply_users": [ - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1698671037.466989", - "replies": [ - { - "client_msg_id": "b4580e02-5543-4906-9c7a-419573392b68", - "type": "message", - "text": "It general, I think this kind of use case is probably best served by , but what do you think <@U02MK6YNAQ5>?", - "user": "U02LXF3HUN7", - "ts": "1698671037.466989", - "blocks": [ - { - "type": "rich_text", - "block_id": "FjMSM", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It general, I think this kind of use case is probably best served by " - }, - { - "type": "link", - "url": "https://openlineage.io/docs/guides/facets", - "text": "facets" - }, - { - "type": "text", - "text": ", but what do you think " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": "?" 
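To make the facet suggestion above concrete: integrations let you attach custom facets that carry extra information on runs, jobs, and datasets, which covers the "add some extra information" half of the question; for the "remove some information" half, the Spark integration reportedly supports disabling unwanted facets via spark.openlineage.facets.disabled (verify against your integration version). A minimal sketch of a custom run facet emitted with the Python client; it uses the client's pre-1.0-style attrs API, and the facet name and fields are hypothetical:

```python
from datetime import datetime, timezone
from uuid import uuid4

import attr
from openlineage.client import OpenLineageClient
from openlineage.client.facet import BaseFacet
from openlineage.client.run import Job, Run, RunEvent, RunState


@attr.s
class TeamInfoRunFacet(BaseFacet):  # hypothetical custom facet
    team: str = attr.ib()
    cost_center: str = attr.ib()


client = OpenLineageClient(url="http://marquez:5000")  # placeholder backend URL
client.emit(
    RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid4()),
                facets={"teamInfo": TeamInfoRunFacet("data-eng", "cc-42")}),
        job=Job(namespace="my-namespace", name="my-job"),
        producer="https://example.com/my-producer",
    )
)
```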
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/docs/guides/facets", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/docs/guides/facets", - "fallback": "Understanding and Using Facets | OpenLineage", - "text": "Adapted from the OpenLineage spec.", - "title": "Understanding and Using Facets | OpenLineage", - "title_link": "https://openlineage.io/docs/guides/facets", - "service_name": "openlineage.io" - } - ], - "thread_ts": "1698408752.647169", - "parent_user_id": "U06315TMT61" - } - ] - }, - { - "client_msg_id": "0683d053-f5f3-4f47-9eda-126d40f8055b", - "type": "message", - "text": "*Spark Integration Logs*\nHey There\nAre these events skipped because it's not supported or it's configured somewhere?\n`23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart`\n`23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd`", - "user": "U05TU0U224A", - "ts": "1698400165.662489", - "blocks": [ - { - "type": "rich_text", - "block_id": "P+NCz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark Integration Logs", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "\nHey There\nAre these events skipped because it's not supported or it's configured somewhere?\n" - }, - { - "type": "text", - "text": "23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05TU0U224A", - "ts": "1698400172.000000" - } - }, - { - "type": "message", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated", - "files": [ - { - "id": "F062ZFJN2UB", - "created": 1698340299, - "timestamp": 1698340299, - "name": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "title": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U062WLFMRTP", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 356434, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/download/screenshot_2023-10-26_at_10.11.34_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_360.png", - 
"thumb_360_w": 360, - "thumb_360_h": 222, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 297, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 445, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 494, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 593, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 633, - "original_w": 1756, - "original_h": 1085, - "thumb_tiny": "AwAdADC8etKKQ9acKADFLRRQAUUUUAMPWnCm96UUAOopM+1GaAFopM0ZoA//2Q==", - "permalink": "https://openlineage.slack.com/files/U062WLFMRTP/F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F062ZFJN2UB-c8de1a91b2", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U062WLFMRTP", - "display_as_bot": false, - "ts": "1698340358.557159", - "blocks": [ - { - "type": "rich_text", - "block_id": "CRXyh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated" - } - ] - } - ] - } - ], - "client_msg_id": "925bbcea-663c-4480-8809-2e2b3dd06020", - "thread_ts": "1698340358.557159", - "reply_count": 39, - "reply_users_count": 4, - "latest_reply": "1698786461.272129", - "reply_users": [ - "U062WLFMRTP", - "U02S6F54MAB", - "U01RA9B5GG2", - "U04AZ7992SU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "36c27c05-57eb-4ff9-8a1b-a4e0b4e7c5ee", - "type": "message", - "text": "<@U02S6F54MAB> any thoughts?", - "user": "U062WLFMRTP", - "ts": "1698340442.113079", - "blocks": [ - { - "type": "rich_text", - "block_id": "Zq5Bz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " any thoughts?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "d35a745b-327a-495b-a182-948c7012399a", - "type": "message", - "text": "what version of Airflow are you using?", - "user": "U02S6F54MAB", - "ts": "1698340464.257599", - "blocks": [ - { - "type": "rich_text", - "block_id": "Dheg7", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "what version of Airflow are you using?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "0fb8b18d-009d-45c6-81e1-8bea9a4e1436", - "type": "message", - "text": "2.6.3 that satisfies the requirement", - "user": "U062WLFMRTP", - "ts": "1698340492.528189", - "blocks": [ - { - "type": "rich_text", - "block_id": "vUCUN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "2.6.3 that satisfies the requirement" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "14706124-4868-4b05-9015-ec735e92f234", - "type": "message", - "text": "is it possible you have some custom operator?", - "user": "U02S6F54MAB", - "ts": "1698340598.078979", - "blocks": [ - { - "type": "rich_text", - "block_id": "vK+L0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "is it possible you have some custom operator?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "396e6c98-d994-4805-aeba-82ceb7aeb7eb", - "type": "message", - "text": "i think its the base operator causing the issue", - "user": "U062WLFMRTP", - "ts": "1698340635.640199", - "blocks": [ - { - "type": "rich_text", - "block_id": "mcI4u", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "i think its the base operator causing the issue" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "ab03eaa3-7620-4f81-b8fb-950db728e114", - "type": "message", - "text": "so no i believe", - "user": "U062WLFMRTP", - "ts": "1698340656.239529", - "blocks": [ - { - "type": "rich_text", - "block_id": "lqOsV", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "so no i believe" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "786dbec2-66c6-4761-9d00-23401ba14a0a", - "type": "message", - "text": "BaseOperator is parent class for any other operators, it defines how to do deepcopy", - "user": "U02S6F54MAB", - "ts": "1698340723.317029", - "blocks": [ - { - "type": "rich_text", - "block_id": "jre16", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "BaseOperator is parent class for any other operators, it defines how to do deepcopy" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "09433b61-e80a-40b0-8c50-30a67307ed11", - "type": "message", - "text": "yeah so its controlled by Airflow itself, I didnt customize it", - "user": "U062WLFMRTP", - "ts": "1698340751.496409", - "blocks": [ - { - "type": "rich_text", - "block_id": "Gi0r4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah so its controlled by Airflow itself, I didnt customize it" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "640ee14f-9771-46d2-a64a-926341256843", - "type": "message", - "text": "uhm, maybe it's possible you could share dag code? 
you may hide sensitive data", - "user": "U02S6F54MAB", - "ts": "1698340789.650199", - "blocks": [ - { - "type": "rich_text", - "block_id": "q/Vb4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "uhm, maybe it's possible you could share dag code? you may hide sensitive data" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "207201cb-bf3f-415c-851c-8ba731be623d", - "type": "message", - "text": "let me try with lower versions of openlineage, what's say", - "user": "U062WLFMRTP", - "ts": "1698340883.515319", - "blocks": [ - { - "type": "rich_text", - "block_id": "uhz4a", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "let me try with lower versions of openlineage, what's say" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "79d09ca2-28ae-49cc-a2a2-6d8d0ff4665e", - "type": "message", - "text": "its a big jump from 0.24.0 to 1.4.1", - "user": "U062WLFMRTP", - "ts": "1698340899.088059", - "blocks": [ - { - "type": "rich_text", - "block_id": "bOtmE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "its a big jump from 0.24.0 to 1.4.1" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "01860990-c896-44c9-b9d0-525f56827276", - "type": "message", - "text": "but i will help here to investigate this issue", - "user": "U062WLFMRTP", - "ts": "1698340945.409289", - "blocks": [ - { - "type": "rich_text", - "block_id": "s22lz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "but i will help here to investigate this issue" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "9ac38164-dc76-4a5c-9cb3-69dd9eb5fd67", - "type": "message", - "text": "for me it seems that within dag or task you're defining some object that is not easy to copy", - "user": "U02S6F54MAB", - "ts": "1698341043.978079", - "blocks": [ - { - "type": "rich_text", - "block_id": "eWHBd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "for me it seems that within dag or task you're defining some object that is not easy to copy" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "1cc0e144-e224-4edc-adcb-25418ccadaa6", - "type": "message", - "text": "possible, but with 0.24.0 that issue is not occurring, so worry is that the version upgrade could potentially break things", - "user": "U062WLFMRTP", - "ts": "1698341165.007029", - "blocks": [ - { - "type": "rich_text", - "block_id": "vpwe5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "possible, but with 0.24.0 that issue is not occurring, so worry is that the version upgrade could potentially break things" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "a805ad1c-218e-4d80-8265-5a75094fc00d", - "type": "message", - "text": "0.24.0 is not that old :thinking_face:", - "user": "U02S6F54MAB", - "ts": 
"1698341974.726729", - "blocks": [ - { - "type": "rich_text", - "block_id": "PhAvj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "0.24.0 is not that old " - }, - { - "type": "emoji", - "name": "thinking_face", - "unicode": "1f914" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "add426a1-3614-4a27-8fe3-bbc3939d0a6f", - "type": "message", - "text": "i see the issue with 0.24.0 I see it as warning\n```[airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/threading.py\", line 932, in _bootstrap_inner\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - self.run()\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/threading.py\", line 870, in run\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - self._target(*self._args, **self._kwargs)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/openlineage/airflow/listener.py\", line 89, in on_running\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - task_instance_copy = copy.deepcopy(task_instance)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 172, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 270, in _reconstruct\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 172, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 270, in _reconstruct\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] 
[airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 153, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/dag.py\", line 2162, in __deepcopy__\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 153, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1224, in __deepcopy__\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 172, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 270, in _reconstruct\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = 
copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 153, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1224, in __deepcopy__\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 161, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - rv = reductor(4)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - TypeError: cannot pickle 'module' object```\nbut with 1.4.1 its stopped processing any further and threw error", - "user": "U062WLFMRTP", - "ts": "1698342307.308619", - "blocks": [ - { - "type": "rich_text", - "block_id": "AT+nJ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "i see the issue with 0.24.0 I see it as warning\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "[airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/threading.py\", line 932, in _bootstrap_inner\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - self.run()\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/threading.py\", line 870, in run\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - self._target(*self._args, **self._kwargs)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/openlineage/airflow/listener.py\", line 89, in on_running\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - task_instance_copy = copy.deepcopy(task_instance)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 172, in deepcopy\n[2023-10-26, 17:40:50 UTC] 
[airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 270, in _reconstruct\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 172, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 270, in _reconstruct\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 153, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/dag.py\", line 2162, in __deepcopy__\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 153, in deepcopy\n[2023-10-26, 17:40:50 UTC] 
[airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1224, in __deepcopy__\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 172, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = _reconstruct(x, memo, *rv)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 270, in _reconstruct\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - state = deepcopy(state, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 153, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1224, in __deepcopy__\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - setattr(result, k, copy.deepcopy(v, memo))\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 146, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y = copier(x, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 230, in _deepcopy_dict\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - y[deepcopy(key, memo)] = deepcopy(value, memo)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - File \"/usr/lib64/python3.8/copy.py\", line 
161, in deepcopy\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - rv = reductor(4)\n[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - TypeError: cannot pickle 'module' object" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "but with 1.4.1 its stopped processing any further and threw error" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "type": "message", - "text": "I see the difference of calling in these 2 versions, current versions checks if Airflow is >2.6 then directly runs on_running but earlier version was running on separate thread. IS this what's raising this exception?", - "files": [ - { - "id": "F06300USGUS", - "created": 1698344212, - "timestamp": 1698344212, - "name": "Screenshot 2023-10-26 at 11.16.47 AM.png", - "title": "Screenshot 2023-10-26 at 11.16.47 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U062WLFMRTP", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 67732, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F06300USGUS/screenshot_2023-10-26_at_11.16.47_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F06300USGUS/download/screenshot_2023-10-26_at_11.16.47_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 73, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 97, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 145, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 162, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 194, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06300USGUS-7f787b1bdb/screenshot_2023-10-26_at_11.16.47_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 207, - "original_w": 1723, - "original_h": 348, - "thumb_tiny": "AwAJADDROKXmiloATmloooAQ80c0UtAH/9k=", - "permalink": "https://openlineage.slack.com/files/U062WLFMRTP/F06300USGUS/screenshot_2023-10-26_at_11.16.47_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F06300USGUS-3d699164d7", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U062WLFMRTP", - 
"display_as_bot": false, - "ts": "1698344288.616259", - "blocks": [ - { - "type": "rich_text", - "block_id": "DiTSY", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I see the difference of calling in these 2 versions, current versions checks if Airflow is >2.6 then directly runs on_running but earlier version was running on separate thread. IS this what's raising this exception?" - } - ] - } - ] - } - ], - "client_msg_id": "0feee1c7-ae91-4997-8a3e-31192460d416", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "f3ed8118-02ef-4c05-ab0f-1fb9839c7f48", - "type": "message", - "text": "this is the issue - while copying the task", - "user": "U062WLFMRTP", - "ts": "1698344689.617209", - "blocks": [ - { - "type": "rich_text", - "block_id": "54jfB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this is the issue - " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/c343835c1664eda94d5c315897ae6702854c81bd/integration/airflow/openlineage/airflow/listener.py#L89" - }, - { - "type": "text", - "text": " while copying the task" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/c343835c1664eda94d5c315897ae6702854c81bd/integration/airflow/openlineage/airflow/listener.py#L89", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n task_instance_copy = copy.deepcopy(task_instance)\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "6359f22f-dc9a-4134-8ea4-6326b48ad13f", - "type": "message", - "text": "since we are directly running if version>2.6.0 therefore its throwing error in main processing", - "user": "U062WLFMRTP", - "ts": "1698344721.996409", - "blocks": [ - { - "type": "rich_text", - "block_id": "AMfCW", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "since we are directly running if version>2.6.0 therefore its throwing error in main processing" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "f32e38cd-277c-4426-a62e-44947b7fced4", - "type": "message", - "text": "may i know which Airflow version we tested this process?", - "user": "U062WLFMRTP", - "ts": "1698344882.925899", - "blocks": [ - { - "type": "rich_text", - "block_id": "9Vhnq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "may i know which Airflow version we tested this process?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "c29bf7d1-f14f-4a11-be4f-4c34d72b41fc", - "type": "message", - "text": "im on 2.6.3", - "user": "U062WLFMRTP", - "ts": "1698344919.858279", - "blocks": [ - { - "type": "rich_text", - "block_id": "Nq6Px", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "im on 2.6.3" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "11c94018-1a6d-4559-8cad-d2c306a867c7", - "type": "message", - "text": "2.1.4, 2.2.4, 2.3.4, 2.4.3, 2.5.2, 2.6.1\nusually there are not too many changes between minor versions\n\nI still believe it might be some code you might improve and probably is also an antipattern in airflow", - "user": "U02S6F54MAB", - "ts": "1698345053.182559", - "blocks": [ - { - "type": "rich_text", - "block_id": "tzdJj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "2.1.4, 2.2.4, 2.3.4, 2.4.3, 2.5.2, 2.6.1\nusually there are not too many changes between minor versions\n\nI still believe it might be some code you might improve and probably is also an antipattern in airflow" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02S6F54MAB", - "ts": "1698345063.000000" - }, - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "2c209341-29cc-46c8-a90c-74421a74355e", - "type": "message", - "text": "hummm...that's a valid observation but I dont write DAGS, other teams do, so imagine if many people wrote such DAGS I can't ask everyone to change their patterns right? If something is running on current openlineage version with warning that should still be running on upgraded version isn't it?", - "user": "U062WLFMRTP", - "ts": "1698345266.075579", - "blocks": [ - { - "type": "rich_text", - "block_id": "+DHu0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hummm...that's a valid observation but I dont write DAGS, other teams do, so imagine if many people wrote such DAGS I can't ask everyone to change their patterns right? If something is running on current openlineage version with warning that should still be running on upgraded version isn't it?" 
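[Annotation] The behavioral difference described earlier in this thread (Airflow >= 2.6 runs `on_running` inline, older versions ran it on a separate thread) can be sketched as the following version gate. This is illustrative, not the actual listener code: an exception raised inline surfaces on the main log path as an error, while one raised in a fire-and-forget thread only ever appeared as a logged warning.
```python
from threading import Thread

from packaging.version import Version

AIRFLOW_VERSION = Version("2.6.3")  # illustrative


def on_running(task_instance):
    ...  # build and emit the lineage event (see earlier in the thread)


def dispatch_on_running(task_instance):
    if AIRFLOW_VERSION >= Version("2.6.0"):
        # Inline on the main code path: failures are visible in the task log.
        on_running(task_instance)
    else:
        # Pre-2.6 behavior described above: a daemon thread, where a failure
        # dies with the thread and shows up only as a warning traceback.
        Thread(target=on_running, args=(task_instance,), daemon=True).start()
```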
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U062WLFMRTP", - "ts": "1698345284.000000" - }, - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "68f6ad78-00fd-4d6a-a3e6-6ee5d8d82504", - "type": "message", - "text": "however I see ur point", - "user": "U062WLFMRTP", - "ts": "1698345484.622449", - "blocks": [ - { - "type": "rich_text", - "block_id": "MQy78", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "however I see ur point" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "50fe8d3a-8dac-4ca4-a97d-0044a76cef3e", - "type": "message", - "text": "So that specific task has 570 line of query and pretty bulky query, let me split into smaller units", - "user": "U062WLFMRTP", - "ts": "1698346192.384519", - "blocks": [ - { - "type": "rich_text", - "block_id": "z9PWm", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "So that specific task has 570 line of query and pretty bulky query, let me split into smaller units" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "db15173c-b598-4942-a06a-4eeab1e2de60", - "type": "message", - "text": "that should help right? <@U02S6F54MAB>", - "user": "U062WLFMRTP", - "ts": "1698346215.462899", - "blocks": [ - { - "type": "rich_text", - "block_id": "iXCWL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "that should help right? " - }, - { - "type": "user", - "user_id": "U02S6F54MAB" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "ea4541ab-3a19-40d6-8574-6770444fd9dd", - "type": "message", - "text": "query length shouldn’t be the issue, rather any python code", - "user": "U02S6F54MAB", - "ts": "1698346287.209199", - "blocks": [ - { - "type": "rich_text", - "block_id": "09bOT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "query length shouldn’t be the issue, rather any python code" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "fb298080-70a7-4d96-a3d6-7550004ee9e9", - "type": "message", - "text": "I get your point too, we might figure out some mechanism to skip irrelevant parts of task instance so that it doesn’t fail then", - "user": "U02S6F54MAB", - "ts": "1698346310.544309", - "blocks": [ - { - "type": "rich_text", - "block_id": "alsM7", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I get your point too, we might figure out some mechanism to skip irrelevant parts of task instance so that it doesn’t fail then" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "11174c4d-7251-4007-9757-bccd95258aa1", - "type": "message", - "text": "actually its failing on that task itself", - "user": "U062WLFMRTP", - "ts": "1698346332.553519", - "blocks": [ - { - "type": "rich_text", - "block_id": "RkGPE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "actually its failing on that task itself" - } - ] - } - 
] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "1a2bbd0d-3879-4bac-88d7-b558dbd81482", - "type": "message", - "text": "let me try it will be pretty quick", - "user": "U062WLFMRTP", - "ts": "1698346353.120079", - "blocks": [ - { - "type": "rich_text", - "block_id": "BzN/H", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "let me try it will be pretty quick" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "083fd308-34be-4b79-9b01-c0a615c191d8", - "type": "message", - "text": "<@U02S6F54MAB> but ur right we have to fix this at Openlineage side as well. Because ideally Openlineage shouldn't be causing any issue to the main DAG processing", - "user": "U062WLFMRTP", - "ts": "1698346738.780429", - "blocks": [ - { - "type": "rich_text", - "block_id": "1G5HT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " but ur right we have to fix this at Openlineage side as well. Because ideally Openlineage shouldn't be causing any issue to the main DAG processing" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "697ec9cc-e9ba-49d8-9e50-7176369827e4", - "type": "message", - "text": "it doesn’t break any airflow functionality, execution is wrapped into try/except block, only exception traceback is logged as you can see", - "user": "U02S6F54MAB", - "ts": "1698357065.052169", - "blocks": [ - { - "type": "rich_text", - "block_id": "XMsF6", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "it doesn’t break any airflow functionality, execution is wrapped into try/except block, only exception traceback is logged as you can see" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "618b0415-da4a-41f9-b614-eba5419054b8", - "type": "message", - "text": "Can you migrate to Airflow 2.7 and use `apache-airflow-providers-openlineage`? Ideally we wouldn't make meaningful changes to `openlineage-airflow`", - "user": "U01RA9B5GG2", - "ts": "1698398754.823079", - "blocks": [ - { - "type": "rich_text", - "block_id": "7U7pX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can you migrate to Airflow 2.7 and use " - }, - { - "type": "text", - "text": "apache-airflow-providers-openlineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "? 
Ideally we wouldn't make meaningful changes to " - }, - { - "type": "text", - "text": "openlineage-airflow", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "c9f5545c-8442-43f7-a73b-0fe2dbdebd35", - "type": "message", - "text": "yup thats what im planning to do", - "user": "U062WLFMRTP", - "ts": "1698420944.617489", - "blocks": [ - { - "type": "rich_text", - "block_id": "tqNAF", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yup thats what im planning to do" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "referencing to conversation - what it takes to move to openlineage provider package from openlineage-airflow. Im updating Airflow to 2.7.2 but moving off of openlineage-airflow to provider package Im trying to estimate the amount of work it takes, any thoughts? reading change_logs I dont think its too much of a change but please share your thoughts and if somewhere its drafted please do share that as well", - "user": "U062WLFMRTP", - "ts": "1698429543.349989", - "thread_ts": "1698340358.557159", - "root": { - "type": "message", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated", - "files": [ - { - "id": "F062ZFJN2UB", - "created": 1698340299, - "timestamp": 1698340299, - "name": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "title": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U062WLFMRTP", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 356434, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/download/screenshot_2023-10-26_at_10.11.34_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 222, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 297, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 445, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 494, - "thumb_960": 
"https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 593, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 633, - "original_w": 1756, - "original_h": 1085, - "thumb_tiny": "AwAdADC8etKKQ9acKADFLRRQAUUUUAMPWnCm96UUAOopM+1GaAFopM0ZoA//2Q==", - "permalink": "https://openlineage.slack.com/files/U062WLFMRTP/F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F062ZFJN2UB-c8de1a91b2", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U062WLFMRTP", - "display_as_bot": false, - "ts": "1698340358.557159", - "blocks": [ - { - "type": "rich_text", - "block_id": "CRXyh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated" - } - ] - } - ] - } - ], - "client_msg_id": "925bbcea-663c-4480-8809-2e2b3dd06020", - "thread_ts": "1698340358.557159", - "reply_count": 39, - "reply_users_count": 4, - "latest_reply": "1698786461.272129", - "reply_users": [ - "U062WLFMRTP", - "U02S6F54MAB", - "U01RA9B5GG2", - "U04AZ7992SU" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "bMuXM", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "referencing to " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1698398754823079?thread_ts=1698340358.557159&cid=C01CK9T7HKR", - "text": "this" - }, - { - "type": "text", - "text": " conversation - what it takes to move to openlineage provider package from openlineage-airflow. Im updating Airflow to 2.7.2 but moving off of openlineage-airflow to provider package Im trying to estimate the amount of work it takes, any thoughts? reading change_logs I dont think its too much of a change but please share your thoughts and if somewhere its drafted please do share that as well" - } - ] - } - ] - } - ], - "client_msg_id": "0135497a-c1fc-4c07-b449-3c9178cfa2a8" - }, - { - "client_msg_id": "33da5b3d-c122-4b04-b221-702067236066", - "type": "message", - "text": "Generally not much - I would maybe think of a operator coverage. For example, for BigQuery old `openlineage-airflow` supports `BigQueryExecuteQueryOperator`. However, new `apache-airflow-providers-openlineage` supports `BigQueryInsertJobOperator` - because it's intended replacement for `BigQueryExecuteQueryOperator` and Airflow community does not want to accept contributions to deprecated operators.", - "user": "U01RA9B5GG2", - "ts": "1698668470.071199", - "blocks": [ - { - "type": "rich_text", - "block_id": "0fZgM", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Generally not much - I would maybe think of a operator coverage. For example, for BigQuery old " - }, - { - "type": "text", - "text": "openlineage-airflow", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " supports " - }, - { - "type": "text", - "text": "BigQueryExecuteQueryOperator", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ". 
However, new " - }, - { - "type": "text", - "text": "apache-airflow-providers-openlineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " supports " - }, - { - "type": "text", - "text": "BigQueryInsertJobOperator", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " - because it's intended replacement for " - }, - { - "type": "text", - "text": "BigQueryExecuteQueryOperator", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and Airflow community does not want to accept contributions to deprecated operators." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP", - "reactions": [ - { - "name": "pray", - "users": [ - "U062WLFMRTP" - ], - "count": 1 - } - ] - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "one question if someone is around - when im keeping both `openlineage-airflow` and `apache-airflow-providers-openlineage` in my requirement file, i see the following error -\n``` from openlineage.airflow.extractors import Extractors\nModuleNotFoundError: No module named 'openlineage.airflow'```\nany thoughts?", - "user": "U062WLFMRTP", - "ts": "1698778838.540239", - "thread_ts": "1698340358.557159", - "root": { - "type": "message", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated", - "files": [ - { - "id": "F062ZFJN2UB", - "created": 1698340299, - "timestamp": 1698340299, - "name": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "title": "Screenshot 2023-10-26 at 10.11.34 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U062WLFMRTP", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 356434, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F062ZFJN2UB/download/screenshot_2023-10-26_at_10.11.34_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 222, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 297, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 445, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 494, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 
593, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062ZFJN2UB-0fca0a48ea/screenshot_2023-10-26_at_10.11.34_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 633, - "original_w": 1756, - "original_h": 1085, - "thumb_tiny": "AwAdADC8etKKQ9acKADFLRRQAUUUUAMPWnCm96UUAOopM+1GaAFopM0ZoA//2Q==", - "permalink": "https://openlineage.slack.com/files/U062WLFMRTP/F062ZFJN2UB/screenshot_2023-10-26_at_10.11.34_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F062ZFJN2UB-c8de1a91b2", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U062WLFMRTP", - "display_as_bot": false, - "ts": "1698340358.557159", - "blocks": [ - { - "type": "rich_text", - "block_id": "CRXyh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated" - } - ] - } - ] - } - ], - "client_msg_id": "925bbcea-663c-4480-8809-2e2b3dd06020", - "thread_ts": "1698340358.557159", - "reply_count": 39, - "reply_users_count": 4, - "latest_reply": "1698786461.272129", - "reply_users": [ - "U062WLFMRTP", - "U02S6F54MAB", - "U01RA9B5GG2", - "U04AZ7992SU" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "7zvcN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "one question if someone is around - when im keeping both " - }, - { - "type": "text", - "text": "openlineage-airflow", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and " - }, - { - "type": "text", - "text": "apache-airflow-providers-openlineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in my requirement file, i see the following error -\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": " from openlineage.airflow.extractors import Extractors\nModuleNotFoundError: No module named 'openlineage.airflow'" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "any thoughts?" - } - ] - } - ] - } - ], - "client_msg_id": "84068da9-05c1-43a6-97a3-cb24d71c5832" - }, - { - "client_msg_id": "5b3abc4b-da1b-4f45-953a-ce8f48a8e43d", - "type": "message", - "text": "I would usually do a `pip freeze | grep openlineage` as a sanity check to validate that the module is actually installed. Not sure how the provider and the module play together though", - "user": "U04AZ7992SU", - "ts": "1698781027.993429", - "blocks": [ - { - "type": "rich_text", - "block_id": "GlF5T", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I would usually do a " - }, - { - "type": "text", - "text": "pip freeze | grep openlineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " as a sanity check to validate that the module is actually installed. Not sure how the provider and the module play together though" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - }, - { - "client_msg_id": "ae9c33b6-583e-464c-afa2-257262faa903", - "type": "message", - "text": "yeah so <@U04AZ7992SU> im not getting how i can use the specific extractor when i run my operator. 
Say for example, I have custom datawarehouseOperator and i want to override get_openlineage_facets_on_start and get_openlineage_facets_on_complete using the redshift extractor then how would i do that?", - "user": "U062WLFMRTP", - "ts": "1698786461.272129", - "blocks": [ - { - "type": "rich_text", - "block_id": "SRKCJ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah so " - }, - { - "type": "user", - "user_id": "U04AZ7992SU" - }, - { - "type": "text", - "text": " im not getting how i can use the specific extractor when i run my operator. Say for example, I have custom datawarehouseOperator and i want to override get_openlineage_facets_on_start and get_openlineage_facets_on_complete using the redshift extractor then how would i do that?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698340358.557159", - "parent_user_id": "U062WLFMRTP" - } - ] - }, - { - "client_msg_id": "b8ca2008-c764-472b-9c2e-7317379aed21", - "type": "message", - "text": "Hello Team", - "user": "U062WLFMRTP", - "ts": "1698340277.847709", - "blocks": [ - { - "type": "rich_text", - "block_id": "1qV9L", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello Team" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "e233e776-2f62-49df-9b40-c8898974d6ef", - "type": "message", - "text": "Hi I want to customise the events which comes from Openlineage spark . Can some one give some information", - "user": "U062Q95A1FG", - "ts": "1698315220.142929", - "blocks": [ - { - "type": "rich_text", - "block_id": "V6ApU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi I want to customise the events which comes from Openlineage spark . Can some one give some information" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698315220.142929", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1698328387.014389", - "reply_users": [ - "U02MK6YNAQ5", - "U062Q95A1FG" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "d2353d60-c8ea-43cb-953a-b3287837757e", - "type": "message", - "text": "Hi <@U062Q95A1FG>, please get familiar with `Extending` section on our docs: ", - "user": "U02MK6YNAQ5", - "ts": "1698320741.327129", - "blocks": [ - { - "type": "rich_text", - "block_id": "pwnrB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U062Q95A1FG" - }, - { - "type": "text", - "text": ", please get familiar with " - }, - { - "type": "text", - "text": "Extending", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " section on our docs: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698315220.142929", - "parent_user_id": "U062Q95A1FG" - }, - { - "client_msg_id": "1721245b-dcc9-4d3c-8fba-31f7c672dc6c", - "type": "message", - "text": "Okay thank you. Just checking any other docs or git code which also can help me", - "user": "U062Q95A1FG", - "ts": "1698328387.014389", - "blocks": [ - { - "type": "rich_text", - "block_id": "jvKNE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Okay thank you. 
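[Annotation] On the custom-operator question above: with the provider package there is no separate extractor to wire up for your own operators; the default extractor looks for `get_openlineage_facets_on_start` / `get_openlineage_facets_on_complete` methods on the operator itself. A hedged sketch (the operator, SQL handling, and namespace values are illustrative; the `OperatorLineage` and `Dataset` import paths should be verified against the installed versions):
```python
from airflow.models.baseoperator import BaseOperator
from airflow.providers.openlineage.extractors import OperatorLineage
from openlineage.client.run import Dataset


class DataWarehouseOperator(BaseOperator):
    """Illustrative custom operator exposing its own lineage."""

    def __init__(self, sql: str, target_table: str, **kwargs):
        super().__init__(**kwargs)
        self.sql = sql
        self.target_table = target_table

    def execute(self, context):
        ...  # run the query against the warehouse

    def get_openlineage_facets_on_start(self) -> OperatorLineage:
        # Lineage known before execution; the namespace format is illustrative.
        return OperatorLineage(
            inputs=[],
            outputs=[Dataset(namespace="redshift://my-cluster:5439",
                             name=self.target_table)],
        )

    def get_openlineage_facets_on_complete(self, task_instance) -> OperatorLineage:
        # Refine with runtime information here; falling back to on_start.
        return self.get_openlineage_facets_on_start()
```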
Just checking any other docs or git code which also can help me" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1698315220.142929", - "parent_user_id": "U062Q95A1FG" - } - ] - }, - { - "client_msg_id": "45451dcf-7e95-4986-92f0-5bda253f541b", - "type": "message", - "text": "Hi,\n\nWe are using openlineage spark connector. We have used spark 3.2 and scala 2.12 so far. We have triggered a new job with Spark 3.4 and scala 2.13 and faced below exception.\n\n\n```java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.map(scala.Function1)'\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildInputDatasets$6(OpenLineageRunEventBuilder.java:341)\n\tat java.base/java.util.Optional.map(Optional.java:265)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildInputDatasets(OpenLineageRunEventBuilder.java:339)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:295)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)\n\tat io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:72)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)```", - "user": "U0625RZ7KR9", - "ts": "1697840317.080859", - "blocks": [ - { - "type": "rich_text", - "block_id": "mnVGE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi,\n\nWe are using openlineage spark connector. We have used spark 3.2 and scala 2.12 so far. We have triggered a new job with Spark 3.4 and scala 2.13 and faced below exception.\n\n\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.map(scala.Function1)'\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildInputDatasets$6(OpenLineageRunEventBuilder.java:341)\n\tat java.base/java.util.Optional.map(Optional.java:265)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildInputDatasets(OpenLineageRunEventBuilder.java:339)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:295)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)\n\tat io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:72)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697840317.080859", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1698144434.527919", - "reply_users": [ - "U02MK6YNAQ5", - "U0625RZ7KR9" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "42e1e997-a71f-4c3a-8692-ef6efc8b53e2", - "type": "message", - "text": "Hmy, that is interesting. Did it occur on databricks runtime? Could you give it a try with Scala 2.12? 
I think we don't test scala 2.13.", - "user": "U02MK6YNAQ5", - "ts": "1698051385.420489", - "blocks": [ - { - "type": "rich_text", - "block_id": "ydjNi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hmy, that is interesting. Did it occur on databricks runtime? Could you give it a try with Scala 2.12? I think we don't test scala 2.13." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697840317.080859", - "parent_user_id": "U0625RZ7KR9" - }, - { - "client_msg_id": "2bfd49e9-104f-473b-8264-635a9f8d7581", - "type": "message", - "text": "I believe our Scala 2.12 jobs are working fine. It's not databricks runtime. We run Spark on Kube.", - "user": "U0625RZ7KR9", - "ts": "1698076933.905129", - "blocks": [ - { - "type": "rich_text", - "block_id": "GvRot", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I believe our Scala 2.12 jobs are working fine. It's not databricks runtime. We run Spark on Kube." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697840317.080859", - "parent_user_id": "U0625RZ7KR9" - }, - { - "client_msg_id": "81acde74-8d20-41f1-8d8e-175ab917bc11", - "type": "message", - "text": "Ok. I think You can raise an issue to support Scala 2.13 for latest Spark versions.", - "user": "U02MK6YNAQ5", - "ts": "1698144434.527919", - "blocks": [ - { - "type": "rich_text", - "block_id": "cx3Rw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Ok. I think You can raise an issue to support Scala 2.13 for latest Spark versions." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697840317.080859", - "parent_user_id": "U0625RZ7KR9" - } - ] - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "<@U05JY6MN8MS>\nI am trying to contribute to Integration tests which is listed here as \nthe mentions that i can trigger CI for integration tests from forked branch.\n.\nbut i am unable to do so, is there a way to trigger CI from forked brach or do i have to get permission from someone to run the CI?\n\ni am getting this error when i run this command `sudo git-push-fork-to-upstream-branch upstream savannavalgi:hacktober`\n> ```Username for '': savannavalgi\n> Password for '': \n> remote: Permission to OpenLineage/OpenLineage.git denied to savannavalgi.\n> fatal: unable to access '': The requested URL returned error: 403```\ni have tried to configure ssh key\nalso tried to trigger CI from another brach,\nand tried all of this after fetching the latest upstream\n\ncc: <@U05JBHLPY8K> <@U01RA9B5GG2> <@U05HD9G5T17>", - "user": "U05KCF3EEUR", - "ts": "1697805105.047909", - "thread_ts": "1691660447.094739", - "root": { - "client_msg_id": "14a34e34-0a7d-4cb0-8a7b-611253e00187", - "type": "message", - "text": "Hi,\nAre there any ways to save list of string directly in the dataset facets? Such as the *myfacets* field in this dict\n``` \"facets\": {\n \"metadata_facet\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"myfacets\": [\"a\", \"b\", \"c\"]\n }\n }```", - "user": "U05HD9G5T17", - "ts": "1691660447.094739", - "blocks": [ - { - "type": "rich_text", - "block_id": "aL6sO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi,\nAre there any ways to save list of string directly in the dataset facets? 
Such as the " - }, - { - "type": "text", - "text": "myfacets ", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "field in this dict\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": " \"facets\": {\n \"metadata_facet\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/0.29.2/client/python" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://sth/schemas/facets.json#/definitions/SomeFacet" - }, - { - "type": "text", - "text": "\",\n \"myfacets\": [\"a\", \"b\", \"c\"]\n }\n }" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1691660447.094739", - "reply_count": 29, - "reply_users_count": 4, - "latest_reply": "1700033510.906469", - "reply_users": [ - "U05HD9G5T17", - "U01RA9B5GG2", - "U05KCF3EEUR", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "OobjK", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05JY6MN8MS" - }, - { - "type": "text", - "text": "\nI am trying to contribute to Integration tests which is listed here as " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2143", - "text": "good first issue" - }, - { - "type": "text", - "text": "\nthe " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/CONTRIBUTING.md#triggering-ci-runs-from-forks-committers", - "text": "CONTRIBUTING.md ", - "unsafe": true - }, - { - "type": "text", - "text": "mentions that i can trigger CI for integration tests from forked branch.\n" - }, - { - "type": "link", - "url": "https://github.com/jklukas/git-push-fork-to-upstream-branch/blob/master/README.md#git-push-fork-to-upstream-branch", - "text": "using this tool" - }, - { - "type": "text", - "text": ".\nbut i am unable to do so, is there a way to trigger CI from forked brach or do i have to get permission from someone to run the CI?\n\ni am getting this error when i run this command " - }, - { - "type": "text", - "text": "sudo git-push-fork-to-upstream-branch upstream savannavalgi:hacktober", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "Username for '" - }, - { - "type": "link", - "url": "https://github.com" - }, - { - "type": "text", - "text": "': savannavalgi\nPassword for '" - }, - { - "type": "link", - "url": "https://savannavalgi@github.com" - }, - { - "type": "text", - "text": "': \nremote: Permission to OpenLineage/OpenLineage.git denied to savannavalgi.\nfatal: unable to access '" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage.git/" - }, - { - "type": "text", - "text": "': The requested URL returned error: 403" - } - ], - "border": 1 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\ni have tried to configure ssh key\nalso tried to trigger CI from another brach,\nand tried all of this after fetching the latest upstream\n\ncc: " - }, - { - "type": "user", - "user_id": "U05JBHLPY8K" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U05HD9G5T17" - } - ] - } - ] - } - ], - "client_msg_id": 
"28915632-8aef-451e-8c4a-53e9a9820670" - }, - { - "client_msg_id": "16f5d002-7473-4a9f-9e9c-3a2021e9f62d", - "type": "message", - "text": "Hey all - we've been noticing that some events go unreported by openlineage (spark) when the `AsyncEventQueue` fills up and starts dropping events. Wondering if anyone has experienced this before, and knows why it is happening? We've expanded the event queue capacity and thrown more hardware at the problem but no dice\n\nAlso as a note, the query plans from this job are pretty big - could the listener just be choking up? Happy to open a github issue as well if we suspect that it could be the listener itself having issues", - "user": "U03D8K119LJ", - "ts": "1697742042.953399", - "blocks": [ - { - "type": "rich_text", - "block_id": "ya3F6", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey all - we've been noticing that some events go unreported by openlineage (spark) when the " - }, - { - "type": "text", - "text": "AsyncEventQueue", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " fills up and starts dropping events. Wondering if anyone has experienced this before, and knows why it is happening? We've expanded the event queue capacity and thrown more hardware at the problem but no dice\n\nAlso as a note, the query plans from this job are pretty big - could the listener just be choking up? Happy to open a github issue as well if we suspect that it could be the listener itself having issues" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697742042.953399", - "reply_count": 3, - "reply_users_count": 3, - "latest_reply": "1698184226.297699", - "reply_users": [ - "U04EZ2LPDV4", - "U01RA9B5GG2", - "U03D8K119LJ" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "9192477b-a1fe-4ecb-bcab-8f9cc3750c42", - "type": "message", - "text": "Hi, just checking, are you excluding the sparkPlan from the events? Or is it sending the spark plan too", - "user": "U04EZ2LPDV4", - "ts": "1697785070.297519", - "blocks": [ - { - "type": "rich_text", - "block_id": "QPMnz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi, just checking, are you excluding the sparkPlan from the events? 
Or is it sending the spark plan too" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697742042.953399", - "parent_user_id": "U03D8K119LJ" - }, - { - "client_msg_id": "1e190877-fc18-4887-b52d-2c43032062b2", - "type": "message", - "text": "yeah - setting `spark.openlineage.facets.disabled` to `[spark_unknown;spark.logicalPlan]` should help", - "user": "U01RA9B5GG2", - "ts": "1698076780.844459", - "blocks": [ - { - "type": "rich_text", - "block_id": "BnQB8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah - setting " - }, - { - "type": "text", - "text": "spark.openlineage.facets.disabled", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " to " - }, - { - "type": "text", - "text": "[spark_unknown;spark.logicalPlan]", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " should help" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697742042.953399", - "parent_user_id": "U03D8K119LJ" - }, - { - "client_msg_id": "357b5eb1-5099-476e-8572-d2773e8a3130", - "type": "message", - "text": "sorry for the late reply - turns out this job is just whack :smile: we were going in circles trying to figure it out, we end up dropping events without open lineage enabled at all. But good to know that disabling the logical plan should speed us up if we run into this again", - "user": "U03D8K119LJ", - "ts": "1698184226.297699", - "blocks": [ - { - "type": "rich_text", - "block_id": "sjqYh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "sorry for the late reply - turns out this job is just whack " - }, - { - "type": "emoji", - "name": "smile", - "unicode": "1f604" - }, - { - "type": "text", - "text": " we were going in circles trying to figure it out, we end up dropping events without open lineage enabled at all. But good to know that disabling the logical plan should speed us up if we run into this again" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697742042.953399", - "parent_user_id": "U03D8K119LJ" - } - ] - }, - { - "client_msg_id": "d9fbe7bd-c99b-492c-9cf2-85c33e75eced", - "type": "message", - "text": "Hello All, I am completely new for Openlineage, I have to setup the lab to conduct POC on various aspects like Lineage, metadata management , etc. As per openlineage site, i tried downloading Ubuntu, docker and binary files for Marquez. But I am lost somewhere and unable to configure whole setup. Can someone please assist in steps to start from scratch so that i can delve into the Openlineage capabilities. Many thanks", - "user": "U0616K9TSTZ", - "ts": "1697597823.663129", - "blocks": [ - { - "type": "rich_text", - "block_id": "lC/3j", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello All, I am completely new for Openlineage, I have to setup the lab to conduct POC on various aspects like Lineage, metadata management , etc. As per openlineage site, i tried downloading Ubuntu, docker and binary files for Marquez. But I am lost somewhere and unable to configure whole setup. Can someone please assist in steps to start from scratch so that i can delve into the Openlineage capabilities. 
Many thanks" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "reply_count": 9, - "reply_users_count": 3, - "latest_reply": "1698149757.690949", - "reply_users": [ - "U02S6F54MAB", - "U02LXF3HUN7", - "U0616K9TSTZ" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1698149757.690949", - "replies": [ - { - "client_msg_id": "6118a046-6700-47a4-bbf7-6096255a0f00", - "type": "message", - "text": "hey, did you try to follow one of these guides?\n", - "user": "U02S6F54MAB", - "ts": "1697607121.185689", - "blocks": [ - { - "type": "rich_text", - "block_id": "YSQ4/", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hey, did you try to follow one of these guides?\n" - }, - { - "type": "link", - "url": "https://openlineage.io/docs/guides/about" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "client_msg_id": "4aca26b5-9a81-4272-b4c1-2e1256b8277b", - "type": "message", - "text": "Which guide were you using, and what errors/issues are you encountering?", - "user": "U02LXF3HUN7", - "ts": "1697634848.846019", - "blocks": [ - { - "type": "rich_text", - "block_id": "HiX85", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Which guide were you using, and what errors/issues are you encountering?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "client_msg_id": "47dc607e-b6d1-42b4-ab76-70c9f1232301", - "type": "message", - "text": "Thanks Jakub for the response.", - "user": "U0616K9TSTZ", - "ts": "1697917394.485199", - "blocks": [ - { - "type": "rich_text", - "block_id": "zwKVv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks Jakub for the response." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "type": "message", - "text": "In docker, marquez-api image is not running and exiting with the exit code 127.", - "files": [ - { - "id": "F062L6J1PU1", - "created": 1697917536, - "timestamp": 1697917536, - "name": "image.png", - "title": "image.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U0616K9TSTZ", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 166027, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F062L6J1PU1/image.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F062L6J1PU1/download/image.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_360.png", - "thumb_360_w": 360, - "thumb_360_h": 141, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_480.png", - "thumb_480_w": 480, - "thumb_480_h": 188, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_720.png", - "thumb_720_w": 720, - "thumb_720_h": 282, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_800.png", - "thumb_800_w": 800, - "thumb_800_h": 313, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_960.png", - "thumb_960_w": 960, - "thumb_960_h": 375, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F062L6J1PU1-5ea4ac9cc0/image_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 400, - "original_w": 1918, - "original_h": 750, - "thumb_tiny": "AwASADC4IIu6CgwQ9Ai5qQUz+P8AGndisgFvD3jWj7PD/wA81p9KvU0XYWQz7PF/zzWmG2XPCpj021YpGGRgHFF2FkJTh0ptOHSkMKKKKACiiigD/9k=", - "permalink": "https://openlineage.slack.com/files/U0616K9TSTZ/F062L6J1PU1/image.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F062L6J1PU1-cba5b6d4fe", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U0616K9TSTZ", - "display_as_bot": false, - "ts": "1697917542.984189", - "blocks": [ - { - "type": "rich_text", - "block_id": "BhBOi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "In docker, marquez-api image is not running and exiting with the exit code 127." - } - ] - } - ] - } - ], - "client_msg_id": "7a27a78a-6dd9-4185-bed0-b201bfaed0ec", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "client_msg_id": "67e5bf58-e710-4dd4-becb-423cefa94d87", - "type": "message", - "text": "<@U0616K9TSTZ> thanks. I don't recognize 127, but 9 times out of 10 if the API or DB container fails the reason is a port conflict. 
Have you checked if port 5000 is available?", - "user": "U02LXF3HUN7", - "ts": "1697981693.157019", - "blocks": [ - { - "type": "rich_text", - "block_id": "QbuKM", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U0616K9TSTZ" - }, - { - "type": "text", - "text": " thanks. I don't recognize 127, but 9 times out of 10 if the API or DB container fails the reason is a port conflict. Have you checked if port 5000 is available?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "client_msg_id": "34660cf6-f9c6-4401-9a24-bb5655b70cd9", - "type": "message", - "text": "could you please check what’s the output of\n```git config --get core.autocrlf```\nor\n```git config --global --get core.autocrlf```\n?", - "user": "U02S6F54MAB", - "ts": "1697982850.343079", - "blocks": [ - { - "type": "rich_text", - "block_id": "MohY5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "could you please check what’s the output of\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "git config --get core.autocrlf" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "or\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "git config --global --get core.autocrlf" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "client_msg_id": "410daf0d-94de-420d-8379-2aacdc9d3acd", - "type": "message", - "text": "<@U02LXF3HUN7> thanks , I checked the port 5000 is not available.\nI tried deleting docker images and recreating them, but still the same issue persist stating \n/Usr/bin/env bash/r not found.\nGradle build is successful.", - "user": "U0616K9TSTZ", - "ts": "1698149354.900009", - "blocks": [ - { - "type": "rich_text", - "block_id": "r8GTZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02LXF3HUN7" - }, - { - "type": "text", - "text": " thanks , I checked the port 5000 is not available.\nI tried deleting docker images and recreating them, but still the same issue persist stating \n/Usr/bin/env bash/r not found.\nGradle build is successful." 
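A side note on the marquez-api failure above: exit code 127 is the shell's "command not found" status, which fits the `/usr/bin/env bash\r not found` symptom reported in this thread (CRLF line endings written by `core.autocrlf true` break the container's entrypoint script). The other frequent culprit is a port conflict; a small Python sketch for ruling that out, assuming the default ports the Marquez Docker setup typically uses:

```python
# Quick bind test: if binding fails, another process already holds the port.
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# 5000 = API, 5001 = admin, 5432 = Postgres (assumed defaults; adjust to your compose file)
for port in (5000, 5001, 5432):
    print(port, "free" if port_is_free(port) else "IN USE")
```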
- } - ] - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "client_msg_id": "bccd4e80-b875-49ce-a005-a684ae3d51a2", - "type": "message", - "text": "<@U02S6F54MAB> thanks, first command resulted as true and second command has no response", - "user": "U0616K9TSTZ", - "ts": "1698149394.624269", - "blocks": [ - { - "type": "rich_text", - "block_id": "AqGIf", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " thanks, first command resulted as true and second command has no response" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - }, - { - "client_msg_id": "93acc1f4-0c76-4d27-bdba-c504bd769a81", - "type": "message", - "text": "are you running docker and git in Windows or Mac OS before 10.0?", - "user": "U02S6F54MAB", - "ts": "1698149757.690949", - "blocks": [ - { - "type": "rich_text", - "block_id": "0wwJv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "are you running docker and git in Windows or Mac OS before 10.0?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02S6F54MAB", - "ts": "1698150048.000000" - }, - "thread_ts": "1697597823.663129", - "parent_user_id": "U0616K9TSTZ" - } - ] - }, - { - "client_msg_id": "6fd21064-3869-4712-a777-db2a9524a72e", - "type": "message", - "text": "Hello All :wave:!\nWe are currently trying to work with the *Spark integration for OpenLineage in our Databricks instance*. The general setup is done and working with a few hiccups here and there.\nBut one thing we are still struggling with is how to link all Spark job events with a Databricks job or a notebook run.\nWe've recently noticed that some of the events produced by OL have the \"environment-properties\" attribute with information (for our context) regarding the notebook path (if it is a notebook run), or the job run ID (if it's a Databricks job run). But the thing is that _these attributes are not always present._\nI ran some samples yesterday for a job with 4 notebook tasks. Of all 20 JSON payloads sent by the OL listener, only 3 presented the \"environment-properties\" attribute. It's not only happening with Databricks jobs. When I run single notebooks and each cell has its own set of Spark jobs, not all JSON events presented that property either.\n\nSo my question is *what are the criteria for these attributes to be present or not in the event JSON file*? Or maybe this is an issue? <@U05T8BJD4DU> did you find out anything about this?\n\n:gear: Spark 3.4 / OL-Spark 1.4.1", - "user": "U05TU0U224A", - "ts": "1697527077.180169", - "blocks": [ - { - "type": "rich_text", - "block_id": "BLL64", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello All " - }, - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - }, - { - "type": "text", - "text": "!\nWe are currently trying to work with the " - }, - { - "type": "text", - "text": "Spark integration for OpenLineage in our Databricks instance", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": ". The general setup is done and working with a few hiccups here and there.\nBut one thing we are still struggling with is how to link all Spark job events with a Databricks job or a notebook run.\nWe've recently noticed that some of the events produced by OL have the \"environment-properties\" attribute with information (for our context) regarding the notebook path (if it is a notebook run), or the job run ID (if it's a Databricks job run). But the thing is that " - }, - { - "type": "text", - "text": "these attributes are not always present.", - "style": { - "italic": true - } - }, - { - "type": "text", - "text": "\nI ran some samples yesterday for a job with 4 notebook tasks. Of all 20 JSON payloads sent by the OL listener, only 3 presented the \"environment-properties\" attribute. It's not only happening with Databricks jobs. When I run single notebooks and each cell has its own set of Spark jobs, not all JSON events presented that property either.\n\nSo my question is " - }, - { - "type": "text", - "text": "what are the criteria for these attributes to be present or not in the event JSON file", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "? Or maybe this is an issue? " - }, - { - "type": "user", - "user_id": "U05T8BJD4DU" - }, - { - "type": "text", - "text": " did you find out anything about this?\n\n" - }, - { - "type": "emoji", - "name": "gear", - "unicode": "2699-fe0f" - }, - { - "type": "text", - "text": " Spark 3.4 / OL-Spark 1.4.1" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "reply_count": 11, - "reply_users_count": 3, - "latest_reply": "1698655576.941979", - "reply_users": [ - "U02MK6YNAQ5", - "U05T8BJD4DU", - "U05TU0U224A" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "fda3c028-90cd-4685-b800-793940aa4314", - "type": "message", - "text": "In general, we assume that OL events per run are cumulative. So, if you have 20 events with the same `runId` , then even if a single event contains some facet, we consider this OK and let the backend combine it together. That's what we do in the Marquez project (a reference backend architecture for OL), and that's why it is worth using Marquez as a REST API.\n\nAre you able to use the job `namespace` to aggregate all the Spark actions run within the Databricks notebook? This is something that should serve this purpose.", - "user": "U02MK6YNAQ5", - "ts": "1697540147.188109", - "blocks": [ - { - "type": "rich_text", - "block_id": "h3dwF", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "In general, we assume that OL events per run are cumulative. So, if you have 20 events with the same " - }, - { - "type": "text", - "text": "runId", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " , then even if a single event contains some facet, we consider this OK and let the backend combine it together. That's what we do in the Marquez project (a reference backend architecture for OL), and that's why it is worth using Marquez as a REST API.\n\nAre you able to use the job " - }, - { - "type": "text", - "text": "namespace", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " to aggregate all the Spark actions run within the Databricks notebook? This is something that should serve this purpose." 
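To make the "cumulative events" model just described concrete, here is a small Python sketch of what a backend like Marquez effectively does when it folds events with the same `runId` together; the event shapes follow the OpenLineage spec, and the sample values are invented:

```python
# Merge run facets across all events of a run: a facet present in any event
# of the run survives in the combined view.
from collections import defaultdict

def merge_runs(events):
    merged = defaultdict(dict)
    for ev in events:
        run = ev.get("run", {})
        merged[run.get("runId")].update(run.get("facets", {}))
    return dict(merged)

events = [
    {"eventType": "START", "run": {"runId": "r-1", "facets": {}}},
    {"eventType": "COMPLETE", "run": {"runId": "r-1", "facets": {
        "environment-properties": {"spark.databricks.notebook.path": "/demo"}}}},
]
print(merge_runs(events)["r-1"])  # the COMPLETE event's facet is kept
```

So even if only 3 of 20 payloads carry "environment-properties", the combined run still has it, as long as the facet appears at least once per `runId`.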
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "ff65cf2d-b75c-460a-a165-bc36665c4b3f", - "type": "message", - "text": "<@U05TU0U224A> for Spark 3.4 I don't see the environment-properties showing up at all, but if you run the code as it is, register a listener on SparkListenerJobStart and get the properties, all of those properties will show up. There's an event filter that filters out the SparkListenerJobStart, I suspect that filtered out the \"unnecessary\" events.. was trying to do a custom build to do that, but still trying to set up Hadoop and Spark on my local", - "user": "U05T8BJD4DU", - "ts": "1697561313.153019", - "blocks": [ - { - "type": "rich_text", - "block_id": "7A+Ai", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05TU0U224A" - }, - { - "type": "text", - "text": " for Spark 3.4 I don't see the environment-properties showing up at all, but if you run the code as it is, register a listener on SparkListenerJobStart and get the properties, all of those properties will show up. There's an event filter that filters out the SparkListenerJobStart, I suspect that filtered out the \"unnecessary\" events.. was trying to do a custom build to do that, but still trying to set up Hadoop and Spark on my local" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "type": "message", - "text": "<@U02MK6YNAQ5> you are right. This is what we are doing as well, combining events with the same runId to process the information on our backend. But even so, there are several runIds without this information. I went through these events to have a better view of what was happening. As you can see from 7 runIds, only 3 were showing the \"environment-properties\" attribute. 
Some condition is not being met here, or maybe it is what <@U05T8BJD4DU> suspects and there's some sort of filtering of unnecessary events", - "files": [ - { - "id": "F06183ZEM39", - "created": 1697620903, - "timestamp": 1697620903, - "name": "image.png", - "title": "image.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05TU0U224A", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 94544, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F06183ZEM39/image.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F06183ZEM39/download/image.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_360.png", - "thumb_360_w": 360, - "thumb_360_h": 94, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_480.png", - "thumb_480_w": 480, - "thumb_480_h": 125, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_720.png", - "thumb_720_w": 720, - "thumb_720_h": 188, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_800.png", - "thumb_800_w": 800, - "thumb_800_h": 209, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_960.png", - "thumb_960_w": 960, - "thumb_960_h": 250, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F06183ZEM39-1bd03f07db/image_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 267, - "original_w": 1326, - "original_h": 346, - "thumb_tiny": "AwAMADC4CR0Jo3N6mgdKSgCRfmGTz9aaSA5yM05OlIPvmgBynI44pE+5Tqav3aAP/9k=", - "permalink": "https://openlineage.slack.com/files/U05TU0U224A/F06183ZEM39/image.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F06183ZEM39-eb5ae0b512", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05TU0U224A", - "display_as_bot": false, - "ts": "1697620996.359989", - "blocks": [ - { - "type": "rich_text", - "block_id": "VMnXI", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " you are right. This is what we are doing as well, combining events with the same runId to process the information on our backend. But even so, there are several runIds without this information. I went through these events to have a better view of what was happening. As you can see from 7 runIds, only 3 were showing the \"environment-properties\" attribute. 
Some condition is not being met here, or maybe it is what " - }, - { - "type": "user", - "user_id": "U05T8BJD4DU" - }, - { - "type": "text", - "text": " suspects and there's some sort of filtering of unnecessary events" - } - ] - } - ] - } - ], - "edited": { - "user": "U05TU0U224A", - "ts": "1697621162.000000" - }, - "client_msg_id": "5dadc8a4-4dde-4110-bc14-fb6395b9d7fd", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "60fbeb4e-698f-46a8-8c8f-cbe064f527ed", - "type": "message", - "text": "<@U05TU0U224A>, If you are able to provide a small Spark script such that none of the OL events contain the environment-properties, but at least one should, please raise an issue for this.", - "user": "U02MK6YNAQ5", - "ts": "1697696883.665629", - "blocks": [ - { - "type": "rich_text", - "block_id": "pCmkO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05TU0U224A" - }, - { - "type": "text", - "text": ", If you are able to provide a small Spark script such that none of the OL events contain the environment-properties, but at least one should, please raise an issue for this." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "68dff01d-6618-41b9-b079-39d9c6a1b818", - "type": "message", - "text": "It's extremely helpful when community open issues that are not only described well, but also contain small piece of code needed to reproduce this.", - "user": "U02MK6YNAQ5", - "ts": "1697696951.035439", - "blocks": [ - { - "type": "rich_text", - "block_id": "J3sJz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It's extremely helpful when community open issues that are not only described well, but also contain small piece of code needed to reproduce this." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "c2342cdb-bf2b-43cb-9773-43e73083028c", - "type": "message", - "text": "I know. that's the goal. that is why I wanted to understand in the first place if there was any condition preventing this from happening, but now i get that this is not expected behaviour.", - "user": "U05TU0U224A", - "ts": "1697698779.304399", - "blocks": [ - { - "type": "rich_text", - "block_id": "versw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I know. that's the goal. that is why I wanted to understand in the first place if there was any condition preventing this from happening, but now i get that this is not expected behaviour." 
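For the repro request above, a self-contained script can be very small. A sketch; on Databricks the OpenLineage jar would be attached to the cluster rather than pulled via `spark.jars.packages`:

```python
# Minimal repro candidate: emit events to the console and check whether any
# event of the run carries the "environment-properties" facet.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("env-properties-repro")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")  # events go to the driver log
    .getOrCreate()
)

# the simplest possible action; per the thread, at least one of its events
# should carry environment-properties when run on Databricks
spark.range(1000).repartition(10).count()
```

Grepping the driver log for `environment-properties` then shows immediately which events, if any, contain the facet.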
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A", - "reactions": [ - { - "name": "+1", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "79cc1d29-f02b-41ee-a715-03920f9f4c15", - "type": "message", - "text": "<@U02MK6YNAQ5> <@U05TU0U224A> I am referring to this: ", - "user": "U05T8BJD4DU", - "ts": "1697737440.755279", - "blocks": [ - { - "type": "rich_text", - "block_id": "q5MGb", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U05TU0U224A" - }, - { - "type": "text", - "text": " I am referring to this: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/filters/DeltaEventFilter.java#L51" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/filters/DeltaEventFilter.java#L51", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n || isOnJobStartOrEnd(event);\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "4667b9d1-fe7f-4f5d-93ed-27ce05a8a033", - "type": "message", - "text": "Please note that I am getting the same behavior, no code is needed, Spark 3.4+ won't be generating no matter what. I have been testing the same code for 2 months from this issue: \n\nI tried the code without OL and it worked perfectly, so it is OL filtering out the event for sure. I will try posting the code I use to collect the properties.", - "user": "U05T8BJD4DU", - "ts": "1697741343.557699", - "blocks": [ - { - "type": "rich_text", - "block_id": "KgtDT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Please note that I am getting the same behavior, no code is needed, Spark 3.4+ won't be generating no matter what. I have been testing the same code for 2 months from this issue: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2124" - }, - { - "type": "text", - "text": "\n\nI tried the code without OL and it worked perfectly, so it is OL filtering out the event for sure. I will try posting the code I use to collect the properties." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695498902, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2124 Same Delta Table not catching the location on write", - "text": "*What is the target system?*\n\nSpark / Databricks\n\n*What kind of integration is this?*\n\n☐ Produces OpenLineage metadata\n☐ Consumes OpenLineage metadata\n☐ Something else\n\n*How should this integration be implemented?*\n\nI am using OL 1.2.2, Azure Databricks Runtime 11.3 LTS. 
When creating a table writing into a ADLS location, OL won't be able to catch the location of the output. But when I read the same object it will be able to read the location as INPUT.\n\nPlease note I have also tested Databricks Runtime 13.3 LTS, Spark 3.4.1 - it will give correct ADLS location in INPUT but the input will only show up once in a blue moon. Most of the time the inputs and outputs are blank.\n\n```\n \"inputs\": [],\n \"outputs\": []\n```\n\n```\nCREATE OR REPLACE TABLE transactions_adj\nUSING DELTA LOCATION ''\nAS\n SELECT\n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n amount_list,\n campaign_coupon_discount,\n manuf_coupon_discount,\n manuf_coupon_match_discount,\n total_coupon_discount,\n instore_discount,\n amount_paid,\n units\n FROM (\n SELECT \n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n COALESCE(sales_amount - discount_amount - coupon_discount - coupon_discount_match,0.0) as amount_list,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) = 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as campaign_coupon_discount,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) != 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as manuf_coupon_discount,\n -1 * COALESCE(coupon_discount_match,0.0) as manuf_coupon_match_discount,\n -1 * COALESCE(coupon_discount - coupon_discount_match,0.0) as total_coupon_discount,\n COALESCE(-1 * discount_amount,0.0) as instore_discount,\n COALESCE(sales_amount,0.0) as `amount_paid,`\n quantity as units\n FROM transactions\n );\n```\n\nHere's the COMPLETE event:\n\n```\n\n \"outputs\":[\n {\n \"namespace\":\"dbfs\",\n \"name\":\"/user/hive/warehouse/journey.db/transactions_adj\",\n \"facets\":{\n \"dataSource\":{\n \"_producer\":\"\",\n \"_schemaURL\":\"\",\n \"name\":\"dbfs\",\n \"uri\":\"dbfs\"\n },\n\n```\n\nBelow logical plan shows the path:\n\n```\n== Analyzed Logical Plan ==\nnum_affected_rows: bigint, num_inserted_rows: bigint\nReplaceTableAsSelect TableSpec(Map(),Some(DELTA),Map(),Some(),None,None,false,Set()), true\n:- ResolvedIdentifier com.databricks.sql.managedcatalog.UnityCatalogV2Proxy@6251a8df, default.transactions_adj\n+- Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, amount_list#147, campaign_coupon_discount#148, manuf_coupon_discount#149, manuf_coupon_match_discount#150, total_coupon_discount#151, instore_discount#152, amount_paid#153, units#154]\n +- SubqueryAlias __auto_generated_subquery_name\n +- Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, coalesce(cast((((sales_amount#189 - discount_amount#191) - coupon_discount#194) - coupon_discount_match#195) as double), cast(0.0 as double)) AS amount_list#147, CASE WHEN (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS campaign_coupon_discount#148, CASE WHEN NOT (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS manuf_coupon_discount#149, (cast(-1 as double) * coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double))) AS manuf_coupon_match_discount#150, (cast(-1 as 
double) * coalesce(cast((coupon_discount#194 - coupon_discount_match#195) as double), cast(0.0 as double))) AS total_coupon_discount#151, coalesce(cast((cast(-1 as float) * discount_amount#191) as double), cast(0.0 as double)) AS instore_discount#152, coalesce(cast(sales_amount#189 as double), cast(0.0 as double)) AS amount_paid#153, quantity#188 AS units#154]\n +- SubqueryAlias spark_catalog.default.transactions\n +- Relation spark_catalog.default.transactions[household_id#184,basket_id#185L,day#186,product_id#187,quantity#188,sales_amount#189,store_id#190,discount_amount#191,transaction_time#192,week_no#193,coupon_discount#194,coupon_discount_match#195] parquet\n```\n\n*Where should this integration be implemented?*\n\n☐ In the target system\n☐ In the OpenLineage repo\n☐ Somewhere else\n\n*Do you plan to make this contribution yourself?*\n\n☐ I am interested in doing this work", - "title": "#2124 Same Delta Table not catching the location on write", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "footer": "", - "fields": [ - { - "value": "integration/spark, integration/databricks", - "title": "Labels", - "short": true - }, - { - "value": "3", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "a8e01f77-a721-4e7e-ad4a-377f292c532a", - "type": "message", - "text": "this code proves that the properties are still there, somehow got filtered out by OL:\n\n```%scala\nimport org.apache.spark.scheduler._\n\nclass JobStartListener extends SparkListener {\n override def onJobStart(jobStart: SparkListenerJobStart): Unit = {\n // Extract properties here\n val jobId = jobStart.jobId\n val stageInfos = jobStart.stageInfos\n val properties = jobStart.properties\n\n // You can print properties or save them somewhere\n println(s\"JobId: $jobId, Stages: ${stageInfos.size}, Properties: $properties\")\n }\n}\n\nval listener = new JobStartListener()\nspark.sparkContext.addSparkListener(listener)\n\nval df = spark.range(1000).repartition(10)\ndf.count()```", - "user": "U05T8BJD4DU", - "ts": "1697773577.620419", - "blocks": [ - { - "type": "rich_text", - "block_id": "X9SFZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this code proves that the properties are still there, somehow got filtered out by OL:\n\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "%scala\nimport org.apache.spark.scheduler._\n\nclass JobStartListener extends SparkListener {\n override def onJobStart(jobStart: SparkListenerJobStart): Unit = {\n // Extract properties here\n val jobId = jobStart.jobId\n val stageInfos = jobStart.stageInfos\n val properties = jobStart.properties\n\n // You can print properties or save them somewhere\n println(s\"JobId: $jobId, Stages: ${stageInfos.size}, Properties: $properties\")\n }\n}\n\nval listener = new JobStartListener()\nspark.sparkContext.addSparkListener(listener)\n\nval df = spark.range(1000).repartition(10)\ndf.count()" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "ea8f6b5d-4d94-4275-b07e-cb72675cda7e", - "type": "message", - "text": "of course feel free to test this logic as well, it still works -- if not the filtering:\n\n", - "user": "U05T8BJD4DU", - "ts": "1697774105.261429", - "blocks": [ - { - "type": 
"rich_text", - "block_id": "d99Zk", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "of course feel free to test this logic as well, it still works -- if not the filtering:\n\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java", - "text": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/[…]ark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05T8BJD4DU", - "ts": "1697774126.000000" - }, - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/shared/src/main/java/io/openlineage/spark/agent/facets/builder/DatabricksEnvironmentFacetBuilder.java", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n/*\n/* Copyright 2018-2023 contributors to the OpenLineage project\n/* SPDX-License-Identifier: Apache-2.0\n*/\n\npackage io.openlineage.spark.agent.facets.builder;\n\nimport com.databricks.backend.daemon.dbutils.MountInfo;\nimport com.databricks.dbutils_v1.DbfsUtils;\nimport io.openlineage.spark.agent.facets.EnvironmentFacet;\nimport io.openlineage.spark.agent.models.DatabricksMountpoint;\nimport io.openlineage.spark.api.CustomFacetBuilder;\nimport io.openlineage.spark.api.OpenLineageContext;\nimport java.lang.reflect.Constructor;\nimport java.lang.reflect.InvocationTargetException;\nimport java.lang.reflect.Parameter;\nimport java.util.ArrayList;\nimport java.util.Arrays;\nimport java.util.HashMap;\nimport java.util.List;\nimport java.util.Map;\nimport java.util.Optional;\nimport java.util.function.BiConsumer;\nimport lombok.extern.slf4j.Slf4j;\nimport org.apache.spark.scheduler.SparkListenerJobStart;\nimport scala.collection.JavaConversions;\n\n/**\n * {@link CustomFacetBuilder} that generates a {@link EnvironmentFacet} when using OpenLineage on\n * Databricks.\n */\n@Slf4j\npublic class DatabricksEnvironmentFacetBuilder\n extends CustomFacetBuilder {\n private Map dbProperties;\n private Class dbutilsClass;\n private DbfsUtils dbutils;\n\n public static boolean isDatabricksRuntime() {\n return System.getenv().containsKey(\"DATABRICKS_RUNTIME_VERSION\");\n }\n\n public DatabricksEnvironmentFacetBuilder() {}\n\n public DatabricksEnvironmentFacetBuilder(OpenLineageContext openLineageContext) {\n dbProperties = new HashMap<>();\n // extract some custom environment variables if needed\n openLineageContext\n .getCustomEnvironmentVariables()\n .ifPresent(\n envVars ->\n envVars.forEach(envVar -> dbProperties.put(envVar, System.getenv().get(envVar))));\n }\n\n @Override\n protected void build(\n SparkListenerJobStart event, BiConsumer consumer) {\n consumer.accept(\n \"environment-properties\",\n new EnvironmentFacet(getDatabricksEnvironmentalAttributes(event)));\n }\n\n private Map getDatabricksEnvironmentalAttributes(SparkListenerJobStart jobStart) {\n if (dbProperties == null) {\n dbProperties = new HashMap<>();\n }\n\n // These are useful properties to extract if they are available\n List dbPropertiesKeys =\n Arrays.asList(\n \"orgId\",\n \"spark.databricks.clusterUsageTags.clusterOwnerOrgId\",\n \"spark.databricks.notebook.path\",\n 
\"spark.databricks.job.type\",\n \"spark.databricks.job.id\",\n \"spark.databricks.job.runId\",\n \"user\",\n \"userId\",\n \"spark.databricks.clusterUsageTags.clusterName\",\n \"spark.databricks.clusterUsageTags.clusterAllTags\",\n \"spark.databricks.clusterUsageTags.azureSubscriptionId\");\n dbPropertiesKeys.stream()\n .forEach(\n (p) -> {\n dbProperties.put(p, jobStart.properties().getProperty(p));\n });\n\n /**\n * Azure Databricks makes available a dbutils mount point to list aliased paths to cloud\n * storage. However, that dbutils object is not available inside a spark listener. We must\n * access it via reflection.\n */\n try {\n Optional dbfsUtils = getDbfsUtils();\n if (!dbfsUtils.isPresent()) {\n dbProperties.put(\"mountPoints\", new ArrayList());\n } else {\n dbProperties.put(\"mountPoints\", getDatabricksMountpoints(dbfsUtils.get()));\n }\n\n } catch (Exception e) {\n log.warn(\"Failed to load dbutils in OpenLineageListener:\", e);\n dbProperties.put(\"mountPoints\", new ArrayList());\n }\n return dbProperties;\n }\n\n // Starting in Databricks Runtime 11, there is a new constructor for DbFsUtils\n // If running on an older version, the constructor has no parameters.\n // If running on DBR 11 or above, you need to specify whether you allow mount operations (true or\n // false)\n private static Optional getDbfsUtils()\n throws ClassNotFoundException, InstantiationException, IllegalAccessException,\n IllegalArgumentException, InvocationTargetException {\n Class dbutilsClass = Class.forName(\"com.databricks.dbutils_v1.impl.DbfsUtilsImpl\");\n Constructor[] dbutilsConstructors = dbutilsClass.getDeclaredConstructors();\n if (dbutilsConstructors.length == 0) {\n log.warn(\n \"Failed to load dbutils in OpenLineageListener as there were no declared constructors\");\n return Optional.empty();\n }\n Constructor firstConstructor = dbutilsConstructors[0];\n Parameter[] constructorParams = firstConstructor.getParameters();\n if (constructorParams.length == 0) {\n log.debug(\"DbUtils constructor had no parameters\");\n return Optional.of((DbfsUtils) firstConstructor.newInstance());\n } else if (constructorParams.length == 1\n && constructorParams[0].getName().equals(\"allowMountOperations\")) {\n log.debug(\"DbUtils constructor had one parameter named allowMountOperations\");\n return Optional.of((DbfsUtils) firstConstructor.newInstance(true));\n } else {\n log.warn(\n \"dbutils had {} constructors and the first constructor had {} params\",\n dbutilsConstructors.length,\n constructorParams.length);\n return Optional.empty();\n }\n }\n\n private static List getDatabricksMountpoints(DbfsUtils dbutils) {\n List mountpoints = new ArrayList<>();\n List mountsList = JavaConversions.seqAsJavaList(dbutils.mounts());\n for (MountInfo mount : mountsList) {\n mountpoints.add(new DatabricksMountpoint(mount.mountPoint(), mount.source()));\n }\n return mountpoints;\n }\n}\n\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - }, - { - "client_msg_id": "5541ebd3-b058-4421-854a-69da0f2fadb5", - "type": "message", - "text": "Any ideas on how could i test it?", - "user": "U05TU0U224A", - "ts": "1698655576.941979", - "blocks": [ - { - "type": "rich_text", - "block_id": "/YRlt", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Any ideas on how could i test it?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697527077.180169", - "parent_user_id": "U05TU0U224A" - } - ] - }, - { - "client_msg_id": "fe0af91b-6ec2-4d34-b71e-aea9c88ce2c4", - "type": "message", - "text": "Hi team, I am running the following pyspark code in a cell:\n```print(\"SELECTING 100 RECORDS FROM METADATA TABLE\")\ndf = spark.sql(\"\"\"select * from
limit 100\"\"\")\n\nprint(\"WRITING (1) 100 RECORDS FROM METADATA TABLE\")\ndf.write.mode(\"overwrite\").format('delta').save(\"\")\ndf.createOrReplaceTempView(\"temp_metadata\")\n\nprint(\"WRITING (2) 100 RECORDS FROM METADATA TABLE\")\ndf.write.mode(\"overwrite\").format(\"delta\").save(\"\")\n\nprint(\"READING (1) 100 RECORDS FROM METADATA TABLE\")\ndf_read = spark.read.format('delta').load(\"\")\ndf_read.createOrReplaceTempView(\"metadata_1\")\n\nprint(\"DOING THE MERGE INTO SQL STEP!\")\ndf_new = spark.sql(\"\"\"\n MERGE INTO metadata_1\n USING
\n ON metadata_1.id = temp_metadata.id\n WHEN MATCHED THEN UPDATE SET \n metadata_1.id = temp_metadata.id,\n metadata_1.aspect = temp_metadata.aspect\n WHEN NOT MATCHED THEN INSERT (id, aspect) \n VALUES (temp_metadata.id, temp_metadata.aspect)\n\"\"\")```\nI am running with debug log levels. I actually don't see any of the events being logged for `SaveIntoDataSourceCommand` or the `MergeIntoCommand`, but OL is in fact emitting events to the backend. It seems like the events are just not being logged... I actually observe this for all delta table related spark sql queries...", - "user": "U04EZ2LPDV4", - "ts": "1697179720.032079", - "blocks": [ - { - "type": "rich_text", - "block_id": "NqsvT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi team, I am running the following pyspark code in a cell:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "print(\"SELECTING 100 RECORDS FROM METADATA TABLE\")\ndf = spark.sql(\"\"\"select * from
limit 100\"\"\")\n\nprint(\"WRITING (1) 100 RECORDS FROM METADATA TABLE\")\ndf.write.mode(\"overwrite\").format('delta').save(\"\")\ndf.createOrReplaceTempView(\"temp_metadata\")\n\nprint(\"WRITING (2) 100 RECORDS FROM METADATA TABLE\")\ndf.write.mode(\"overwrite\").format(\"delta\").save(\"\")\n\nprint(\"READING (1) 100 RECORDS FROM METADATA TABLE\")\ndf_read = spark.read.format('delta').load(\"\")\ndf_read.createOrReplaceTempView(\"metadata_1\")\n\nprint(\"DOING THE MERGE INTO SQL STEP!\")\ndf_new = spark.sql(\"\"\"\n MERGE INTO metadata_1\n USING
\n ON metadata_1.id = temp_metadata.id\n WHEN MATCHED THEN UPDATE SET \n metadata_1.id = temp_metadata.id,\n metadata_1.aspect = temp_metadata.aspect\n WHEN NOT MATCHED THEN INSERT (id, aspect) \n VALUES (temp_metadata.id, temp_metadata.aspect)\n\"\"\")" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am running with debug log levels. I actually don't see any of the events being logged for " - }, - { - "type": "text", - "text": "SaveIntoDataSourceCommand", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " or the " - }, - { - "type": "text", - "text": "MergeIntoCommand", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ", but OL is in fact emitting events to the backend. It seems like the events are just not being logged... I actually observe this for all delta table related spark sql queries..." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U04EZ2LPDV4", - "ts": "1697428845.000000" - }, - "thread_ts": "1697179720.032079", - "reply_count": 6, - "reply_users_count": 2, - "latest_reply": "1698310610.069939", - "reply_users": [ - "U04EZ2LPDV4", - "U05FLJE4GDU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "2192ef5f-6edb-41ea-ab67-74a24936b1e5", - "type": "message", - "text": "Hi <@U02MK6YNAQ5> is this expected? CMIIW but we should expect to see the events being logged when running with debug log level right?", - "user": "U04EZ2LPDV4", - "ts": "1697428902.144169", - "blocks": [ - { - "type": "rich_text", - "block_id": "MjNBU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " is this expected? CMIIW but we should expect to see the events being logged when running with debug log level right?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697179720.032079", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "33219d08-21e7-4f93-91a5-7003dc0afe9c", - "type": "message", - "text": "It's impossible to know without seeing how you've configured the listener.\n\nCan you show this configuration?", - "user": "U05FLJE4GDU", - "ts": "1697444250.594509", - "blocks": [ - { - "type": "rich_text", - "block_id": "cDgl4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It's impossible to know without seeing how you've configured the listener.\n\nCan you show this configuration?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697179720.032079", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "8645ab34-2922-4e1b-9a0f-993e64435b62", - "type": "message", - "text": "```spark.openlineage.transport.url <url>\nspark.openlineage.transport.endpoint /<endpoint>\nspark.openlineage.transport.type http\nspark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener\nspark.openlineage.facets.custom_environment_variables [BUNCH_OF_VARIABLES;]\nspark.openlineage.facets.disabled [spark_unknown\\;spark.logicalPlan]```\nThese are my spark configs... 
I'm setting log level to debug with `sc.setLogLevel(\"DEBUG\")`", - "user": "U04EZ2LPDV4", - "ts": "1697526920.491239", - "blocks": [ - { - "type": "rich_text", - "block_id": "NF5xw", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "spark.openlineage.transport.url \nspark.openlineage.transport.endpoint /\nspark.openlineage.transport.type http\nspark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener\nspark.openlineage.facets.custom_environment_variables [BUNCH_OF_VARIABLES;]\nspark.openlineage.facets.disabled [spark_unknown\\;spark.logicalPlan]" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "These are my spark configs... I'm setting log level to debug with " - }, - { - "type": "text", - "text": "sc.setLogLevel(\"DEBUG\")", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697179720.032079", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "b7b05cd0-b076-4719-87cc-84ae866cedaf", - "type": "message", - "text": "Two things:\n\n1. If you want debug logs, you're going to have to provide a `log4j.properties` file or `log4j2.properties` file depending on the version of spark you're running. In that file, you will need to configure the logging levels. If I am not mistaken, the `sc.setLogLevel` controls ONLY the log levels of Spark namespaced components (i.e., `org.apache.spark`)\n2. You're telling the listener to emit to a URL. If you want to see the events emitted to the console, then set `spark.openlineage.transport.type=console`, and remove the other `spark.openlineage.transport.*` configurations.\nDo either (1) or (2).", - "user": "U05FLJE4GDU", - "ts": "1697532003.340519", - "blocks": [ - { - "type": "rich_text", - "block_id": "BJmP5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Two things:\n\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "If you want debug logs, you're going to have to provide a " - }, - { - "type": "text", - "text": "log4j.properties", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " file or " - }, - { - "type": "text", - "text": "log4j2.properties", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " file depending on the version of spark you're running. In that file, you will need to configure the logging levels. If I am not mistaken, the " - }, - { - "type": "text", - "text": "sc.setLogLevel", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " controls ONLY the log levels of Spark namespaced components (i.e., " - }, - { - "type": "text", - "text": "org.apache.spark", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ")" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "You're telling the listener to emit to a URL. If you want to see the events emitted to the console, then set " - }, - { - "type": "text", - "text": "spark.openlineage.transport.type=console", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ", and remove the other " - }, - { - "type": "text", - "text": "spark.openlineage.transport.*", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " configurations." 
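A third option that keeps the HTTP setup intact, for the "I do want to emit events to the URL" case in the reply below: point `spark.openlineage.transport.url` at a throwaway local sink that pretty-prints every event it receives. A sketch; the port is an arbitrary choice:

```python
# Run this in a separate terminal, then set
#   spark.openlineage.transport.url  http://localhost:9999
#   spark.openlineage.transport.type http
# Every event POSTed by the listener is pretty-printed to the console.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class OLSink(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(json.dumps(json.loads(self.rfile.read(length)), indent=2))
        self.send_response(200)
        self.end_headers()

HTTPServer(("localhost", 9999), OLSink).serve_forever()
```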
- } - ] - } - ], - "style": "ordered", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nDo either (1) or (2)." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697179720.032079", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "b78fafde-258c-4317-bb95-8dca99235ab6", - "type": "message", - "text": "<@U05FLJE4GDU> Hi, sflr.\n1. So enabling `sc.setLogLevel` does actually enable debug logs from OpenLineage. I can see the events and everything being logged if I save it in parquet format instead of delta. \n2. I do want to emit events to the URL. But, I would like to just see what exactly are the events being emitted for some specific jobs, since I see that the lineage is incorrect for some MergeInto cases", - "user": "U04EZ2LPDV4", - "ts": "1697777385.439859", - "blocks": [ - { - "type": "rich_text", - "block_id": "g8Thq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05FLJE4GDU" - }, - { - "type": "text", - "text": " Hi, sflr.\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "So enabling " - }, - { - "type": "text", - "text": "sc.setLogLevel", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " does actually enable debug logs from OpenLineage. I can see the events and everything being logged if I save it in parquet format instead of delta. " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I do want to emit events to the URL. But, I would like to just see what exactly are the events being emitted for some specific jobs, since I see that the lineage is incorrect for some MergeInto cases" - } - ] - } - ], - "style": "ordered", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U04EZ2LPDV4", - "ts": "1697777397.000000" - }, - "thread_ts": "1697179720.032079", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "11b61616-c248-4409-a823-38e839f30dca", - "type": "message", - "text": "Hi <@U05FLJE4GDU> would like to check again on whether you'd have any thoughts about this... Thanks! :slightly_smiling_face:", - "user": "U04EZ2LPDV4", - "ts": "1698310610.069939", - "blocks": [ - { - "type": "rich_text", - "block_id": "eKcmt", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U05FLJE4GDU" - }, - { - "type": "text", - "text": " would like to check again on whether you'd have any thoughts about this... Thanks! " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697179720.032079", - "parent_user_id": "U04EZ2LPDV4" - } - ] - }, - { - "client_msg_id": "7e2e374d-b98a-475c-80af-541046c52476", - "type": "message", - "text": "This might be a dumb question, I guess I need to set up local Spark in order for the Spark tests to run successfully?", - "user": "U05T8BJD4DU", - "ts": "1697137714.503349", - "blocks": [ - { - "type": "rich_text", - "block_id": "U1OfA", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "This might be a dumb question, I guess I need to set up local Spark in order for the Spark tests to run successfully?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "reply_count": 15, - "reply_users_count": 3, - "latest_reply": "1698025583.327319", - "reply_users": [ - "U02MK6YNAQ5", - "U05QL7LN2GH", - "U05T8BJD4DU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "d14cb78f-6549-4460-a171-fb04e26bb044", - "type": "message", - "text": "just follow these instructions: ", - "user": "U02MK6YNAQ5", - "ts": "1697176579.973179", - "blocks": [ - { - "type": "rich_text", - "block_id": "ouUqx", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "just follow these instructions: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#build" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "e6dcc9be-50b2-408c-ae02-6f4072d94a2e", - "type": "message", - "text": "when trying to install openlineage-java in local via this command --> cd ../../client/java/ && ./gradlew publishToMavenLocal, i am receiving this error\n```> Task :signMavenJavaPublication FAILED\n\nFAILURE: Build failed with an exception.\n\n* What went wrong:\nExecution failed for task ':signMavenJavaPublication'.\n> Cannot perform signing task ':signMavenJavaPublication' because it has no configured signatory```", - "user": "U05QL7LN2GH", - "ts": "1697193716.992739", - "blocks": [ - { - "type": "rich_text", - "block_id": "Ebo87", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "when trying to install openlineage-java in local via this command --> cd ../../client/java/ && ./gradlew publishToMavenLocal, i am receiving this error\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "> Task :signMavenJavaPublication FAILED\n\nFAILURE: Build failed with an exception.\n\n* What went wrong:\nExecution failed for task ':signMavenJavaPublication'.\n> Cannot perform signing task ':signMavenJavaPublication' because it has no configured signatory" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "type": "message", - "text": "<@U02MK6YNAQ5> this is what I am getting", - "files": [ - { - "id": "F061S5ZMF08", - "created": 1697218504, - "timestamp": 1697218504, - "name": "image.png", - "title": "image.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05T8BJD4DU", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 107167, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F061S5ZMF08/image.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F061S5ZMF08/download/image.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_360.png", - "thumb_360_w": 360, - "thumb_360_h": 60, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_480.png", - 
"thumb_480_w": 480, - "thumb_480_h": 80, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_720.png", - "thumb_720_w": 720, - "thumb_720_h": 120, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_800.png", - "thumb_800_w": 800, - "thumb_800_h": 133, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_960.png", - "thumb_960_w": 960, - "thumb_960_h": 160, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F061S5ZMF08-85392c9570/image_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 171, - "original_w": 2779, - "original_h": 463, - "thumb_tiny": "AwAHADCpkelJx6UUlABSZoooAM0A80lKOtAH/9k=", - "permalink": "https://openlineage.slack.com/files/U05T8BJD4DU/F061S5ZMF08/image.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F061S5ZMF08-b83e34fc5b", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05T8BJD4DU", - "display_as_bot": false, - "ts": "1697218506.981419", - "blocks": [ - { - "type": "rich_text", - "block_id": "d1Buh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " this is what I am getting" - } - ] - } - ] - } - ], - "client_msg_id": "a357d38b-8e93-4925-a1b4-0713d750b564", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "type": "message", - "text": "attaching the html", - "files": [ - { - "id": "F061FR39UE5", - "created": 1697218554, - "timestamp": 1697218554, - "name": "io.openlineage.spark.agent.lifecycle.plan.AlterTableAddPartitionCommandVisitorTest.html", - "title": "io.openlineage.spark.agent.lifecycle.plan.AlterTableAddPartitionCommandVisitorTest.html", - "mimetype": "text/plain", - "filetype": "html", - "pretty_type": "HTML", - "user": "U05T8BJD4DU", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 17191, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F061FR39UE5/io.openlineage.spark.agent.lifecycle.plan.altertableaddpartitioncommandvisitortest.html", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F061FR39UE5/download/io.openlineage.spark.agent.lifecycle.plan.altertableaddpartitioncommandvisitortest.html", - "permalink": "https://openlineage.slack.com/files/U05T8BJD4DU/F061FR39UE5/io.openlineage.spark.agent.lifecycle.plan.altertableaddpartitioncommandvisitortest.html", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F061FR39UE5-b2ba858587", - "edit_link": "https://openlineage.slack.com/files/U05T8BJD4DU/F061FR39UE5/io.openlineage.spark.agent.lifecycle.plan.altertableaddpartitioncommandvisitortest.html/edit", - "preview": "\r\n\r\n\r\n\r\n\r", - "preview_highlight": "
\n\n<!DOCTYPE html>\n<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n<meta http-equiv="x-ua-compatible" content="IE=edge"/>\n\n
\n", - "lines": 244, - "lines_more": 239, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05T8BJD4DU", - "display_as_bot": false, - "ts": "1697218560.627389", - "blocks": [ - { - "type": "rich_text", - "block_id": "mRJBE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "attaching the html" - } - ] - } - ] - } - ], - "client_msg_id": "69b264e3-7415-433b-bcad-a449fe6d4fc5", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "0b50bf9f-40c4-4bf5-96e8-41be9cf2c202", - "type": "message", - "text": "which java are you using? what is your operation system (is it windows?)?", - "user": "U02MK6YNAQ5", - "ts": "1697439733.216699", - "blocks": [ - { - "type": "rich_text", - "block_id": "/d7dd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "which java are you using? what is your operation system (is it windows?)?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "ade3fbfa-0b80-4eb6-99d3-07b6e59ffdaf", - "type": "message", - "text": "yes it is Windows, i downloaded java 8 but I can try to build it with Linux subsystem or Mac", - "user": "U05T8BJD4DU", - "ts": "1697441718.743329", - "blocks": [ - { - "type": "rich_text", - "block_id": "rdTqb", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes it is Windows, i downloaded java 8 but I can try to build it with Linux subsystem or Mac" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "ccf03d2b-686c-46d0-ae2e-dc2ddbe80330", - "type": "message", - "text": "In my case it is Mac", - "user": "U05QL7LN2GH", - "ts": "1697441751.675019", - "blocks": [ - { - "type": "rich_text", - "block_id": "TQvZ2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "In my case it is Mac" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "167256fa-fa11-4a98-a407-da6ece5878be", - "type": "message", - "text": "* Where:\nBuild file '/mnt/c/Users/jason/Downloads/github/OpenLineage/integration/spark/build.gradle' line: 9\n\n* What went wrong:\nAn exception occurred applying plugin request [id: 'com.adarshr.test-logger', version: '3.2.0']\n> Failed to apply plugin [id 'com.adarshr.test-logger']\n > Could not generate a proxy class for class com.adarshr.gradle.testlogger.TestLoggerExtension.\n\n* Try:", - "user": "U05T8BJD4DU", - "ts": "1697442969.487219", - "blocks": [ - { - "type": "rich_text", - "block_id": "oi3kW", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "* Where:\nBuild file '/mnt/c/Users/jason/Downloads/github/OpenLineage/integration/spark/build.gradle' line: 9\n\n* What went wrong:\nAn exception occurred applying plugin request [id: 'com.adarshr.test-logger', version: '3.2.0']\n> Failed to apply plugin [id 'com.adarshr.test-logger']\n > Could not generate a proxy class for class com.adarshr.gradle.testlogger.TestLoggerExtension.\n\n* Try:" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - 
"client_msg_id": "9a25cde0-5748-40eb-a40c-ca6f7cd93514", - "type": "message", - "text": "tried with Linux subsystem", - "user": "U05T8BJD4DU", - "ts": "1697442983.551079", - "blocks": [ - { - "type": "rich_text", - "block_id": "Knyq2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "tried with Linux subsystem" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "aabbc5df-cec7-4979-aca6-e0a716b7d7e9", - "type": "message", - "text": "we don't have any restrictions for windows builds, however it is something we don't test regularly. 2h ago we did have a successful build on circle CI ", - "user": "U02MK6YNAQ5", - "ts": "1697443469.020679", - "blocks": [ - { - "type": "rich_text", - "block_id": "GfkaO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we don't have any restrictions for windows builds, however it is something we don't test regularly. 2h ago we did have a successful build on circle CI " - }, - { - "type": "link", - "url": "https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8271/workflows/0ec521ae-cd21-444a-bfec-554d101770ea" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "de82063b-d85b-4d99-9675-3d73375d9fd9", - "type": "message", - "text": "... 111 more\nCaused by: java.lang.ClassNotFoundException: org.gradle.api.provider.HasMultipleValues\n ... 117 more", - "user": "U05T8BJD4DU", - "ts": "1697443984.091359", - "blocks": [ - { - "type": "rich_text", - "block_id": "yNzSD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "... 111 more\nCaused by: java.lang.ClassNotFoundException: org.gradle.api.provider.HasMultipleValues\n ... 117 more" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "77981fd1-5be7-4a24-aedd-a6d8a94c4c09", - "type": "message", - "text": "<@U02MK6YNAQ5> now I am doing gradlew instead of gradle on windows coz Linux one doesn't work. The doc didn't mention about setting up Spark / Hadoop and that's my original question -- do I need to setup local Spark? Now it's throwing an error on Hadoop: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.", - "user": "U05T8BJD4DU", - "ts": "1697516767.807949", - "blocks": [ - { - "type": "rich_text", - "block_id": "PqJ+j", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " now I am doing gradlew instead of gradle on windows coz Linux one doesn't work. The doc didn't mention about setting up Spark / Hadoop and that's my original question -- do I need to setup local Spark? Now it's throwing an error on Hadoop: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "5f6c4714-ae33-41d6-ad0b-18efd81e66f9", - "type": "message", - "text": "Got it working with Mac, couldn't get it working with Windows / Linux subsystem", - "user": "U05T8BJD4DU", - "ts": "1697945628.208239", - "blocks": [ - { - "type": "rich_text", - "block_id": "mYifP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Got it working with Mac, couldn't get it working with Windows / Linux subsystem" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "2cabcbc1-35a9-4380-91b5-8d4020b25075", - "type": "message", - "text": "Now getting class not found despite build and test succeeded", - "user": "U05T8BJD4DU", - "ts": "1697994520.647229", - "blocks": [ - { - "type": "rich_text", - "block_id": "Mceh7", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Now getting class not found despite build and test succeeded" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "ee3c9260-7865-4e88-a230-c398e578ecc6", - "type": "message", - "text": "I uploaded the wrong jar.. there are so many jars, only the jar in the spark folder works, not subfolder", - "user": "U05T8BJD4DU", - "ts": "1698025583.327319", - "blocks": [ - { - "type": "rich_text", - "block_id": "Ix2VZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I uploaded the wrong jar.. there are so many jars, only the jar in the spark folder works, not subfolder" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697137714.503349", - "parent_user_id": "U05T8BJD4DU" - } - ] - }, - { - "client_msg_id": "340c47c3-a98d-48d8-a43d-2a7cda30c1e0", - "type": "message", - "text": "\nFriendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: ", - "user": "U02LXF3HUN7", - "ts": "1697043601.182719", - "blocks": [ - { - "type": "rich_text", - "block_id": "p7fLH", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nFriendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1696531454431629" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1696531454431629", - "ts": "1696531454.431629", - "author_id": "U02LXF3HUN7", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1696531454.431629", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "TBGz6", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "\nThis month’s TSC meeting is next Thursday the 12th at 10am PT. 
On the tentative agenda:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "announcements" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent releases" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow Summit recap" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "tutorial: migrating to the Airflow Provider" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "discussion topic: observability for OpenLineage/Marquez" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "open discussion" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "more (TBA)" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "More info and the meeting link can be found on the " - }, - { - "type": "link", - "url": "https://openlineage.io/meetings/", - "text": "website" - }, - { - "type": "text", - "text": ". All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda." - } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1696531454431629", - "fallback": "[October 5th, 2023 11:44 AM] michael282: **\nThis month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:\n• announcements\n• recent releases\n• Airflow Summit recap\n• tutorial: migrating to the Airflow Provider\n• discussion topic: observability for OpenLineage/Marquez\n• open discussion\n• more (TBA)\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.", - "text": "**\nThis month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:\n• announcements\n• recent releases\n• Airflow Summit recap\n• tutorial: migrating to the Airflow Provider\n• discussion topic: observability for OpenLineage/Marquez\n• open discussion\n• more (TBA)\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? 
DM me to be added to the agenda.", - "author_name": "Michael Robinson", - "author_link": "https://openlineage.slack.com/team/U02LXF3HUN7", - "author_icon": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_48.jpg", - "author_subname": "Michael Robinson", - "mrkdwn_in": [ - "text" - ], - "footer": "Slack Conversation" - } - ], - "thread_ts": "1697043601.182719", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1697048805.134799", - "reply_users": [ - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1697048805.134799", - "replies": [ - { - "client_msg_id": "dd879f8b-63c3-4dd5-9414-fe93655d970e", - "type": "message", - "text": "Newly added discussion topics:\n• a proposal to add a Registry of Consumers and Producers\n• a dbt issue to add OpenLineage Dataset names to the Manifest\n• a proposal to add Dataset support in Spark LogicalPlan Nodes\n• a proposal to institute a certification process for new integrations", - "user": "U02LXF3HUN7", - "ts": "1697048805.134799", - "blocks": [ - { - "type": "rich_text", - "block_id": "L1v/X", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Newly added discussion topics:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "a proposal to add a Registry of Consumers and Producers" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "a dbt issue to add OpenLineage Dataset names to the Manifest" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "a proposal to add Dataset support in Spark LogicalPlan Nodes" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "a proposal to institute a certification process for new integrations" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02LXF3HUN7", - "ts": "1697048825.000000" - }, - "thread_ts": "1697043601.182719", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "client_msg_id": "385c50e2-1da3-406e-a122-600f2da833c4", - "type": "message", - "text": "Hi @there, I am trying to make API call to get column-lineage information could you please let me know the url construct to retrieve the same? As per the API documentation I am passing the following url to GET column-lineage: but getting error code:400. Thanks", - "user": "U05HK41VCH1", - "ts": "1697040264.029839", - "blocks": [ - { - "type": "rich_text", - "block_id": "STdaF", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi @there, I am trying to make API call to get column-lineage information could you please let me know the url construct to retrieve the same? As per the API documentation I am passing the following url to GET column-lineage: " - }, - { - "type": "link", - "url": "http://localhost:5000/api/v1/column-lineage" - }, - { - "type": "text", - "text": " but getting error code:400. 
Thanks" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697040264.029839", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1697537226.424999", - "reply_users": [ - "U01DCMDFHBK", - "U05HK41VCH1" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "b64cbd07-3aee-41dd-918a-85279907f1ca", - "type": "message", - "text": "Make sure to provide a dataset field `nodeId` as a query param in your request. If you’ve seeded Marquez with test metadata, you can use:\n```curl -XGET \"\"```\nYou can view the API docs for column lineage !", - "user": "U01DCMDFHBK", - "ts": "1697133326.602989", - "blocks": [ - { - "type": "rich_text", - "block_id": "+yo0+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Make sure to provide a dataset field " - }, - { - "type": "text", - "text": "nodeId", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " as a query param in your request. If you’ve seeded Marquez with test metadata, you can use:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "curl -XGET \"" - }, - { - "type": "link", - "url": "http://localhost:5002/api/v1/column-lineage?nodeId=datasetField%3Afood_delivery%3Apublic.delivery_7_days%3Acustomer_email" - }, - { - "type": "text", - "text": "\"" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "You can view the API docs for column lineage " - }, - { - "type": "link", - "url": "https://marquezproject.github.io/marquez/openapi.html#operation/getLineage", - "text": "here" - }, - { - "type": "text", - "text": "!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697040264.029839", - "parent_user_id": "U05HK41VCH1" - }, - { - "client_msg_id": "f371513d-c971-4254-8b86-1b79fed2fdd5", - "type": "message", - "text": "Thanks Willy. The documentation says 'name space' so i constructed API Like this:\n''\nbut it is still not working :disappointed:", - "user": "U05HK41VCH1", - "ts": "1697536656.423129", - "blocks": [ - { - "type": "rich_text", - "block_id": "P7W1y", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks Willy. 
The documentation says 'name space' so i constructed API Like this:\n'" - }, - { - "type": "link", - "url": "http://marquez-web:3000/api/v1/column-lineage/nodeId=datasetField:file:/home/jovyan/Downloads/event_attribute.csv:eventType" - }, - { - "type": "text", - "text": "'\nbut it is still not working " - }, - { - "type": "emoji", - "name": "disappointed", - "unicode": "1f61e" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697040264.029839", - "parent_user_id": "U05HK41VCH1" - }, - { - "client_msg_id": "01407b70-dab5-46ea-a935-cf1fe36d81a9", - "type": "message", - "text": "nodeId is constructed like this: datasetField:<namespace>:<dataset>:<field name>", - "user": "U05HK41VCH1", - "ts": "1697537226.424999", - "blocks": [ - { - "type": "rich_text", - "block_id": "5aVHi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "nodeId is constructed like this: datasetField:::" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697040264.029839", - "parent_user_id": "U05HK41VCH1" - } - ] - }, - { - "client_msg_id": "35fe2423-f25c-4cda-952b-d4cad50f40ba", - "type": "message", - "text": " When i am running this sql as part of a databricks notebook, i am recieving an OL event where i see only an output dataset and there is no input dataset or a symlink facet inside the dataset to map it to the underlying azure storage object. Can anyone kindly help on this\n```spark.sql(f\"CREATE TABLE IF NOT EXISTS covid_research.uscoviddata USING delta LOCATION ''\")\n{\n \"eventTime\": \"2023-10-11T10:47:36.296Z\",\n \"producer\": \"\",\n \"schemaURL\": \"\",\n \"eventType\": \"COMPLETE\",\n \"run\": {\n \"runId\": \"d0f40be9-b921-4c84-ac9f-f14a86c29ff7\",\n \"facets\": {\n \"spark.logicalPlan\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"plan\": [\n {\n \"class\": \"org.apache.spark.sql.catalyst.plans.logical.CreateTable\",\n \"num-children\": 1,\n \"name\": 0,\n \"tableSchema\": [],\n \"partitioning\": [],\n \"tableSpec\": null,\n \"ignoreIfExists\": true\n },\n {\n \"class\": \"org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier\",\n \"num-children\": 0,\n \"catalog\": null,\n \"identifier\": null\n }\n ]\n },\n \"spark_version\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"spark-version\": \"3.3.0\",\n \"openlineage-spark-version\": \"1.2.2\"\n },\n \"processing_engine\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"version\": \"3.3.0\",\n \"name\": \"spark\",\n \"openlineageAdapterVersion\": \"1.2.2\"\n }\n }\n },\n \"job\": {\n \"namespace\": \"default\",\n \"name\": \"adb-3942203504488904.4.azuredatabricks.net.create_table.covid_research_db_uscoviddata\",\n \"facets\": {}\n },\n \"inputs\": [],\n \"outputs\": [\n {\n \"namespace\": \"dbfs\",\n \"name\": \"/user/hive/warehouse/covid_research.db/uscoviddata\",\n \"facets\": {\n \"dataSource\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"name\": \"dbfs\",\n \"uri\": \"dbfs\"\n },\n \"schema\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"fields\": []\n },\n \"storage\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"storageLayer\": \"unity\",\n \"fileFormat\": \"parquet\"\n },\n \"symlinks\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"identifiers\": [\n {\n \"namespace\": \"/user/hive/warehouse/covid_research.db\",\n \"name\": \"covid_research.uscoviddata\",\n \"type\": \"TABLE\"\n }\n ]\n },\n \"lifecycleStateChange\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"lifecycleStateChange\": \"CREATE\"\n }\n 
},\n \"outputFacets\": {}\n }\n ]\n}```", - "user": "U05QL7LN2GH", - "ts": "1697021758.073929", - "blocks": [ - { - "type": "rich_text", - "block_id": "AFoNq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " When i am running this sql as part of a databricks notebook, i am recieving an OL event where i see only an output dataset and there is no input dataset or a symlink facet inside the dataset to map it to the underlying azure storage object. Can anyone kindly help on this\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "spark.sql(f\"CREATE TABLE IF NOT EXISTS covid_research.uscoviddata USING delta LOCATION '" - }, - { - "type": "link", - "url": "abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/modified-delta" - }, - { - "type": "text", - "text": "'\")\n{\n \"eventTime\": \"2023-10-11T10:47:36.296Z\",\n \"producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent" - }, - { - "type": "text", - "text": "\",\n \"eventType\": \"COMPLETE\",\n \"run\": {\n \"runId\": \"d0f40be9-b921-4c84-ac9f-f14a86c29ff7\",\n \"facets\": {\n \"spark.logicalPlan\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet" - }, - { - "type": "text", - "text": "\",\n \"plan\": [\n {\n \"class\": \"org.apache.spark.sql.catalyst.plans.logical.CreateTable\",\n \"num-children\": 1,\n \"name\": 0,\n \"tableSchema\": [],\n \"partitioning\": [],\n \"tableSpec\": null,\n \"ignoreIfExists\": true\n },\n {\n \"class\": \"org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier\",\n \"num-children\": 0,\n \"catalog\": null,\n \"identifier\": null\n }\n ]\n },\n \"spark_version\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet" - }, - { - "type": "text", - "text": "\",\n \"spark-version\": \"3.3.0\",\n \"openlineage-spark-version\": \"1.2.2\"\n },\n \"processing_engine\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet" - }, - { - "type": "text", - "text": "\",\n \"version\": \"3.3.0\",\n \"name\": \"spark\",\n \"openlineageAdapterVersion\": \"1.2.2\"\n }\n }\n },\n \"job\": {\n \"namespace\": \"default\",\n \"name\": \"adb-3942203504488904.4.azuredatabricks.net.create_table.covid_research_db_uscoviddata\",\n \"facets\": {}\n },\n \"inputs\": [],\n \"outputs\": [\n {\n \"namespace\": \"dbfs\",\n \"name\": \"/user/hive/warehouse/covid_research.db/uscoviddata\",\n \"facets\": {\n \"dataSource\": {\n \"_producer\": \"" - }, - { - "type": "link", - 
"url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet" - }, - { - "type": "text", - "text": "\",\n \"name\": \"dbfs\",\n \"uri\": \"dbfs\"\n },\n \"schema\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet" - }, - { - "type": "text", - "text": "\",\n \"fields\": []\n },\n \"storage\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet" - }, - { - "type": "text", - "text": "\",\n \"storageLayer\": \"unity\",\n \"fileFormat\": \"parquet\"\n },\n \"symlinks\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet" - }, - { - "type": "text", - "text": "\",\n \"identifiers\": [\n {\n \"namespace\": \"/user/hive/warehouse/covid_research.db\",\n \"name\": \"covid_research.uscoviddata\",\n \"type\": \"TABLE\"\n }\n ]\n },\n \"lifecycleStateChange\": {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet" - }, - { - "type": "text", - "text": "\",\n \"lifecycleStateChange\": \"CREATE\"\n }\n },\n \"outputFacets\": {}\n }\n ]\n}" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697021758.073929", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1697021974.938879", - "reply_users": [ - "U05FLJE4GDU", - "U05QL7LN2GH" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "db148454-967c-4e72-942e-54a088b71cdf", - "type": "message", - "text": "Hey Guntaka - can I ask you a favour? Can you please stop using `@here` or `@channel` - please keep in mind, you're pinging over 1000 people when you use that mention. Its incredibly distracting to have Slack notify me of a message that isn't pertinent to me.", - "user": "U05FLJE4GDU", - "ts": "1697021866.183179", - "blocks": [ - { - "type": "rich_text", - "block_id": "HKa8J", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey Guntaka - can I ask you a favour? Can you please stop using " - }, - { - "type": "text", - "text": "@here", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " or " - }, - { - "type": "text", - "text": "@channel", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " - please keep in mind, you're pinging over 1000 people when you use that mention. 
Its incredibly distracting to have Slack notify me of a message that isn't pertinent to me." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697021758.073929", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "7357be3d-f271-4190-9a03-aaefe7e602cf", - "type": "message", - "text": "sure noted <@U05FLJE4GDU>", - "user": "U05QL7LN2GH", - "ts": "1697021930.666629", - "blocks": [ - { - "type": "rich_text", - "block_id": "KBawo", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "sure noted " - }, - { - "type": "user", - "user_id": "U05FLJE4GDU" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697021758.073929", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "6180e304-09ba-4165-b9c5-a2e6ddf55398", - "type": "message", - "text": "Thank you!", - "user": "U05FLJE4GDU", - "ts": "1697021974.938879", - "blocks": [ - { - "type": "rich_text", - "block_id": "Ye0AT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thank you!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697021758.073929", - "parent_user_id": "U05QL7LN2GH" - } - ] - }, - { - "client_msg_id": "73ca512c-b9c3-4e01-ac03-f7c97c65b991", - "type": "message", - "text": " i am trying out the databricks spark integration and in one of the events i am getting a openlineage event where the output dataset is having a facet called `symlinks` , the statement that generated this event is this sql\n```CREATE TABLE IF NOT EXISTS covid_research.covid_data \nUSING CSV\nLOCATION '' \nOPTIONS (header \"true\", inferSchema \"true\");```\nCan someone kindly let me know what this `symlinks` facet is. i tried seeing the spec but did not get it completely", - "user": "U05QL7LN2GH", - "ts": "1696995819.546399", - "blocks": [ - { - "type": "rich_text", - "block_id": "N0r6R", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " i am trying out the databricks spark integration and in one of the events i am getting a openlineage event where the output dataset is having a facet called " - }, - { - "type": "text", - "text": "symlinks", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " , the statement that generated this event is this sql\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "CREATE TABLE IF NOT EXISTS covid_research.covid_data \nUSING CSV\nLOCATION '" - }, - { - "type": "link", - "url": "abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv" - }, - { - "type": "text", - "text": "' \nOPTIONS (header \"true\", inferSchema \"true\");" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can someone kindly let me know what this " - }, - { - "type": "text", - "text": "symlinks", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " facet is. 
i tried seeing the spec but did not get it completely" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696995819.546399", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1697001944.554579", - "reply_users": [ - "U05T8BJD4DU", - "U05QL7LN2GH" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "a2c46736-b265-4087-a5b4-900d15a2dfab", - "type": "message", - "text": "I use it to get the table with database name", - "user": "U05T8BJD4DU", - "ts": "1696995893.445939", - "blocks": [ - { - "type": "rich_text", - "block_id": "Hdg1w", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I use it to get the table with database name" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696995819.546399", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "e7c1d108-3df6-49fd-8cdc-70381e9b03bb", - "type": "message", - "text": "so can i think it like if there is a synlink, then that table is kind of a reference to the original dataset", - "user": "U05QL7LN2GH", - "ts": "1696996035.076329", - "blocks": [ - { - "type": "rich_text", - "block_id": "D+R+W", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "so can i think it like if there is a synlink, then that table is kind of a reference to the original dataset" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696995819.546399", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "514dff8b-7ce4-4a68-9d4c-e2eb623a1a1f", - "type": "message", - "text": "yes", - "user": "U05T8BJD4DU", - "ts": "1697001944.554579", - "blocks": [ - { - "type": "rich_text", - "block_id": "IB8ze", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696995819.546399", - "parent_user_id": "U05QL7LN2GH", - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "98148df4-23e3-4211-806d-5954410eabfb", - "type": "message", - "text": "example:\n\n```{\"environment-properties\":{\"spark.databricks.clusterUsageTags.clusterName\":\"'s Cluster\",\"spark.databricks.job.runId\":\"\",\"spark.databricks.job.type\":\"\",\"spark.databricks.clusterUsageTags.azureSubscriptionId\":\"a4f54399-8db8-4849-adcc-a42aed1fb97f\",\"spark.databricks.notebook.path\":\"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep\",\"spark.databricks.clusterUsageTags.clusterOwnerOrgId\":\"4679476628690204\",\"MountPoints\":[{\"MountPoint\":\"/databricks-datasets\",\"Source\":\"databricks-datasets\"},{\"MountPoint\":\"/Volumes\",\"Source\":\"UnityCatalogVolumes\"},{\"MountPoint\":\"/databricks/mlflow-tracking\",\"Source\":\"databricks/mlflow-tracking\"},{\"MountPoint\":\"/databricks-results\",\"Source\":\"databricks-results\"},{\"MountPoint\":\"/databricks/mlflow-registry\",\"Source\":\"databricks/mlflow-registry\"},{\"MountPoint\":\"/Volume\",\"Source\":\"DbfsReserved\"},{\"MountPoint\":\"/volumes\",\"Source\":\"DbfsReserved\"},{\"MountPoint\":\"/\",\"Source\":\"DatabricksRoot\"},{\"MountPoint\":\"/volume\",\"Source\":\"DbfsReserved\"}],\"User\":\"\",\"UserId\":\"4768657035718622\",\"OrgId\":\"4679476628690204\"}}```", - "user": "U05T8BJD4DU", - "ts": "1696985639.868119", - "blocks": [ - { - "type": "rich_text", - "block_id": "Av2WJ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { 
- "type": "text", - "text": "example:\n\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "{\"environment-properties\":{\"spark.databricks.clusterUsageTags.clusterName\":\"jason.yip@tredence.com's Cluster\",\"spark.databricks.job.runId\":\"\",\"spark.databricks.job.type\":\"\",\"spark.databricks.clusterUsageTags.azureSubscriptionId\":\"a4f54399-8db8-4849-adcc-a42aed1fb97f\",\"spark.databricks.notebook.path\":\"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep\",\"spark.databricks.clusterUsageTags.clusterOwnerOrgId\":\"4679476628690204\",\"MountPoints\":[{\"MountPoint\":\"/databricks-datasets\",\"Source\":\"databricks-datasets\"},{\"MountPoint\":\"/Volumes\",\"Source\":\"UnityCatalogVolumes\"},{\"MountPoint\":\"/databricks/mlflow-tracking\",\"Source\":\"databricks/mlflow-tracking\"},{\"MountPoint\":\"/databricks-results\",\"Source\":\"databricks-results\"},{\"MountPoint\":\"/databricks/mlflow-registry\",\"Source\":\"databricks/mlflow-registry\"},{\"MountPoint\":\"/Volume\",\"Source\":\"DbfsReserved\"},{\"MountPoint\":\"/volumes\",\"Source\":\"DbfsReserved\"},{\"MountPoint\":\"/\",\"Source\":\"DatabricksRoot\"},{\"MountPoint\":\"/volume\",\"Source\":\"DbfsReserved\"}],\"User\":\"jason.yip@tredence.com\",\"UserId\":\"4768657035718622\",\"OrgId\":\"4679476628690204\"}}" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696985639.868119", - "reply_count": 4, - "reply_users_count": 2, - "latest_reply": "1697439536.801279", - "reply_users": [ - "U02MK6YNAQ5", - "U05T8BJD4DU" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "dd99f2a1-192d-453b-8e43-7712cea81387", - "type": "message", - "text": "Is this related to any OL version? In OL 1.2.2. we've added extra variable ``spark.databricks.clusterUsageTags.clusterAllTags`` to be captured, but this should not break things.\n\nI think we're facing some issues on recent databricks runtime versions. Here is an issue for this: \n\nIs the problem you describe specific to some databricks runtime versions?", - "user": "U02MK6YNAQ5", - "ts": "1697010373.970599", - "blocks": [ - { - "type": "rich_text", - "block_id": "07u8a", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Is this related to any OL version? In OL 1.2.2. we've added extra variable `" - }, - { - "type": "text", - "text": "spark.databricks.clusterUsageTags.clusterAllTags`", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " to be captured, but this should not break things.\n\nI think we're facing some issues on recent databricks runtime versions. Here is an issue for this: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2131" - }, - { - "type": "text", - "text": "\n\nIs the problem you describe specific to some databricks runtime versions?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695709470, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2131", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2131 [Spark Databricks] NoSuchMethodError on ReplaceTableAsSelect", - "text": "More details here: \nLooks like different class implementation on databricks platform.", - "title": "#2131 [Spark Databricks] NoSuchMethodError on ReplaceTableAsSelect", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2131", - "footer": "", - "fields": [ - { - "value": "integration/spark, integration/databricks", - "title": "Labels", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696985639.868119", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "0c2505bc-3772-41ad-a213-bb6ad7b7cdd3", - "type": "message", - "text": "yes, exactly Spark 3.4+", - "user": "U05T8BJD4DU", - "ts": "1697037426.350799", - "blocks": [ - { - "type": "rich_text", - "block_id": "TXfld", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, exactly Spark 3.4+" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696985639.868119", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "3ae5348a-6474-49f5-8dd8-df7aaf32bb4d", - "type": "message", - "text": "Btw I don't understand the code flow entirely, if we are talking about a different classpath only, I see there's Unity Catalog handler in the code and it says it works the same as Delta, but I am not seeing it subclassing Delta. I suppose it will work the same. \n\nI am happy to jump on a call to show you if needed", - "user": "U05T8BJD4DU", - "ts": "1697073147.278359", - "blocks": [ - { - "type": "rich_text", - "block_id": "31LD4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Btw I don't understand the code flow entirely, if we are talking about a different classpath only, I see there's Unity Catalog handler in the code and it says it works the same as Delta, but I am not seeing it subclassing Delta. I suppose it will work the same. \n\nI am happy to jump on a call to show you if needed" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05T8BJD4DU", - "ts": "1697073159.000000" - }, - "thread_ts": "1696985639.868119", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "dfa5f011-7ccc-45dd-8ea4-36537c2ff811", - "type": "message", - "text": "<@U02MK6YNAQ5> do you think in Spark 3.4+ only one event would happen?\n\n /**\n * We get exact copies of OL events for org.apache.spark.scheduler.SparkListenerJobStart and\n * org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart. 
The same happens for end\n * events.\n *\n * @return\n */\n private boolean isOnJobStartOrEnd(SparkListenerEvent event) {\n return event instanceof SparkListenerJobStart || event instanceof SparkListenerJobEnd;\n }", - "user": "U05T8BJD4DU", - "ts": "1697439536.801279", - "blocks": [ - { - "type": "rich_text", - "block_id": "Ijzgw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " do you think in Spark 3.4+ only one event would happen?\n\n /**\n * We get exact copies of OL events for org.apache.spark.scheduler.SparkListenerJobStart and\n * org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart. The same happens for end\n * events.\n *\n * @return\n */\n private boolean isOnJobStartOrEnd(SparkListenerEvent event) {\n return event instanceof SparkListenerJobStart || event instanceof SparkListenerJobEnd;\n }" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696985639.868119", - "parent_user_id": "U05T8BJD4DU" - } - ] - }, - { - "client_msg_id": "3e35ff82-64c5-47e0-9c43-e5e2ae88094d", - "type": "message", - "text": "Any idea why \"environment-properties\" is gone in Spark 3.4+ in StartEvent?", - "user": "U05T8BJD4DU", - "ts": "1696914311.793789", - "blocks": [ - { - "type": "rich_text", - "block_id": "4qums", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Any idea why \"environment-properties\" is gone in Spark 3.4+ in StartEvent?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "8363d0f5-1aa7-4f2e-9cbd-ce61843beea3", - "type": "message", - "text": "Hello. I am getting started with OL and Marquez with dbt. I am using dbt-ol. The namespace of the dataset showing up in Marquez is not the namespace I provide using OPENLINEAGE_NAMESPACE. It happens to be the same as the source in Marquez which is the snowflake account uri. It's obviously picking up the other env variable OPENLINEAGE_URL so i am pretty sure its not the environment. Is this expected?", - "user": "U021QJMRP47", - "ts": "1696884935.692409", - "blocks": [ - { - "type": "rich_text", - "block_id": "DM7nS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello. I am getting started with OL and Marquez with dbt. I am using dbt-ol. The namespace of the dataset showing up in Marquez is not the namespace I provide using OPENLINEAGE_NAMESPACE. It happens to be the same as the source in Marquez which is the snowflake account uri. It's obviously picking up the other env variable OPENLINEAGE_URL so i am pretty sure its not the environment. Is this expected?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696884935.692409", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1696892173.535019", - "reply_users": [ - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1696892173.535019", - "replies": [ - { - "client_msg_id": "0fc68344-e20e-4d3f-9948-577766f8043a", - "type": "message", - "text": "Hi Drew, thank you for using OpenLineage! I don’t know the details of your use case, but I believe this is expected, yes. In general, the dataset namespace is different. Jobs are namespaced separately from datasets, which are namespaced by their containing datasources. 
This is the case so datasets have the same name regardless of the job writing to them, as datasets are sometimes shared by jobs in different namespaces.", - "user": "U02LXF3HUN7", - "ts": "1696892173.535019", - "blocks": [ - { - "type": "rich_text", - "block_id": "h6sQS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi Drew, thank you for using OpenLineage! I don’t know the details of your use case, but I believe this is expected, yes. In general, the dataset namespace is different. Jobs are namespaced separately from datasets, which are namespaced by their containing datasources. This is the case so datasets have the same name regardless of the job writing to them, as datasets are sometimes shared by jobs in different namespaces." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696884935.692409", - "parent_user_id": "U021QJMRP47", - "reactions": [ - { - "name": "+1", - "users": [ - "U021QJMRP47" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "69cca1dd-d1be-466e-babd-e3492cfe5a61", - "type": "message", - "text": "\n*We released OpenLineage 1.4.1!*\n*Additions:*\n• *Client:* *allow setting client’s endpoint via environment variable* <@U01HVNU6A4C> \n• *Flink: expand Iceberg source types* <@U05QA2D1XNV> \n• *Spark: add debug facet* <@U02MK6YNAQ5> \n• *Spark: enable Nessie REST catalog* \nThanks to all the contributors, especially new contributors <@U05QA2D1XNV> and !\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ", - "user": "U02LXF3HUN7", - "ts": "1696879514.895109", - "blocks": [ - { - "type": "rich_text", - "block_id": "uk4kh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "We released OpenLineage 1.4.1!", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Additions:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Client:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "text", - "text": "allow setting client’s endpoint via environment variable", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "http://2151.com/OpenLineage/OpenLineage/pull/2151", - "text": "2151" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U01HVNU6A4C" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Flink: expand Iceberg source types", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2149", - "text": "2149" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U05QA2D1XNV" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: add debug facet", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2147", - "text": "2147" - }, - { - "type": "text", - "text": " " - }, - { - "type": 
"user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: enable Nessie REST catalog", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2165", - "text": "2165" - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/julwin", - "text": "@julwin", - "unsafe": true - }, - { - "type": "text", - "text": " " - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks to all the contributors, especially new contributors " - }, - { - "type": "user", - "user_id": "U05QA2D1XNV" - }, - { - "type": "text", - "text": " and " - }, - { - "type": "link", - "url": "https://github.com/julwin", - "text": "@julwin", - "unsafe": true - }, - { - "type": "text", - "text": "!\n" - }, - { - "type": "text", - "text": "Release:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/releases/tag/1.4.1" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Changelog: ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Commit history:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/compare/1.3.1...1.4.1" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Maven:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://oss.sonatype.org/#nexus-search;quick~openlineage" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "PyPI:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://pypi.org/project/openlineage-python/" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "+1", - "users": [ - "U05T8BJD4DU", - "U053LLVTHRN", - "U01HVNU6A4C", - "U01HNKK4XAM", - "U05TU0U224A" - ], - "count": 5 - } - ] - }, - { - "type": "message", - "text": " I am trying out the openlineage integration of spark on databricks. There is no event getting emitted from Openlineage, I see logs saying OpenLineage Event Skipped. I am attaching the Notebook that i am trying to run and the cluster logs. 
Kindly can someone help me on this", - "files": [ - { - "id": "F0608U3FJ3D", - "created": 1696823902, - "timestamp": 1696823902, - "name": "Screenshot 2023-10-09 at 9.28.18 AM.png", - "title": "Screenshot 2023-10-09 at 9.28.18 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 375392, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F0608U3FJ3D/screenshot_2023-10-09_at_9.28.18_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F0608U3FJ3D/download/screenshot_2023-10-09_at_9.28.18_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 178, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 238, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 356, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 396, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 475, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F0608U3FJ3D-886b1a6404/screenshot_2023-10-09_at_9.28.18_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 507, - "original_w": 2614, - "original_h": 1294, - "thumb_tiny": "AwAXADDRwQelKaWkNADfm/yKXml70UAHNGKKKAFpDQ1B6UAGPeigUtACUY96Wg9KAP/Z", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F0608U3FJ3D/screenshot_2023-10-09_at_9.28.18_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F0608U3FJ3D-5e001bbbb0", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - }, - { - "id": "F05V8FALBL7", - "created": 1696823970, - "timestamp": 1696823970, - "name": "cluster-logs", - "title": "cluster-logs", - "mimetype": "text/plain", - "filetype": "text", - "pretty_type": "Plain Text", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 24284, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05V8FALBL7/cluster-logs", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05V8FALBL7/download/cluster-logs", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05V8FALBL7/cluster-logs", - "permalink_public": 
"https://slack-files.com/T01CWUYP5AR-F05V8FALBL7-705f16d652", - "edit_link": "https://openlineage.slack.com/files/U05QL7LN2GH/F05V8FALBL7/cluster-logs/edit", - "preview": "23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 1; threshold: 32\n23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 0; threshold: 32\n23/10/09 03:53:29 INFO InMemoryFileIndex: It took 18 ms to list leaf files for 1 paths.\n23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 1; threshold: 32\n23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 0; threshold: 32", - "preview_highlight": "
\n\n23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 1; threshold: 32\n23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 0; threshold: 32\n23/10/09 03:53:29 INFO InMemoryFileIndex: It took 18 ms to list leaf files for 1 paths.\n23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 1; threshold: 32\n23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 0; threshold: 32\n\n
\n", - "lines": 197, - "lines_more": 192, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "ts": "1696823976.297949", - "blocks": [ - { - "type": "rich_text", - "block_id": "2pd86", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " I am trying out the openlineage integration of spark on databricks. There is no event getting emitted from Openlineage, I see logs saying OpenLineage Event Skipped. I am attaching the Notebook that i am trying to run and the cluster logs. Kindly can someone help me on this" - } - ] - } - ] - } - ], - "client_msg_id": "b06c052e-bedd-4793-82fb-0dacb289b416", - "thread_ts": "1696823976.297949", - "reply_count": 16, - "reply_users_count": 3, - "latest_reply": "1696843014.053669", - "reply_users": [ - "U05T8BJD4DU", - "U05QL7LN2GH", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "befe4efc-64c4-4c6b-b7f8-0c9e1f8f5251", - "type": "message", - "text": "from my experience, it will only work on Spark 3.3.x or below, aka Runtime 12.2 or below. Anything above the events will show up once in a blue moon", - "user": "U05T8BJD4DU", - "ts": "1696824130.926979", - "blocks": [ - { - "type": "rich_text", - "block_id": "R6oYJ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "from my experience, it will only work on Spark 3.3.x or below, aka Runtime 12.2 or below. Anything above the events will show up once in a blue moon" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "34df7151-21b7-45f4-b5cc-8d710ca64471", - "type": "message", - "text": "ohh, thanks for the information <@U05T8BJD4DU>, I am trying out with 13.3 Databricks Version and Spark 3.4.1, will try using a below version as you suggested. Any issue tracking this bug <@U05T8BJD4DU>", - "user": "U05QL7LN2GH", - "ts": "1696824278.069699", - "blocks": [ - { - "type": "rich_text", - "block_id": "eW6Pl", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "ohh, thanks for the information " - }, - { - "type": "user", - "user_id": "U05T8BJD4DU" - }, - { - "type": "text", - "text": ", I am trying out with 13.3 Databricks Version and Spark 3.4.1, will try using a below version as you suggested. 
Any issue tracking this bug " - }, - { - "type": "user", - "user_id": "U05T8BJD4DU" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "6f03969d-50fc-46d3-bdb9-e23282b1ac09", - "type": "message", - "text": "", - "user": "U05T8BJD4DU", - "ts": "1696824366.892649", - "blocks": [ - { - "type": "rich_text", - "block_id": "vmCqe", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2124" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695498902, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2124 Same Delta Table not catching the location on write", - "text": "*What is the target system?*\n\nSpark / Databricks\n\n*What kind of integration is this?*\n\n☐ Produces OpenLineage metadata\n☐ Consumes OpenLineage metadata\n☐ Something else\n\n*How should this integration be implemented?*\n\nI am using OL 1.2.2, Azure Databricks Runtime 11.3 LTS. When creating a table writing into a ADLS location, OL won't be able to catch the location of the output. But when I read the same object it will be able to read the location as INPUT.\n\nPlease note I have also tested Databricks Runtime 13.3 LTS, Spark 3.4.1 - it will give correct ADLS location in INPUT but the input will only show up once in a blue moon. Most of the time the inputs and outputs are blank.\n\n```\n \"inputs\": [],\n \"outputs\": []\n```\n\n```\nCREATE OR REPLACE TABLE transactions_adj\nUSING DELTA LOCATION ''\nAS\n SELECT\n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n amount_list,\n campaign_coupon_discount,\n manuf_coupon_discount,\n manuf_coupon_match_discount,\n total_coupon_discount,\n instore_discount,\n amount_paid,\n units\n FROM (\n SELECT \n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n COALESCE(sales_amount - discount_amount - coupon_discount - coupon_discount_match,0.0) as amount_list,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) = 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as campaign_coupon_discount,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) != 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as manuf_coupon_discount,\n -1 * COALESCE(coupon_discount_match,0.0) as manuf_coupon_match_discount,\n -1 * COALESCE(coupon_discount - coupon_discount_match,0.0) as total_coupon_discount,\n COALESCE(-1 * discount_amount,0.0) as instore_discount,\n COALESCE(sales_amount,0.0) as `amount_paid,`\n quantity as units\n FROM transactions\n );\n```\n\nHere's the COMPLETE event:\n\n```\n\n \"outputs\":[\n {\n \"namespace\":\"dbfs\",\n \"name\":\"/user/hive/warehouse/journey.db/transactions_adj\",\n \"facets\":{\n \"dataSource\":{\n \"_producer\":\"\",\n \"_schemaURL\":\"\",\n \"name\":\"dbfs\",\n \"uri\":\"dbfs\"\n },\n\n```\n\nBelow logical plan shows the path:\n\n```\n== Analyzed Logical Plan ==\nnum_affected_rows: bigint, num_inserted_rows: bigint\nReplaceTableAsSelect TableSpec(Map(),Some(DELTA),Map(),Some(),None,None,false,Set()), true\n:- ResolvedIdentifier com.databricks.sql.managedcatalog.UnityCatalogV2Proxy@6251a8df, default.transactions_adj\n+- 
Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, amount_list#147, campaign_coupon_discount#148, manuf_coupon_discount#149, manuf_coupon_match_discount#150, total_coupon_discount#151, instore_discount#152, amount_paid#153, units#154]\n +- SubqueryAlias __auto_generated_subquery_name\n +- Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, coalesce(cast((((sales_amount#189 - discount_amount#191) - coupon_discount#194) - coupon_discount_match#195) as double), cast(0.0 as double)) AS amount_list#147, CASE WHEN (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS campaign_coupon_discount#148, CASE WHEN NOT (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS manuf_coupon_discount#149, (cast(-1 as double) * coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double))) AS manuf_coupon_match_discount#150, (cast(-1 as double) * coalesce(cast((coupon_discount#194 - coupon_discount_match#195) as double), cast(0.0 as double))) AS total_coupon_discount#151, coalesce(cast((cast(-1 as float) * discount_amount#191) as double), cast(0.0 as double)) AS instore_discount#152, coalesce(cast(sales_amount#189 as double), cast(0.0 as double)) AS amount_paid#153, quantity#188 AS units#154]\n +- SubqueryAlias spark_catalog.default.transactions\n +- Relation spark_catalog.default.transactions[household_id#184,basket_id#185L,day#186,product_id#187,quantity#188,sales_amount#189,store_id#190,discount_amount#191,transaction_time#192,week_no#193,coupon_discount#194,coupon_discount_match#195] parquet\n```\n\n*Where should this integration be implemented?*\n\n☐ In the target system\n☐ In the OpenLineage repo\n☐ Somewhere else\n\n*Do you plan to make this contribution yourself?*\n\n☐ I am interested in doing this work", - "title": "#2124 Same Delta Table not catching the location on write", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "footer": "", - "fields": [ - { - "value": "integration/spark, integration/databricks", - "title": "Labels", - "short": true - }, - { - "value": "2", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "ed33723b-e22c-40a2-9799-de7dd1030a5e", - "type": "message", - "text": "tried with databricks 12.2 --> spark 3.3.2, still the same behaviour no event getting emitted", - "user": "U05QL7LN2GH", - "ts": "1696824714.873719", - "blocks": [ - { - "type": "rich_text", - "block_id": "m4glh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "tried with databricks 12.2 --> spark 3.3.2, still the same behaviour no event getting emitted" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05QL7LN2GH", - "ts": "1696824726.000000" - }, - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "188281e1-7fa2-4ae6-a139-e873476ef677", - "type": "message", - "text": "you can do 11.3, its the most stable one I know", - "user": "U05T8BJD4DU", - "ts": 
"1696824755.134769", - "blocks": [ - { - "type": "rich_text", - "block_id": "Obry3", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "you can do 11.3, its the most stable one I know" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "bec85693-dc8e-4bb3-8ab7-a5e41817b485", - "type": "message", - "text": "sure, let me try that out", - "user": "U05QL7LN2GH", - "ts": "1696824766.361999", - "blocks": [ - { - "type": "rich_text", - "block_id": "6qxi+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "sure, let me try that out" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "95ca842b-8bc8-4158-89ce-7332950b6202", - "type": "message", - "text": "still the same problem…the jar that i am using is the latest _openlineage-spark-1.3.1.jar, do you think that can be the problem_", - "user": "U05QL7LN2GH", - "ts": "1696825911.714029", - "blocks": [ - { - "type": "rich_text", - "block_id": "5kxtF", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "still the same problem…the jar that i am using is the latest " - }, - { - "type": "text", - "text": "openlineage-spark-1.3.1.jar, do you think that can be the problem", - "style": { - "italic": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05QL7LN2GH", - "ts": "1696825923.000000" - }, - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "7ace5bd1-ec85-4f9c-835a-580615adf030", - "type": "message", - "text": "tried with _openlineage-spark-1.2.2.jar, still the same issue, seems like they are skipping some events_", - "user": "U05QL7LN2GH", - "ts": "1696826639.326199", - "blocks": [ - { - "type": "rich_text", - "block_id": "4H36W", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "tried with " - }, - { - "type": "text", - "text": "openlineage-spark-1.2.2.jar, still the same issue, seems like they are skipping some events", - "style": { - "italic": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "d33daeb6-44d9-4759-9796-2196663d0791", - "type": "message", - "text": "Probably not all events will be captured, I have only tested create tables and jobs", - "user": "U05T8BJD4DU", - "ts": "1696830440.910309", - "blocks": [ - { - "type": "rich_text", - "block_id": "WJ2BZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Probably not all events will be captured, I have only tested create tables and jobs" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "f31b094c-55cd-46d7-bf16-1a14b11df963", - "type": "message", - "text": "Hi <@U05QL7LN2GH>, how did you configure openlineage and what is your job doing?\n\nWe do have a bunch of integration tests on Databricks platform and they're passing on databricks runtime `13.0.x-scala2.12`.\n\nCould you also try running code same as our test does ()? 
If you run it and see OL events, this will make us sure your config is OK and we can continue further debug.\n\nLooking at your spark script: could you save your dataset and see if you still don't see any events?", - "user": "U02MK6YNAQ5", - "ts": "1696840272.582079", - "blocks": [ - { - "type": "rich_text", - "block_id": "lq3NX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U05QL7LN2GH" - }, - { - "type": "text", - "text": ", how did you configure openlineage and what is your job doing?\n\nWe do have a bunch of integration tests on Databricks platform " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/1.3.1/integration/spark/app/src/test/java/io/openlineage/spark/agent/DatabricksIntegrationTest.java", - "text": "available here" - }, - { - "type": "text", - "text": " and they're passing on databricks runtime " - }, - { - "type": "text", - "text": "13.0.x-scala2.12", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ".\n\nCould you also try running code same as our test does (" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/1.3.1/integration/spark/app/src/test/resources/databricks_notebooks/ctas.py", - "text": "this one" - }, - { - "type": "text", - "text": ")? If you run it and see OL events, this will make us sure your config is OK and we can continue further debug.\n\nLooking at your spark script: could you save your dataset and see if you still don't see any events?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/1.3.1/integration/spark/app/src/test/java/io/openlineage/spark/agent/DatabricksIntegrationTest.java", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - }, - { - "id": 2, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/1.3.1/integration/spark/app/src/test/resources/databricks_notebooks/ctas.py", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "25205ec0-ce7f-48fc-976d-24e50726d962", - "type": "message", - "text": "```babynames = spark.read.format(\"csv\").option(\"header\", \"true\").option(\"inferSchema\", \"true\").load(\"dbfs:/FileStore/babynames.csv\")\nbabynames.createOrReplaceTempView(\"babynames_table\")\nyears = spark.sql(\"select distinct(Year) from babynames_table\").rdd.map(lambda row : row[0]).collect()\nyears.sort()\ndbutils.widgets.dropdown(\"year\", \"2014\", [str(x) for x in years])\ndisplay(babynames.filter(babynames.Year == dbutils.widgets.get(\"year\")))```", - "user": "U05QL7LN2GH", - "ts": "1696842401.831669", - "blocks": [ - { - "type": "rich_text", - "block_id": "gtFWb", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "babynames = spark.read.format(\"csv\").option(\"header\", \"true\").option(\"inferSchema\", 
\"true\").load(\"dbfs:/FileStore/babynames.csv\")\nbabynames.createOrReplaceTempView(\"babynames_table\")\nyears = spark.sql(\"select distinct(Year) from babynames_table\").rdd.map(lambda row : row[0]).collect()\nyears.sort()\ndbutils.widgets.dropdown(\"year\", \"2014\", [str(x) for x in years])\ndisplay(babynames.filter(babynames.Year == dbutils.widgets.get(\"year\")))" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "47f38de3-3b70-4221-9aee-d900c38e2b46", - "type": "message", - "text": "this is the script that i am running <@U02MK6YNAQ5>…kindly let me know if i’m doing any mistake. I have added the init script at the cluster level and from the logs i could see that openlineage is configured as i see a log statement", - "user": "U05QL7LN2GH", - "ts": "1696842489.837109", - "blocks": [ - { - "type": "rich_text", - "block_id": "TAZ4O", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this is the script that i am running " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": "…kindly let me know if i’m doing any mistake. I have added the init script at the cluster level and from the logs i could see that openlineage is configured as i see a log statement" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "0c59edb2-d1a5-4d73-bc86-c3f5d8195a99", - "type": "message", - "text": "there's nothing wrong in that script. It's just we decided to limit amount of OL events for jobs that don't write their data anywhere and just do `collect` operation", - "user": "U02MK6YNAQ5", - "ts": "1696842630.801999", - "blocks": [ - { - "type": "rich_text", - "block_id": "7MwlU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "there's nothing wrong in that script. It's just we decided to limit amount of OL events for jobs that don't write their data anywhere and just do " - }, - { - "type": "text", - "text": "collect", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " operation" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "d6540e9a-2c67-494e-8362-fd911feda346", - "type": "message", - "text": "this is also a potential reason why can't you see any events", - "user": "U02MK6YNAQ5", - "ts": "1696842662.946709", - "blocks": [ - { - "type": "rich_text", - "block_id": "RjYDU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this is also a potential reason why can't you see any events" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "f7360d94-2779-478b-8a55-bd6c36c10ad6", - "type": "message", - "text": "ohh…okk, will try out the test script that you have mentioned above. 
Kindly correct me if my understanding is correct, so if there are a few transformations and finally writing somewhere that is where the OL events are expected to be emitted?", - "user": "U05QL7LN2GH", - "ts": "1696842873.465329", - "blocks": [ - { - "type": "rich_text", - "block_id": "575as", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "ohh…okk, will try out the test script that you have mentioned above. Kindly correct me if my understanding is correct, so if there are a few transformations and finally writing somewhere that is where the OL events are expected to be emitted?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "d7d6a2b8-8149-43bc-a904-b2fd82116e4b", - "type": "message", - "text": "yes. main purpose of the lineage is to track dependencies between the datasets, when a job reads from dataset A and writes to dataset B. In case of databricks notebook, that do `show` or `collect` and print some query result on the screen, there may be no reason to track it in the sense of lineage.", - "user": "U02MK6YNAQ5", - "ts": "1696843014.053669", - "blocks": [ - { - "type": "rich_text", - "block_id": "NZ3aU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes. main purpose of the lineage is to track dependencies between the datasets, when a job reads from dataset A and writes to dataset B. In case of databricks notebook, that do " - }, - { - "type": "text", - "text": "show", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " or " - }, - { - "type": "text", - "text": "collect", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and print some query result on the screen, there may be no reason to track it in the sense of lineage." 
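The thread above lands on the key rule: the Spark integration emits lineage for jobs that read from one dataset and write to another, and deliberately skips notebooks that only `collect` or `display` results. A minimal sketch of a job that should therefore emit START/COMPLETE events with non-empty inputs and outputs is below; it assumes the openlineage-spark jar is already installed on the cluster (e.g. via the init script mentioned in the thread), uses the transport config keys from the openlineage-spark docs, and treats the URL, file path, and table name as hypothetical placeholders.
```
# Minimal read -> write PySpark job for verifying OpenLineage event emission.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ol-smoke-test")
    # Register the OpenLineage listener (the jar must already be on the cluster).
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")  # hypothetical backend
    .config("spark.openlineage.namespace", "dev")
    .getOrCreate()
)

babynames = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/babynames.csv")
)

# The write is what makes this a dataset-to-dataset job, so the emitted events
# should carry the CSV file as an input and the table as an output.
babynames.filter(babynames.Year == 2014).write.mode("overwrite").saveAsTable("babynames_2014")
```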
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696823976.297949", - "parent_user_id": "U05QL7LN2GH" - } - ] - }, - { - "client_msg_id": "bc8de3a0-c11b-4ac1-9d0c-b2d67771f442", - "type": "message", - "text": "<@U02LXF3HUN7> can we cut a new release to include this change?\n• ", - "user": "U01HVNU6A4C", - "ts": "1696591141.778179", - "blocks": [ - { - "type": "rich_text", - "block_id": "DOR9k", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02LXF3HUN7" - }, - { - "type": "text", - "text": " can we cut a new release to include this change?\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2151" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1696337382, - "color": "6f42c1", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/pull/2151", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2151 Allow setting client's endpoint via environment variable", - "text": "*Problem*\n\nCurrently, it's not possible to set the OpenLineage endpoint (hard-coded to `/api/v1/lineage`) using an environment variable when running the Airflow integration.\n\n*Solution*\n\nGiven that it's not possible to create the client manually in Airflow, especially now that OpenLineage has become an official Airflow provider, this change seems like the only feasible solution.\n\n☐ Your change modifies the OpenLineage model\n☐ Your change modifies one or more OpenLineage \n\n*One-line summary:*\n\nAllow setting client's endpoint via environment variable.\n\n*Checklist*\n\n☑︎ You've your work\n☑︎ Your pull request title follows our \n☑︎ Your changes are accompanied by tests (_if relevant_)\n☑︎ Your change contains a and is self-contained\n☑︎ You've updated any relevant documentation (_if relevant_)\n☑︎ Your comment includes a one-liner for the changelog about the specific purpose of the change (_if necessary_)\n☐ You've versioned the core OpenLineage model or facets according to (_if relevant_)\n☐ You've added a to source files (_if relevant_)\n\n* * *\n\nSPDX-License-Identifier: Apache-2.0 \nCopyright 2018-2023 contributors to the OpenLineage project", - "title": "#2151 Allow setting client's endpoint via environment variable", - "title_link": "https://github.com/OpenLineage/OpenLineage/pull/2151", - "footer": "", - "fields": [ - { - "value": "documentation, client/python", - "title": "Labels", - "short": true - }, - { - "value": "6", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696591141.778179", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1696634162.280029", - "reply_users": [ - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1696634162.280029", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U01HNKK4XAM", - "U02S6F54MAB", - "U01DCLP0GU9", - "U02LXF3HUN7", - "U01RA9B5GG2" - ], - "count": 5 - } - ], - "replies": [ - { - "client_msg_id": "24ac081b-fae5-447d-8767-8b560faa18db", - "type": "message", - "text": "Thanks for requesting a release, <@U01HVNU6A4C>. 
It has been approved and will be initiated within 2 business days of next Monday.", - "user": "U02LXF3HUN7", - "ts": "1696634162.280029", - "blocks": [ - { - "type": "rich_text", - "block_id": "IA9Xp", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks for requesting a release, " - }, - { - "type": "user", - "user_id": "U01HVNU6A4C" - }, - { - "type": "text", - "text": ". It has been approved and will be initiated within 2 business days of next Monday." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696591141.778179", - "parent_user_id": "U01HVNU6A4C", - "reactions": [ - { - "name": "pray", - "users": [ - "U01HVNU6A4C" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "D5AAA7E8-6425-4534-B01B-2BB732AD3E31", - "type": "message", - "text": "The Marquez meetup in San Francisco is happening right now!\n", - "user": "U01DCLP0GU9", - "ts": "1696552840.350759", - "blocks": [ - { - "type": "rich_text", - "block_id": "MOzWE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "The Marquez meetup in San Francisco is happening right now!\n" - }, - { - "type": "link", - "url": "https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "text": "https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCLP0GU9", - "ts": "1696553106.000000" - }, - "attachments": [ - { - "image_url": "https://secure.meetupstatic.com/photos/event/a/1/8/c/600_515141356.jpeg", - "image_width": 600, - "image_height": 338, - "image_bytes": 12395, - "from_url": "https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_icon": "https://secure.meetupstatic.com/next/images/general/m_swarm_120x120.png", - "id": 1, - "original_url": "https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "fallback": "Meetup: Marquez Meetup @ Astronomer, Thu, Oct 5, 2023, 5:30 PM | Meetup", - "text": "Join us on Thursday, October 5th, from 5:30-8:30 pm to learn about the Marquez project. Meet other members of the community, get tips on making the most of the latest impro", - "title": "Marquez Meetup @ Astronomer, Thu, Oct 5, 2023, 5:30 PM | Meetup", - "title_link": "https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_name": "Meetup" - } - ], - "reactions": [ - { - "name": "tada", - "users": [ - "U02MK6YNAQ5", - "U05TU0U224A" - ], - "count": 2 - } - ] - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "I have created a ticket to make this easier to find. 
Once I get more feedback I’ll turn it into a md file in the repo: \n", - "user": "U01DCLP0GU9", - "ts": "1696541652.452819", - "thread_ts": "1694737381.437569", - "root": { - "client_msg_id": "32e40b58-9b35-45a3-99aa-e37404cd6329", - "type": "message", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github", - "user": "U01DCLP0GU9", - "ts": "1694737381.437569", - "blocks": [ - { - "type": "rich_text", - "block_id": "KKjtL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n" - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit" - }, - { - "type": "text", - "text": "\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694737381.437569", - "reply_count": 2, - "reply_users_count": 1, - "latest_reply": "1696541652.452819", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "EbQGP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I have created a ticket to make this easier to find. Once I get more feedback I’ll turn it into a md file in the repo: " - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit#heading=h.enpbmvu7n8gu" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2161" - } - ] - } - ] - } - ], - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1696541261, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2161", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2161 [PROPOSAL] Add a Registry of Producers and Consumers in OpenLineage", - "text": "*Purpose*\n\nThis is the early stage of an idea to get community feedback on what an OpenLineage registry for producers, custom facets and consumers could be. Once this document is stable enough, I’ll create an official proposal on the OpenLineage repo.\n\n*Goal*\n\nAllow third parties to register their implementations or custom extensions to make them easy to discover. 
\nShorten “Producer” and “schema url” values\n\n*Proposed implementation*\n\nCurrent draft for discussion:\n\n", - "title": "#2161 [PROPOSAL] Add a Registry of Producers and Consumers in OpenLineage", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2161", - "footer": "", - "fields": [ - { - "value": "proposal", - "title": "Labels", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "client_msg_id": "b2b64b49-4c25-427f-b1ae-f1fa92aad028", - "edited": { - "user": "U01DCLP0GU9", - "ts": "1696541673.000000" - } - }, - { - "client_msg_id": "3406f12f-2796-42b3-8c6a-37131a83311e", - "type": "message", - "text": "**\nThis month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:\n• announcements\n• recent releases\n• Airflow Summit recap\n• tutorial: migrating to the Airflow Provider\n• discussion topic: observability for OpenLineage/Marquez\n• open discussion\n• more (TBA)\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.", - "user": "U02LXF3HUN7", - "ts": "1696531454.431629", - "blocks": [ - { - "type": "rich_text", - "block_id": "TBGz6", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "\nThis month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "announcements" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent releases" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow Summit recap" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "tutorial: migrating to the Airflow Provider" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "discussion topic: observability for OpenLineage/Marquez" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "open discussion" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "more (TBA)" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "More info and the meeting link can be found on the " - }, - { - "type": "link", - "url": "https://openlineage.io/meetings/", - "text": "website" - }, - { - "type": "text", - "text": ". All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda." 
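For context on the release request earlier in this channel: PR #2151 makes the Python client's endpoint, previously hard-coded to `/api/v1/lineage`, configurable through the environment, which matters when the integration (such as the Airflow provider) constructs the client for you. A minimal sketch of how that looks, assuming the environment-variable names as described around the PR (`OPENLINEAGE_ENDPOINT` alongside the existing `OPENLINEAGE_URL`); all values are placeholders.
```
import os

# Base URL of the OpenLineage-compatible backend (placeholder value).
os.environ["OPENLINEAGE_URL"] = "http://localhost:5000"
# Endpoint override from PR #2151; without it the client posts to
# /api/v1/lineage. The exact variable name here is an assumption.
os.environ["OPENLINEAGE_ENDPOINT"] = "api/v2/lineage"

from openlineage.client import OpenLineageClient

# Build the client purely from the environment -- the only option when you
# cannot instantiate it yourself, as in the Airflow provider.
client = OpenLineageClient.from_environment()
```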
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02LXF3HUN7", - "ts": "1696888845.000000" - }, - "attachments": [ - { - "from_url": "https://openlineage.io/meetings/", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/meetings/", - "fallback": "TSC Meetings | OpenLineage", - "text": "The OpenLineage Technical Steering Committee meets monthly, and is open to all.", - "title": "TSC Meetings | OpenLineage", - "title_link": "https://openlineage.io/meetings/", - "service_name": "openlineage.io" - } - ], - "reactions": [ - { - "name": "eyes", - "users": [ - "U0323HG8C8H", - "U0544QC1DS9", - "U01SW738WCF" - ], - "count": 3 - } - ] - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "I have cleaned up the registry proposal.\n\nIn particular:\n• I clarified that option 2 is preferred at this point.\n• I moved discussion notes to the bottom. they will go away at some point\n• Once it is stable, I’ll create a with the preferred option.\n• we need a good proposal for the core facets prefix. My suggestion is to move core facets to `core` in the registry. The drawback is prefix would be inconsistent.\n", - "user": "U01DCLP0GU9", - "ts": "1696379615.265919", - "thread_ts": "1694737381.437569", - "root": { - "client_msg_id": "32e40b58-9b35-45a3-99aa-e37404cd6329", - "type": "message", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github", - "user": "U01DCLP0GU9", - "ts": "1694737381.437569", - "blocks": [ - { - "type": "rich_text", - "block_id": "KKjtL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n" - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit" - }, - { - "type": "text", - "text": "\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694737381.437569", - "reply_count": 2, - "reply_users_count": 1, - "latest_reply": "1696541652.452819", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "UD6d9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I have cleaned up the registry proposal.\n" - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit" - }, - { - "type": "text", - "text": "\nIn particular:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I clarified that option 2 is preferred at this point." - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I moved discussion notes to the bottom. 
they will go away at some point" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Once it is stable, I’ll create a " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/proposals", - "text": "proposal" - }, - { - "type": "text", - "text": " with the preferred option." - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we need a good proposal for the core facets prefix. My suggestion is to move core facets to " - }, - { - "type": "text", - "text": "core", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in the registry. The drawback is prefix would be inconsistent." - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [] - } - ] - } - ], - "client_msg_id": "4f7a5bb9-269f-4e6d-98b4-669b2760c1bf" - }, - { - "client_msg_id": "dba3fb6e-73e4-4aad-857f-f45506333a4d", - "type": "message", - "text": "Hey everyone - does anyone have a good mechanism for alerting on issues with open lineage? For example, maybe alerting when an event times out - perhaps to prometheus or some other kind of generic endpoint? Not sure the best approach here (if the meta inf extension would be able to achieve it)", - "user": "U03D8K119LJ", - "ts": "1696350897.139129", - "blocks": [ - { - "type": "rich_text", - "block_id": "LXBVs", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey everyone - does anyone have a good mechanism for alerting on issues with open lineage? For example, maybe alerting when an event times out - perhaps to prometheus or some other kind of generic endpoint? Not sure the best approach here (if the meta inf extension would be able to achieve it)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696350897.139129", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1696402862.520309", - "reply_users": [ - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "f2d5f6c3-57ee-4ef2-88b0-6cf59d7f2505", - "type": "message", - "text": "That's a great usecase for OpenLineage. Unfortunately, we don't have any doc or recomendation on that.\n\nI would try using FluentD proxy we have () to copy event stream (alerting is just one of usecases for lineage events) and write fluentd plugin to send it asynchronously further to alerting service like PagerDuty.\n\nIt looks cool to me but I never had enough time to test this approach.", - "user": "U02MK6YNAQ5", - "ts": "1696402862.520309", - "blocks": [ - { - "type": "rich_text", - "block_id": "6FxCH", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "That's a great usecase for OpenLineage. Unfortunately, we don't have any doc or recomendation on that.\n\nI would try using FluentD proxy we have (" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/proxy/fluentd" - }, - { - "type": "text", - "text": ") to copy event stream (alerting is just one of usecases for lineage events) and write fluentd plugin to send it asynchronously further to alerting service like PagerDuty.\n\nIt looks cool to me but I never had enough time to test this approach." 
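Since the FluentD-proxy approach suggested above is untested, the consumer end can be sketched independently. Below is a stdlib-only illustration (not the FluentD plugin itself): an HTTP endpoint that receives OpenLineage run events and flags `FAIL` events, with the actual alert delivery (PagerDuty, Prometheus, etc.) left as a print; the port and handler name are arbitrary.
```
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class LineageAlertHandler(BaseHTTPRequestHandler):
    """Receives OpenLineage events over HTTP and flags failed runs."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Run events carry eventType: START / RUNNING / COMPLETE / ABORT / FAIL.
        if event.get("eventType") == "FAIL":
            job = event.get("job", {})
            # Placeholder for a real alerting integration.
            print(f"ALERT: run of {job.get('namespace')}/{job.get('name')} failed")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 5000), LineageAlertHandler).serve_forever()
```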
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696350897.139129", - "parent_user_id": "U03D8K119LJ", - "reactions": [ - { - "name": "+1", - "users": [ - "U03D8K119LJ" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "5916a03e-8696-4c6d-ab94-103622360360", - "type": "message", - "text": "\n*We released OpenLineage 1.3.1!*\n*Added:*\n• Airflow: add some basic stats to the Airflow integration `#1845` \n• Airflow: add columns as schema facet for `airflow.lineage.Table` (if defined) `#2138` \n• DBT: add SQLSERVER to supported dbt profile types `#2136` \n• Spark: support for latest 3.5 `#2118` \n*Fixed:*\n• Airflow: fix find-links path in tox `#2139` \n• Airflow: add more graceful logging when no OpenLineage provider installed `#2141` \n• Spark: fix bug in PathUtils’ `prepareDatasetIdentifierFromDefaultTablePath` (CatalogTable) to correctly preserve scheme from `CatalogTable`’s location `#2142` \nThanks to all the contributors, including new contributor <@U05TZE47F2S>!\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ", - "user": "U02LXF3HUN7", - "ts": "1696344963.496819", - "blocks": [ - { - "type": "rich_text", - "block_id": "0+5XB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "We released OpenLineage 1.3.1!", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Added:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow: add some basic stats to the Airflow integration " - }, - { - "type": "text", - "text": "#1845", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/harels", - "text": "@harels", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow: add columns as schema facet for " - }, - { - "type": "text", - "text": "airflow.lineage.Table", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " (if defined) " - }, - { - "type": "text", - "text": "#2138", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/erikalfthan", - "text": "@erikalfthan", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "DBT: add SQLSERVER to supported dbt profile types " - }, - { - "type": "text", - "text": "#2136", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/erikalfthan", - "text": "@erikalfthan", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: support for latest 3.5 " - }, - { - "type": "text", - "text": "#2118", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Fixed:", - "style": { - "bold": true - } - }, - { - "type": "text", - 
"text": "\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow: fix find-links path in tox " - }, - { - "type": "text", - "text": "#2139", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/JDarDagran", - "text": "@JDarDagran", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow: add more graceful logging when no OpenLineage provider installed " - }, - { - "type": "text", - "text": "#2141", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/JDarDagran", - "text": "@JDarDagran", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: fix bug in PathUtils’ " - }, - { - "type": "text", - "text": "prepareDatasetIdentifierFromDefaultTablePath", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " (CatalogTable) to correctly preserve scheme from " - }, - { - "type": "text", - "text": "CatalogTable", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "’s location " - }, - { - "type": "text", - "text": "#2142", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/d-m-h", - "text": "@d-m-h", - "unsafe": true - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks to all the contributors, including new contributor " - }, - { - "type": "user", - "user_id": "U05TZE47F2S" - }, - { - "type": "text", - "text": "!\n" - }, - { - "type": "text", - "text": "Release:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/releases/tag/1.3.1" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Changelog: ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Commit history:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/compare/1.2.2...1.3.1" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Maven:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://oss.sonatype.org/#nexus-search;quick~openlineage" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "PyPI:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://pypi.org/project/openlineage-python/" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02LXF3HUN7", - "ts": "1696346004.000000" - }, - "thread_ts": "1696344963.496819", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1696419779.225109", - "reply_users": [ - "U01HVNU6A4C" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1696419779.225109", - "reactions": [ - { - "name": "+1", - "users": [ - "U05T8BJD4DU", - "U05KKM07PJP", - "U05QA2D1XNV", - "U01HVNU6A4C" - ], - 
"count": 4 - }, - { - "name": "tada", - "users": [ - "U0323HG8C8H" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "08089f4a-1e26-4692-91ee-b81c4e791143", - "type": "message", - "text": "Any chance we can do a 1.3.2 soonish to include instead of waiting for the next monthly release?", - "user": "U01HVNU6A4C", - "ts": "1696419779.225109", - "blocks": [ - { - "type": "rich_text", - "block_id": "qcdyH", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Any chance we can do a 1.3.2 soonish to include " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2151" - }, - { - "type": "text", - "text": " instead of waiting for the next monthly release?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1696337382, - "color": "6f42c1", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/pull/2151", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2151 Allow setting client's endpoint via environment variable", - "text": "*Problem*\n\nCurrently, it's not possible to set the OpenLineage endpoint (hard-coded to `/api/v1/lineage`) using an environment variable when running the Airflow integration.\n\n*Solution*\n\nGiven that it's not possible to create the client manually in Airflow, especially now that OpenLineage has become an official Airflow provider, this change seems like the only feasible solution.\n\n☐ Your change modifies the OpenLineage model\n☐ Your change modifies one or more OpenLineage \n\n*One-line summary:*\n\nAllow setting client's endpoint via environment variable.\n\n*Checklist*\n\n☑︎ You've your work\n☑︎ Your pull request title follows our \n☑︎ Your changes are accompanied by tests (_if relevant_)\n☑︎ Your change contains a and is self-contained\n☑︎ You've updated any relevant documentation (_if relevant_)\n☑︎ Your comment includes a one-liner for the changelog about the specific purpose of the change (_if necessary_)\n☐ You've versioned the core OpenLineage model or facets according to (_if relevant_)\n☐ You've added a to source files (_if relevant_)\n\n* * *\n\nSPDX-License-Identifier: Apache-2.0 \nCopyright 2018-2023 contributors to the OpenLineage project", - "title": "#2151 Allow setting client's endpoint via environment variable", - "title_link": "https://github.com/OpenLineage/OpenLineage/pull/2151", - "footer": "", - "fields": [ - { - "value": "documentation, client/python", - "title": "Labels", - "short": true - }, - { - "value": "4", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696344963.496819", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "client_msg_id": "675885a1-9307-4628-842d-94abb653b406", - "type": "message", - "text": "Hi folks - I'm wondering if its just me, but does `io.openlineage:openlineage-sql-java:1.2.2` ship with the `arm64.dylib` binary? 
When i try and run code that uses the Java package on an Apple M1, the binary isn't found, The workaround is to checkout 1.2.2 and then build and publish it locally.", - "user": "U05FLJE4GDU", - "ts": "1696319076.770719", - "blocks": [ - { - "type": "rich_text", - "block_id": "n5OHe", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi folks - I'm wondering if its just me, but does " - }, - { - "type": "text", - "text": "io.openlineage:openlineage-sql-java:1.2.2", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " ship with the " - }, - { - "type": "text", - "text": "arm64.dylib", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " binary? When i try and run code that uses the Java package on an Apple M1, the binary isn't found, The workaround is to checkout 1.2.2 and then build and publish it locally." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696319076.770719", - "reply_count": 5, - "reply_users_count": 3, - "latest_reply": "1698076572.116649", - "reply_users": [ - "U02MK6YNAQ5", - "U05FLJE4GDU", - "U01RA9B5GG2" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "ab23da6a-0e7b-4a69-ae45-674463cffd7c", - "type": "message", - "text": "Not sure if I follow your question. Whenever OL is released, there is a script new-version.sh - being run and modify the codebase.\n\nSo, If you pull the code, it contains OL version that has not been released yet and in case of dependencies, one need to build them on their own.\n\nFor example, here Preparation section describes how to build openlineage-java and openlineage-sql in order to build openlineage-spark.", - "user": "U02MK6YNAQ5", - "ts": "1696338098.877059", - "blocks": [ - { - "type": "rich_text", - "block_id": "7ZhDU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Not sure if I follow your question. Whenever OL is released, there is a script new-version.sh - " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/new-version.sh" - }, - { - "type": "text", - "text": " being run and modify the codebase.\n\nSo, If you pull the code, it contains OL version that has not been released yet and in case of dependencies, one need to build them on their own.\n\nFor example, here " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#preparation" - }, - { - "type": "text", - "text": " Preparation section describes how to build openlineage-java and openlineage-sql in order to build openlineage-spark." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/new-version.sh", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696319076.770719", - "parent_user_id": "U05FLJE4GDU" - }, - { - "client_msg_id": "ee51bf80-f877-41b6-87e7-9ba830e97872", - "type": "message", - "text": "Hmm. Let's elaborate my use case a bit.\n\nWe run Apache Hive on-premise. 
Hive provides query execution hooks for pre-query, post-query, and I *think* failed query.\n\nAny way, as part of the hook, you're given the query string.\n\nSo I, naturally, tried to pass the query string into `OpenLineageSql.parse(Collections.singletonList(hookContext.getQueryPlan().getQueryStr()), \"hive\")` in order to test this out.\n\nI was using `openlineage-sql-java:1.2.2` at that time, and no matter what query string I gave it, *nothing* was returned.\n\nI then stepped through the code and noticed that it was looking for the `arm64` lib, and I noticed that that package (downloaded from maven central) lacked that particular native binary.", - "user": "U05FLJE4GDU", - "ts": "1696411646.822449", - "blocks": [ - { - "type": "rich_text", - "block_id": "BdeNT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hmm. Let's elaborate my use case a bit.\n\nWe run Apache Hive on-premise. Hive provides query execution hooks for pre-query, post-query, and I " - }, - { - "type": "text", - "text": "think", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " failed query.\n\nAny way, as part of the hook, you're given the query string.\n\nSo I, naturally, tried to pass the query string into " - }, - { - "type": "text", - "text": "OpenLineageSql.parse(Collections.singletonList(hookContext.getQueryPlan().getQueryStr()), \"hive\")", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in order to test this out.\n\nI was using " - }, - { - "type": "text", - "text": "openlineage-sql-java:1.2.2", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " at that time, and no matter what query string I gave it, " - }, - { - "type": "text", - "text": "nothing", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " was returned.\n\nI then stepped through the code and noticed that it was looking for the " - }, - { - "type": "text", - "text": "arm64", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " lib, and I noticed that that package (downloaded from maven central) lacked that particular native binary." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05FLJE4GDU", - "ts": "1696411650.000000" - }, - "thread_ts": "1696319076.770719", - "parent_user_id": "U05FLJE4GDU" - }, - { - "client_msg_id": "be4bbc1f-2209-4b5a-97e2-60b48eaac361", - "type": "message", - "text": "I hope that helps.", - "user": "U05FLJE4GDU", - "ts": "1696411656.832229", - "blocks": [ - { - "type": "rich_text", - "block_id": "R3Bea", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I hope that helps." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696319076.770719", - "parent_user_id": "U05FLJE4GDU", - "reactions": [ - { - "name": "+1", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "dfc061b2-4bba-4361-8d9f-6aef9649904b", - "type": "message", - "text": "I get in now. In Circle CI we do have 3 build steps:\n``` - build-integration-sql-x86\n - build-integration-sql-arm\n - build-integration-sql-macos```\nbut no mac m1. I think at that time circle CI did not have a proper resource class in free plan. Additionally, <@U01RA9B5GG2> would prefer to migrate this to github actions as he claims this can be achieved there in a cleaner way ().\n\nFeel free to create an issue for this. 
Others would be able to upvote it in case they have similar experience.", - "user": "U02MK6YNAQ5", - "ts": "1696424582.779769", - "blocks": [ - { - "type": "rich_text", - "block_id": "a95tz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I get in now. In Circle CI we do have 3 build steps:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": " - build-integration-sql-x86\n - build-integration-sql-arm\n - build-integration-sql-macos" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "but no mac m1. I think at that time circle CI did not have a proper resource class in free plan. Additionally, " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " would prefer to migrate this to github actions as he claims this can be achieved there in a cleaner way (" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/1624" - }, - { - "type": "text", - "text": ").\n\nFeel free to create an issue for this. Others would be able to upvote it in case they have similar experience." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1676307633, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/1624", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#1624 CI: PoC building openlineage-sql on GitHub Actions with actions-rs", - "text": "Building sql parser currently is the most convoluted CI process due to need to construct different binaries in multiple dimensions; both for Java and Python, and for multiple architectures; like Linux x86, Linux ARM, MacOS x86 etc. The jobs also differ in different context: release workflow has different jobs than build one, which in turn does not build all of the architectures.\n\nTo simplify that, we should try using GitHub Actions with that should solve the problems we've currently had to replicate manually.\n\nEnd result of that task should be having various SQL artifacts produced by GitHub actions and available by GH Actions artifacts API:", - "title": "#1624 CI: PoC building openlineage-sql on GitHub Actions with actions-rs", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/1624", - "footer": "", - "fields": [ - { - "value": "", - "title": "Assignees", - "short": true - }, - { - "value": "ci, integration/sql", - "title": "Labels", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696319076.770719", - "parent_user_id": "U05FLJE4GDU" - }, - { - "client_msg_id": "35f6d93f-4aa6-4076-8694-f9320acdba2b", - "type": "message", - "text": "It doesn't have the free resource class still :disappointed:\nWe're blocked on that unfortunately. Other solution would be to migrate to GH actions, where most of our solution could be replaced by something like that ", - "user": "U01RA9B5GG2", - "ts": "1698076572.116649", - "blocks": [ - { - "type": "rich_text", - "block_id": "/SjVu", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It doesn't have the free resource class still " - }, - { - "type": "emoji", - "name": "disappointed", - "unicode": "1f61e" - }, - { - "type": "text", - "text": "\nWe're blocked on that unfortunately. 
Other solution would be to migrate to GH actions, where most of our solution could be replaced by something like that " - }, - { - "type": "link", - "url": "https://github.com/PyO3/maturin-action" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/PyO3/maturin-action", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "PyO3/maturin-action", - "text": "GitHub Action to install and run a custom maturin command with built-in support for cross compilation", - "title": "PyO3/maturin-action", - "fields": [ - { - "value": "98", - "title": "Stars", - "short": true - }, - { - "value": "TypeScript", - "title": "Language", - "short": true - } - ] - } - ], - "thread_ts": "1696319076.770719", - "parent_user_id": "U05FLJE4GDU" - } - ] - }, - { - "client_msg_id": "9be7823a-a74a-457e-a605-351e82874e29", - "type": "message", - "text": "\nThe September issue of is here! This issue covers the big news about OpenLineage coming out of Airflow Summit, progress on the Airflow Provider, highlights from our meetup in Toronto, and much more.\nTo get the newsletter directly in your inbox each month, sign up .", - "user": "U02LXF3HUN7", - "ts": "1696264108.497989", - "blocks": [ - { - "type": "rich_text", - "block_id": "uK1Ve", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThe September issue of " - }, - { - "type": "link", - "url": "https://mailchi.mp/3b9bbb3eba23/openlineage-news-july-9591485?e=ce16eef4ef", - "text": "OpenLineage News" - }, - { - "type": "text", - "text": " is here! This issue covers the big news about OpenLineage coming out of Airflow Summit, progress on the Airflow Provider, highlights from our meetup in Toronto, and much more.\nTo get the newsletter directly in your inbox each month, sign up " - }, - { - "type": "link", - "url": "http://bit.ly/OL_news", - "text": "here" - }, - { - "type": "text", - "text": "." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "http://bit.ly/OL_news", - "id": 1, - "original_url": "http://bit.ly/OL_news", - "fallback": "OpenLineage Project", - "text": "OpenLineage Project Email Forms", - "title": "OpenLineage Project", - "title_link": "http://bit.ly/OL_news", - "service_name": "apache.us14.list-manage.com" - } - ], - "reactions": [ - { - "name": "duck", - "users": [ - "U01HNKK4XAM", - "U02MK6YNAQ5" - ], - "count": 2 - }, - { - "name": "fire", - "users": [ - "U01DCMDFHBK", - "U02S6F54MAB", - "U02MK6YNAQ5" - ], - "count": 3 - } - ] - }, - { - "client_msg_id": "a754cd47-0e9f-4a33-b417-c0dc64ccfe5e", - "type": "message", - "text": "\nHello all, I’d like to open a vote to release OpenLineage 1.3.0, including:\n• support for Spark 3.5 in the Spark integration\n• scheme preservation bug fix in the Spark integration\n• find-links path in tox bug in the Airflow integration fix\n• more graceful logging when no OL provider is installed in the Airflow integration\n• columns as schema facet for airflow.lineage.Table addition\n• SQLSERVER to supported dbt profile types addition\nThree +1s from committers will authorize. 
Thanks in advance.", - "user": "U02LXF3HUN7", - "ts": "1696262312.791719", - "blocks": [ - { - "type": "rich_text", - "block_id": "QqIB1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nHello all, I’d like to open a vote to release OpenLineage 1.3.0, including:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "support for Spark 3.5 in the Spark integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "scheme preservation bug fix in the Spark integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "find-links path in tox bug in the Airflow integration fix" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "more graceful logging when no OL provider is installed in the Airflow integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "columns as schema facet for airflow.lineage.Table addition" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "SQLSERVER to supported dbt profile types addition" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Three +1s from committers will authorize. Thanks in advance." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "reply_count": 12, - "reply_users_count": 3, - "latest_reply": "1696532708.916989", - "reply_users": [ - "U02LXF3HUN7", - "U05T8BJD4DU", - "U01HNKK4XAM" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1696532708.916989", - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U01HNKK4XAM", - "U02MK6YNAQ5", - "U05TU0U224A" - ], - "count": 3 - }, - { - "name": "+1", - "users": [ - "U05T8BJD4DU", - "U02MK6YNAQ5" - ], - "count": 2 - }, - { - "name": "heavy_plus_sign", - "users": [ - "U01DCMDFHBK", - "U02S6F54MAB", - "U05TZE47F2S", - "U01DCLP0GU9" - ], - "count": 4 - } - ], - "replies": [ - { - "client_msg_id": "3c57a2da-2f2f-4456-8dbb-d5e5b6116ea4", - "type": "message", - "text": "Thanks all. The release is authorized and will be initiated within 2 business days.", - "user": "U02LXF3HUN7", - "ts": "1696280408.812339", - "blocks": [ - { - "type": "rich_text", - "block_id": "nrzDe", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks all. The release is authorized and will be initiated within 2 business days." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "f1310bc9-9c1a-4e4a-83a1-17da51247481", - "type": "message", - "text": "looking forward to that, I am seeing inconsistent results in Databricks for Spark 3.4+, sometimes there's no inputs / outputs, hope that is fixed?", - "user": "U05T8BJD4DU", - "ts": "1696281106.654189", - "blocks": [ - { - "type": "rich_text", - "block_id": "O2lR9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "looking forward to that, I am seeing inconsistent results in Databricks for Spark 3.4+, sometimes there's no inputs / outputs, hope that is fixed?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "5565759c-a9c0-4bbf-8f2a-bb64e1d26903", - "type": "message", - "text": "<@U05T8BJD4DU> if it isn’t fixed for you, would love it if you could open up an issue that will allow us to reproduce and fix", - "user": "U01HNKK4XAM", - "ts": "1696341564.380589", - "blocks": [ - { - "type": "rich_text", - "block_id": "LbQzV", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05T8BJD4DU" - }, - { - "type": "text", - "text": " if it isn’t fixed for you, would love it if you could open up an issue that will allow us to reproduce and fix" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7", - "reactions": [ - { - "name": "+1", - "users": [ - "U05T8BJD4DU" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "b95f84ae-a245-4be9-ad55-43fd85f18bc2", - "type": "message", - "text": "<@U01HNKK4XAM> the issue still exists -> Spark 3.4 and above, including 3.5, saveAsTable and create table won't have inputs and outputs in Databricks", - "user": "U05T8BJD4DU", - "ts": "1696379020.978749", - "blocks": [ - { - "type": "rich_text", - "block_id": "KFFSy", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U01HNKK4XAM" - }, - { - "type": "text", - "text": " the issue still exists -> Spark 3.4 and above, including 3.5, saveAsTable and create table won't have inputs and outputs in Databricks" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "1e2d90d3-c11b-405a-9f0d-35a107543a29", - "type": "message", - "text": "", - "user": "U05T8BJD4DU", - "ts": "1696379415.171809", - "blocks": [ - { - "type": "rich_text", - "block_id": "vmCqe", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2124" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695498902, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2124 Same Delta Table not catching the location on write", - "text": "*What is the target system?*\n\nSpark / Databricks\n\n*What kind of integration is this?*\n\n☐ Produces OpenLineage metadata\n☐ Consumes OpenLineage metadata\n☐ Something else\n\n*How should this integration be implemented?*\n\nI am using OL 1.2.2, Azure Databricks Runtime 11.3 LTS. When creating a table writing into a ADLS location, OL won't be able to catch the location of the output. But when I read the same object it will be able to read the location as INPUT.\n\nPlease note I have also tested Databricks Runtime 13.3 LTS, Spark 3.4.1 - it will give correct ADLS location in INPUT but the input will only show up once in a blue moon. 
Most of the time the inputs and outputs are blank.\n\n```\n \"inputs\": [],\n \"outputs\": []\n```\n\n```\nCREATE OR REPLACE TABLE transactions_adj\nUSING DELTA LOCATION ''\nAS\n SELECT\n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n amount_list,\n campaign_coupon_discount,\n manuf_coupon_discount,\n manuf_coupon_match_discount,\n total_coupon_discount,\n instore_discount,\n amount_paid,\n units\n FROM (\n SELECT \n household_id,\n basket_id,\n week_no,\n day,\n transaction_time,\n store_id,\n product_id,\n COALESCE(sales_amount - discount_amount - coupon_discount - coupon_discount_match,0.0) as amount_list,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) = 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as campaign_coupon_discount,\n CASE \n WHEN COALESCE(coupon_discount_match,0.0) != 0.0 THEN -1 * COALESCE(coupon_discount,0.0) \n ELSE 0.0 \n END as manuf_coupon_discount,\n -1 * COALESCE(coupon_discount_match,0.0) as manuf_coupon_match_discount,\n -1 * COALESCE(coupon_discount - coupon_discount_match,0.0) as total_coupon_discount,\n COALESCE(-1 * discount_amount,0.0) as instore_discount,\n COALESCE(sales_amount,0.0) as `amount_paid,`\n quantity as units\n FROM transactions\n );\n```\n\nHere's the COMPLETE event:\n\n```\n\n \"outputs\":[\n {\n \"namespace\":\"dbfs\",\n \"name\":\"/user/hive/warehouse/journey.db/transactions_adj\",\n \"facets\":{\n \"dataSource\":{\n \"_producer\":\"\",\n \"_schemaURL\":\"\",\n \"name\":\"dbfs\",\n \"uri\":\"dbfs\"\n },\n\n```\n\nBelow logical plan shows the path:\n\n```\n== Analyzed Logical Plan ==\nnum_affected_rows: bigint, num_inserted_rows: bigint\nReplaceTableAsSelect TableSpec(Map(),Some(DELTA),Map(),Some(),None,None,false,Set()), true\n:- ResolvedIdentifier com.databricks.sql.managedcatalog.UnityCatalogV2Proxy@6251a8df, default.transactions_adj\n+- Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, amount_list#147, campaign_coupon_discount#148, manuf_coupon_discount#149, manuf_coupon_match_discount#150, total_coupon_discount#151, instore_discount#152, amount_paid#153, units#154]\n +- SubqueryAlias __auto_generated_subquery_name\n +- Project [household_id#184, basket_id#185L, week_no#193, day#186, transaction_time#192, store_id#190, product_id#187, coalesce(cast((((sales_amount#189 - discount_amount#191) - coupon_discount#194) - coupon_discount_match#195) as double), cast(0.0 as double)) AS amount_list#147, CASE WHEN (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS campaign_coupon_discount#148, CASE WHEN NOT (coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double)) = cast(0.0 as double)) THEN (cast(-1 as double) * coalesce(cast(coupon_discount#194 as double), cast(0.0 as double))) ELSE cast(0.0 as double) END AS manuf_coupon_discount#149, (cast(-1 as double) * coalesce(cast(coupon_discount_match#195 as double), cast(0.0 as double))) AS manuf_coupon_match_discount#150, (cast(-1 as double) * coalesce(cast((coupon_discount#194 - coupon_discount_match#195) as double), cast(0.0 as double))) AS total_coupon_discount#151, coalesce(cast((cast(-1 as float) * discount_amount#191) as double), cast(0.0 as double)) AS instore_discount#152, coalesce(cast(sales_amount#189 as double), cast(0.0 as double)) AS amount_paid#153, quantity#188 AS 
units#154]\n +- SubqueryAlias spark_catalog.default.transactions\n +- Relation spark_catalog.default.transactions[household_id#184,basket_id#185L,day#186,product_id#187,quantity#188,sales_amount#189,store_id#190,discount_amount#191,transaction_time#192,week_no#193,coupon_discount#194,coupon_discount_match#195] parquet\n```\n\n*Where should this integration be implemented?*\n\n☐ In the target system\n☐ In the OpenLineage repo\n☐ Somewhere else\n\n*Do you plan to make this contribution yourself?*\n\n☐ I am interested in doing this work", - "title": "#2124 Same Delta Table not catching the location on write", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2124", - "footer": "", - "fields": [ - { - "value": "integration/spark, integration/databricks", - "title": "Labels", - "short": true - }, - { - "value": "1", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "0ba6dc08-dd99-4efe-acdb-2a46aab4e711", - "type": "message", - "text": "and of course this issue still exists", - "user": "U05T8BJD4DU", - "ts": "1696379421.415929", - "blocks": [ - { - "type": "rich_text", - "block_id": "CDeCf", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and of course this issue still exists" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "e9527d1c-1b98-4db2-bc7b-8aa2404dc75b", - "type": "message", - "text": "thanks for posting, we’ll continue looking into this.. if you find any clues that might help, please let us know.", - "user": "U01HNKK4XAM", - "ts": "1696383909.178409", - "blocks": [ - { - "type": "rich_text", - "block_id": "YGzDL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "thanks for posting, we’ll continue looking into this.. if you find any clues that might help, please let us know." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "10fc80ad-5ce4-4aca-82ad-7bc54be5ad6b", - "type": "message", - "text": "is there any instructions on how to hook up a debugger to OL?", - "user": "U05T8BJD4DU", - "ts": "1696383987.807139", - "blocks": [ - { - "type": "rich_text", - "block_id": "0nNm5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "is there any instructions on how to hook up a debugger to OL?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "aa1993fa-e506-477f-8144-0f6787cfc5ee", - "type": "message", - "text": "<@U02MK6YNAQ5> has been working on adding a debug facet, but more suggestions are more than welcome!", - "user": "U01HNKK4XAM", - "ts": "1696424656.962049", - "blocks": [ - { - "type": "rich_text", - "block_id": "ETaAp", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " has been working on adding a debug facet, but more suggestions are more than welcome!" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "18352e7e-1240-42ac-9d1c-e54a08aae259", - "type": "message", - "text": "", - "user": "U01HNKK4XAM", - "ts": "1696424758.746299", - "blocks": [ - { - "type": "rich_text", - "block_id": "bX9AD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2147" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1696250044, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/pull/2147", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2147 [SPARK] add debug facet to help resolving Spark integration issues", - "text": "*Problem*\n\nDebugging openlineage-spark problems is tedious job. We would like to have debug facet that will collect automatically meaningful information when enabled.\n\nCloses: \n\n*Solution*\n\n• Create debug facet (more details in ),\n• Facet is disabled by default,\n• Allow enabling it thourgh SparkConf.\n\n> *Note:* All schema changes require discussion. Please for context.\n\n☐ Your change modifies the OpenLineage model\n☐ Your change modifies one or more OpenLineage \n\nIf you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports `S3` and `GCS` filesystem operations, tested with AWS EMR).\n\n*One-line summary:*\n*Checklist*\n\n☑︎ You've your work\n☑︎ Your pull request title follows our \n☑︎ Your changes are accompanied by tests (_if relevant_)\n☑︎ Your change contains a and is self-contained\n☐ You've updated any relevant documentation (_if relevant_)\n☐ Your comment includes a one-liner for the changelog about the specific purpose of the change (_if necessary_)\n☐ You've versioned the core OpenLineage model or facets according to (_if relevant_)\n☐ You've added a to source files (_if relevant_)\n\n* * *\n\nSPDX-License-Identifier: Apache-2.0 \nCopyright 2018-2023 contributors to the OpenLineage project", - "title": "#2147 [SPARK] add debug facet to help resolving Spark integration issues", - "title_link": "https://github.com/OpenLineage/OpenLineage/pull/2147", - "footer": "", - "fields": [ - { - "value": "documentation, integration/spark", - "title": "Labels", - "short": true - }, - { - "value": "", - "title": "Assignees", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7", - "reactions": [ - { - "name": "eyes", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - }, - { - "name": "+1", - "users": [ - "U05T8BJD4DU" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "6f7e2f93-17d7-44b3-9c22-ff76b63fd77e", - "type": "message", - "text": "<@U02MK6YNAQ5> do you have a build for the PR? Appreciated!", - "user": "U05T8BJD4DU", - "ts": "1696490411.609729", - "blocks": [ - { - "type": "rich_text", - "block_id": "5uhEP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " do you have a build for the PR? Appreciated!" 
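[Editor's sketch] For anyone who wants to try the debug facet from PR #2147 once it lands: a minimal PySpark configuration sketch. The `spark.openlineage.debugFacet` key and its `enabled` value are assumptions based on the PR description (the facet is disabled by default), not a confirmed API; the listener and transport settings are standard OpenLineage Spark configuration:

```python
# Minimal sketch: OpenLineage on Spark with the proposed debug facet on.
# The debugFacet key/value is an assumption taken from PR #2147.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.3.1")
    .config("spark.extraListeners",
            "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")  # emit events to logs
    .config("spark.openlineage.debugFacet", "enabled")      # assumed key from the PR
    .getOrCreate()
)
```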
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "0e21ed3c-4dd4-4254-819f-1328d70a2f29", - "type": "message", - "text": "we’ll ask for a release once it’s reviewed and merged", - "user": "U01HNKK4XAM", - "ts": "1696532708.916989", - "blocks": [ - { - "type": "rich_text", - "block_id": "35mCF", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we’ll ask for a release once it’s reviewed and merged" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1696262312.791719", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "client_msg_id": "72d5b68a-2320-47d1-b453-c910e88a5be6", - "type": "message", - "text": "Are you located in the Brussels area or within commutable distance? Interested in attending a meetup between October 16-20? If so, please DM <@U0323HG8C8H> or myself. TIA", - "user": "U02LXF3HUN7", - "ts": "1695932184.205159", - "blocks": [ - { - "type": "rich_text", - "block_id": "EUBUD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Are you located in the Brussels area or within commutable distance? Interested in attending a meetup between October 16-20? If so, please DM " - }, - { - "type": "user", - "user_id": "U0323HG8C8H" - }, - { - "type": "text", - "text": " or myself. TIA" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "heart", - "users": [ - "U0323HG8C8H" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "94c47910-d424-44da-bd18-73646996e77c", - "type": "message", - "text": "Hello community\nFirst time poster - bear with me :)\n\nI am looking to make a minor PR on the airflow integration (fixing github #2130), and the code change is easy enough, but I fail to install the python environment. I have tried the simple ones\n`OpenLineage/integration/airflow > pip install -e .`\n or\n`OpenLineage/integration/airflow > pip install -r dev-requirements.txt`\nbut they both fail on\n`ERROR: No matching distribution found for openlineage-sql==1.3.0`\n\n(which I think is an unreleased version in the git project)\n\nHow would I go about to install the requirements?\n\n//Erik\n\nPS. Sorry for posting this in general if there is a specific integration or contribution channel - I didnt find a better channel", - "user": "U05TZE47F2S", - "ts": "1695883240.832669", - "blocks": [ - { - "type": "rich_text", - "block_id": "jFDc1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello community\nFirst time poster - bear with me :)\n\nI am looking to make a minor PR on the airflow integration (fixing github #2130), and the code change is easy enough, but I fail to install the python environment. I have tried the simple ones\n" - }, - { - "type": "text", - "text": "OpenLineage/integration/airflow > pip install -e .", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n or\n" - }, - { - "type": "text", - "text": "OpenLineage/integration/airflow > pip install -r dev-requirements.txt", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nbut they both fail on\n" - }, - { - "type": "text", - "text": "ERROR: No matching distribution found for openlineage-sql==1.3.0", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n\n(which I think is an unreleased version in the git project)\n\nHow would I go about to install the requirements?\n\n//Erik\n\nPS. 
Sorry for posting this in general if there is a specific integration or contribution channel - I didnt find a better channel" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "reply_count": 36, - "reply_users_count": 3, - "latest_reply": "1695901421.859499", - "reply_users": [ - "U02MK6YNAQ5", - "U05TZE47F2S", - "U02S6F54MAB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "47f70521-07a4-4520-b030-ebc1a81d0d3e", - "type": "message", - "text": "Hi <@U05TZE47F2S>, the channel is totally OK. I am not airflow integration expert, but what it looks to me, you're missing openlineage-sql library, which is a rust library used to extract lineage from sql queries. This is how we do that in circle ci:\n\n\nand subproject page with build instructions: ", - "user": "U02MK6YNAQ5", - "ts": "1695884688.471259", - "blocks": [ - { - "type": "rich_text", - "block_id": "/qKFV", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U05TZE47F2S" - }, - { - "type": "text", - "text": ", the channel is totally OK. I am not airflow integration expert, but what it looks to me, you're missing openlineage-sql library, which is a rust library used to extract lineage from sql queries. This is how we do that in circle ci:\n" - }, - { - "type": "link", - "url": "https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8080/workflows/aba53369-836c-48f5-a2dd-51bc0740a31c/jobs/140113" - }, - { - "type": "text", - "text": "\n\nand subproject page with build instructions: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/sql" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "f864f8f2-d620-4863-b71f-ef587e9d955f", - "type": "message", - "text": "Ok, so I go and \"manually\" build the internal dependency so that it becomes available in the pip cache?\n\nI was hoping for something more automagical, but that should work", - "user": "U05TZE47F2S", - "ts": "1695884843.168339", - "blocks": [ - { - "type": "rich_text", - "block_id": "LDHKU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Ok, so I go and \"manually\" build the internal dependency so that it becomes available in the pip cache?\n\nI was hoping for something more automagical, but that should work" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "335b4e75-5126-48ac-8260-5ead31fa5850", - "type": "message", - "text": "I think so. <@U02S6F54MAB> am I right?", - "user": "U02MK6YNAQ5", - "ts": "1695884886.561369", - "blocks": [ - { - "type": "rich_text", - "block_id": "02eTI", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I think so. " - }, - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " am I right?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "ea8dcd7b-1269-414c-acd9-d2ed0b9870ed", - "type": "message", - "text": "\nthere’s a guide how to setup the dev environment\n\n> Typically, you first need to build `openlineage-sql` locally (see ). 
After each release you have to repeat this step in order to bump local version of the package.\nThis might be somewhat exposed more in GitHub repository README as well", - "user": "U02S6F54MAB", - "ts": "1695885507.142819", - "blocks": [ - { - "type": "rich_text", - "block_id": "0thOE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://openlineage.io/docs/development/developing/python/setup" - }, - { - "type": "text", - "text": "\nthere’s a guide how to setup the dev environment\n\n" - } - ] - }, - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "Typically, you first need to build " - }, - { - "type": "text", - "text": "openlineage-sql", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " locally (see " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/README.md", - "text": "README" - }, - { - "type": "text", - "text": "). After each release you have to repeat this step in order to bump local version of the package." - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nThis might be somewhat exposed more in GitHub repository README as well" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "346618b6-45c3-4dfc-9288-225a583d5d53", - "type": "message", - "text": "It didnt find the wheel in the cache, but if I used the line in the sql/README.md\n`pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall`\nIt is installed and thus skipped/passed when pip later checks if it needs to be installed.\n\nNow I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary\n`Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127`\nand\n`Command 'pkg-config --exists mariadb' returned non-zero exit status 127`\nI am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?", - "user": "U05TZE47F2S", - "ts": "1695886040.154289", - "blocks": [ - { - "type": "rich_text", - "block_id": "FhOUD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It didnt find the wheel in the cache, but if I used the line in the sql/README.md\n" - }, - { - "type": "text", - "text": "pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nIt is installed and thus skipped/passed when pip later checks if it needs to be installed.\n\nNow I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary\n" - }, - { - "type": "text", - "text": "Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nand\n" - }, - { - "type": "text", - "text": "Command 'pkg-config --exists mariadb' returned non-zero exit status 127", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nI am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?" 
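[Editor's sketch] The local-build loop described in the setup guide, condensed into one runnable sketch. Paths assume you start in `OpenLineage/integration/sql/iface-py`; the `--out target/wheels` output directory follows the corrected command given further down this thread:

```python
# Build the openlineage-sql wheel locally, then install it from the
# local directory so pip stops looking for an unreleased version on PyPI.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "maturin"])
subprocess.check_call(["maturin", "build", "--out", "target/wheels"])
subprocess.check_call([
    sys.executable, "-m", "pip", "install", "openlineage-sql",
    "--no-index", "--find-links", "target/wheels", "--force-reinstall",
])
```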
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "fe442113-abae-42b0-9b2b-4daefeabf518", - "type": "message", - "text": "> It didnt find the wheel in the cache, but if I used the line in the sql/README.md\n> `pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall`\n> It is installed and thus skipped/passed when pip later checks if it needs to be installed.\nThat’s actually expected. You should build new wheel locally and then install it.\n\n> Now I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary\n> `Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127`\n> and\n> `Command 'pkg-config --exists mariadb' returned non-zero exit status 127`\n> I am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?\nWe’ve left some system specific configuration, e.g. mysqlclient, to users as it’s a bit aside from OpenLineage and more of general development task.\n\nprobably\n```sudo apt-get install python3-dev default-libmysqlclient-dev build-essential ```\nshould work", - "user": "U02S6F54MAB", - "ts": "1695886312.730749", - "blocks": [ - { - "type": "rich_text", - "block_id": "M+Muy", - "elements": [ - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "It didnt find the wheel in the cache, but if I used the line in the sql/README.md\n" - }, - { - "type": "text", - "text": "pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nIt is installed and thus skipped/passed when pip later checks if it needs to be installed." - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "That’s actually expected. You should build new wheel locally and then install it.\n\n" - } - ] - }, - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "Now I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary\n" - }, - { - "type": "text", - "text": "Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nand\n" - }, - { - "type": "text", - "text": "Command 'pkg-config --exists mariadb' returned non-zero exit status 127", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nI am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "We’ve left some system specific configuration, e.g. 
mysqlclient, to users as it’s a bit aside from OpenLineage and more of general development task.\n\nprobably\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "sudo apt-get install python3-dev default-libmysqlclient-dev build-essential " - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "should work" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "2dac18f3-26e2-404e-97ba-7a7cf81b0803", - "type": "message", - "text": "I just realized that I should probably skip setting up my wsl and just run the tests in the docker setup you prepared", - "user": "U05TZE47F2S", - "ts": "1695886324.782519", - "blocks": [ - { - "type": "rich_text", - "block_id": "x/lLc", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I just realized that I should probably skip setting up my wsl and just run the tests in the docker setup you prepared" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "08cfc36c-c3da-4f09-9d87-6da0ea9adc64", - "type": "message", - "text": "You could do that as well but if you want to test your changes vs many Airflow versions that wouldn’t be possible I think (run them with tox btw)", - "user": "U02S6F54MAB", - "ts": "1695886546.366809", - "blocks": [ - { - "type": "rich_text", - "block_id": "hmzbn", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "You could do that as well but if you want to test your changes vs many Airflow versions that wouldn’t be possible I think (run them with tox btw)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "e1c50bc4-a763-4158-96ef-72e326321ef9", - "type": "message", - "text": "This is starting to feel like a rabbit hole :disappointed:\n\nWhen I run tox, I get a lot of build errors\n• client needs to be built\n• sql needs to be built to a different target than its readme says\n• a lot of builds fail on cython_sources", - "user": "U05TZE47F2S", - "ts": "1695891279.619909", - "blocks": [ - { - "type": "rich_text", - "block_id": "CF/Le", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "This is starting to feel like a rabbit hole " - }, - { - "type": "emoji", - "name": "disappointed", - "unicode": "1f61e" - }, - { - "type": "text", - "text": "\n\nWhen I run tox, I get a lot of build errors\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "client needs to be built" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "sql needs to be built to a different target than its readme says" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "a lot of builds fail on cython_sources" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05TZE47F2S", - "ts": "1695891407.000000" - }, - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "8c61b68c-8503-43d7-88d9-ff71102bed16", - "type": "message", - "text": "would you 
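[Editor's sketch] A quick way to confirm the suggested system packages fixed the `pkg-config` failures: `mysqlclient` should now compile and import. Note that `MySQLdb` is the module name installed by the `mysqlclient` distribution:

```python
# After `sudo apt-get install python3-dev default-libmysqlclient-dev
# build-essential` and `pip install mysqlclient`, verify the C extension loads.
import MySQLdb

print(MySQLdb.version_info)  # e.g. (2, 2, 0, 'final', 0)
```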
like to share some exact log lines? I’ve never seen such errors, they probably are system specific", - "user": "U02S6F54MAB", - "ts": "1695892774.394699", - "blocks": [ - { - "type": "rich_text", - "block_id": "C6YGc", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "would you like to share some exact log lines? I’ve never seen such errors, they probably are system specific" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "36f3f924-102b-4915-91dc-091cd44002d7", - "type": "message", - "text": "`Getting requirements to build wheel did not run successfully.`\n`│ exit code: 1`\n`╰─> [62 lines of output]`\n `/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg``\n `!!`\n \n `********************************************************************************`\n `The license_file parameter is deprecated, use license_files instead.`\n \n `By 2023-Oct-30, you need to update your project and remove deprecated calls`\n `or your builds will no longer be supported.`\n \n `See for details.`\n `********************************************************************************`\n \n `!!`\n `parsed = self.parsers.get(option_name, lambda x: x)(value)`\n `running egg_info`\n `writing lib3/PyYAML.egg-info/PKG-INFO`\n `writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt`\n `writing top-level names to lib3/PyYAML.egg-info/top_level.txt`\n `Traceback (most recent call last):`\n `File \"/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py\", line 353, in <module>`\n `main()`\n `File \"/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py\", line 335, in main`\n `json_out['return_val'] = hook(**hook_input['kwargs'])`\n `File \"/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py\", line 118, in get_requires_for_build_wheel`\n `return hook(config_settings)`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py\", line 355, in get_requires_for_build_wheel`\n `return self._get_build_requires(config_settings, requirements=['wheel'])`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py\", line 325, in _get_build_requires`\n `self.run_setup()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py\", line 341, in run_setup`\n `exec(code, locals())`\n `File \"<string>\", line 271, in <module>`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/__init__.py\", line 103, in setup`\n `return distutils.core.setup(**attrs)`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py\", line 185, in setup`\n `return run_commands(dist)`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py\", line 201, in run_commands`\n `dist.run_commands()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py\", line 969, in run_commands`\n 
`self.run_command(cmd)`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/dist.py\", line 989, in run_command`\n `super().run_command(command)`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py\", line 988, in run_command`\n `cmd_obj.run()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 318, in run`\n `self.find_sources()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 326, in find_sources`\n `mm.run()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 548, in run`\n `self.add_defaults()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 586, in add_defaults`\n `sdist.add_defaults(self)`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/sdist.py\", line 113, in add_defaults`\n `super().add_defaults()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py\", line 251, in add_defaults`\n `self._add_defaults_ext()`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py\", line 336, in _add_defaults_ext`\n `self.filelist.extend(build_ext.get_source_files())`\n `File \"<string>\", line 201, in get_source_files`\n `File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py\", line 107, in __getattr__`\n `raise AttributeError(attr)`\n `AttributeError: cython_sources`\n `[end of output]`\n\n`note: This error originates from a subprocess, and is likely not a problem with pip.`\n`py3-airflow-2.1.4: exit 1 (7.85 seconds) /home/obr_erikal/projects/OpenLineage/integration/airflow> python -m pip install --find-links target/wheels/ --find-links ../sql/iface-py/target/wheels --use-deprecated=legacy-resolver --constraint= apache-airflow==2.1.4 'mypy>=0.9.6' pytest pytest-mock -r dev-requirements.txt pid=368621`\n`py3-airflow-2.1.4: FAIL ✖ in 7.92 seconds`", - "user": "U05TZE47F2S", - "ts": "1695897948.265179", - "blocks": [ - { - "type": "rich_text", - "block_id": "V/odg", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Getting requirements to build wheel did not run successfully.", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "│ exit code: 1", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "╰─> [62 lines of output]", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " /tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " !!", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " ", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " ********************************************************************************", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - 
{ - "type": "text", - "text": " The license_file parameter is deprecated, use license_files instead.", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " ", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " By 2023-Oct-30, you need to update your project and remove deprecated calls", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " or your builds will no longer be supported.", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " ", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " See ", - "style": { - "code": true - } - }, - { - "type": "link", - "url": "https://setuptools.pypa.io/en/latest/userguide/declarative_config.html", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " for details.", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " ********************************************************************************", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " ", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " !!", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " parsed = self.parsers.get(option_name, lambda x: x)(value)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " running egg_info", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " writing lib3/PyYAML.egg-info/PKG-INFO", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " writing top-level names to lib3/PyYAML.egg-info/top_level.txt", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " Traceback (most recent call last):", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py\", line 353, in ", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " main()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py\", line 335, in main", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " json_out['return_val'] = hook(**hook_input['kwargs'])", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File 
\"/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py\", line 118, in get_requires_for_build_wheel", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " return hook(config_settings)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py\", line 355, in get_requires_for_build_wheel", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " return self._get_build_requires(config_settings, requirements=['wheel'])", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py\", line 325, in _get_build_requires", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " self.run_setup()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py\", line 341, in run_setup", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " exec(code, locals())", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"\", line 271, in ", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/__init__.py\", line 103, in setup", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " return distutils.core.setup(**attrs)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py\", line 185, in setup", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " return run_commands(dist)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py\", line 201, in run_commands", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " dist.run_commands()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py\", line 969, in run_commands", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " self.run_command(cmd)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/dist.py\", line 989, in run_command", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": 
"text", - "text": " super().run_command(command)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py\", line 988, in run_command", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " cmd_obj.run()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 318, in run", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " self.find_sources()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 326, in find_sources", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " mm.run()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 548, in run", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " self.add_defaults()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py\", line 586, in add_defaults", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " sdist.add_defaults(self)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/sdist.py\", line 113, in add_defaults", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " super().add_defaults()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py\", line 251, in add_defaults", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " self._add_defaults_ext()", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py\", line 336, in _add_defaults_ext", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " self.filelist.extend(build_ext.get_source_files())", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"\", line 201, in get_source_files", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " File \"/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py\", line 107, in __getattr__", - "style": { - "code": 
true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " raise AttributeError(attr)", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " AttributeError: cython_sources", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": " [end of output]", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n\n" - }, - { - "type": "text", - "text": "note: This error originates from a subprocess, and is likely not a problem with pip.", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "py3-airflow-2.1.4: exit 1 (7.85 seconds) /home/obr_erikal/projects/OpenLineage/integration/airflow> python -m pip install --find-links target/wheels/ --find-links ../sql/iface-py/target/wheels --use-deprecated=legacy-resolver --constraint=", - "style": { - "code": true - } - }, - { - "type": "link", - "url": "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.8.txt", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " apache-airflow==2.1.4 'mypy>=0.9.6' pytest pytest-mock -r dev-requirements.txt pid=368621", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "py3-airflow-2.1.4: FAIL ✖ in 7.92 seconds", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "c8a1bf48-bf7d-43d9-b0a0-13992b43c55b", - "type": "message", - "text": "Then, for the actual error in my PR: Evidently you are not using isort, so what linter/fixer should I use for imports?", - "user": "U05TZE47F2S", - "ts": "1695898434.713869", - "blocks": [ - { - "type": "rich_text", - "block_id": "0aovN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Then, for the actual error in my PR: Evidently you are not using isort, so what linter/fixer should I use for imports?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "fdedfa26-6342-4bd3-9f99-f4c9d76cf8ab", - "type": "message", - "text": "for the error - I think there’s a mistake in the docs. Could you please run `maturin build --out target/wheels` as a temp solution?", - "user": "U02S6F54MAB", - "ts": "1695898695.351679", - "blocks": [ - { - "type": "rich_text", - "block_id": "Lh5PY", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "for the error - I think there’s a mistake in the docs. Could you please run " - }, - { - "type": "text", - "text": "maturin build --out target/wheels", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " as a temp solution?" 
U02S6F54MAB (ts 1695898737):
We're using `ruff`; tox runs it as one of its commands.

U05TZE47F2S (ts 1695898837):
Not in the airflow folder?
```
OpenLineage/integration/airflow$ maturin build --out target/wheels
💥 maturin failed
  Caused by: pyproject.toml at /home/obr_erikal/projects/OpenLineage/integration/airflow/pyproject.toml is invalid
  Caused by: TOML parse error at line 1, column 1
    |
  1 | [tool.ruff]
    | ^
  missing field `build-system`
```

U02S6F54MAB (ts 1695898952):
I meant a change here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/README.md

so
```
cd iface-py
python -m pip install maturin
maturin build --out ../target/wheels
```
becomes
```
cd iface-py
python -m pip install maturin
maturin build --out target/wheels
```
tox runs
```
install_command = python -m pip install {opts} --find-links target/wheels/ \
	--find-links ../sql/iface-py/target/wheels
```
but it should actually be
```
install_command = python -m pip install {opts} --find-links target/wheels/ \
	--find-links ../sql/target/wheels
```
and I'm posting a PR to fix that.
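Putting the two corrections together gives roughly the sequence below (a consolidated sketch assuming the layout the thread describes, with `iface-py` under `integration/sql`). The earlier maturin failure is also consistent with this: maturin parses the `pyproject.toml` of the directory it is invoked in, and `integration/airflow/pyproject.toml` holds only a `[tool.ruff]` table, hence the `missing field build-system` parse error.

```sh
# build the SQL interface wheel from the directory that owns maturin's pyproject.toml
cd integration/sql/iface-py        # assumed absolute location; the README snippet says `cd iface-py`
python -m pip install maturin
maturin build --out target/wheels  # corrected: not ../target/wheels

# tox, run from integration/airflow, should then point its install_command at
#   --find-links ../sql/target/wheels   (corrected from ../sql/iface-py/target/wheels)
```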
"1695898952.354359", - "blocks": [ - { - "type": "rich_text", - "block_id": "uXCf0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I meant change here " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/README.md" - }, - { - "type": "text", - "text": "\n\nso\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "cd iface-py\npython -m pip install maturin\nmaturin build --out ../target/wheels" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "becomes\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "cd iface-py\npython -m pip install maturin\nmaturin build --out target/wheels" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\ntox runs\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "install_command = python -m pip install {opts} --find-links target/wheels/ \\\n\t--find-links ../sql/iface-py/target/wheels" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "but it should be\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "install_command = python -m pip install {opts} --find-links target/wheels/ \\\n\t--find-links ../sql/target/wheels" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "actually and I’m posting PR to fix that" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "46514f2a-7d79-4ac6-a150-067d469356cb", - "type": "message", - "text": "yes, that part I actually worked out myself, but the cython_sources error I fail to understand cause. I have python3-dev installed on WSL Ubuntu with python version 3.10.12 in a virtualenv. Anything in that that could cause issues?", - "user": "U05TZE47F2S", - "ts": "1695899112.541119", - "blocks": [ - { - "type": "rich_text", - "block_id": "QFkIE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, that part I actually worked out myself, but the cython_sources error I fail to understand cause. I have python3-dev installed on WSL Ubuntu with python version 3.10.12 in a virtualenv. Anything in that that could cause issues?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05TZE47F2S", - "ts": "1695899144.000000" - }, - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "6ed420b6-90de-429d-a937-0718804b69b7", - "type": "message", - "text": "looks like it has something to do with latest release of Cython?\n`pip install \"Cython<3\"` maybe solves the issue?", - "user": "U02S6F54MAB", - "ts": "1695899540.894739", - "blocks": [ - { - "type": "rich_text", - "block_id": "NoPO+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "looks like it has something to do with latest release of Cython?\n" - }, - { - "type": "text", - "text": "pip install \"Cython<3\"", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " maybe solves the issue?" 
U05TZE47F2S (ts 1695899706):
I didn't have any Cython before the install. Also no change. Could it be some update to setuptools itself? It seems like the deprecation notice and the error are coming from inside setuptools.

U05TZE47F2S (ts 1695899819):
(I.e., I tried the `pip install "Cython<3"` command without any change in the output.)

U05TZE47F2S (ts 1695900030):
Applying ruff lint on the converter.py file fixed the issue on the PR, though, so unless you have any feedback on the change itself, I will set things up on my own computer later instead (right now I'm doing changes on behalf of a client, on the client's computer). If the issue persists on my own computer, I'll dig a bit further.
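On the earlier imports question: ruff ships an isort-compatible rule group, so "applying ruff lint" to converter.py amounts to something like the sketch below (rule selection assumed here; the repository's actual configuration lives in its own `pyproject.toml` and is what tox invokes):

```sh
python -m pip install ruff
ruff check --select I --fix .   # auto-fix import sorting only (ruff's isort-compatible "I" rules)
ruff check --fix .              # or apply the repository's full configured rule set
```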
"message", - "text": "Yeah, I am thinking that if I run into the same problem \"at home\", I might find it worthwhile to understand the issue. Right now, the client only wants the fix.", - "user": "U05TZE47F2S", - "ts": "1695900161.809479", - "blocks": [ - { - "type": "rich_text", - "block_id": "gSQO4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Yeah, I am thinking that if I run into the same problem \"at home\", I might find it worthwhile to understand the issue. Right now, the client only wants the fix." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S", - "reactions": [ - { - "name": "+1", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "fbc7a559-9f3c-416f-b059-92e372149d40", - "type": "message", - "text": "Is there an official release cycle?\n\nor more specific, given that the PRs are approved, how soon can they reach openlineage-dbt and apache-airflow-providers-openlineage ?", - "user": "U05TZE47F2S", - "ts": "1695900310.007789", - "blocks": [ - { - "type": "rich_text", - "block_id": "2bs2Z", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Is there an official release cycle?\n\nor more specific, given that the PRs are approved, how soon can they reach openlineage-dbt and apache-airflow-providers-openlineage ?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S" - }, - { - "client_msg_id": "4b04171d-6c1e-4e68-841c-c9b081a882e2", - "type": "message", - "text": "we need to differentiate some things:\n1. OpenLineage repository:\n a. dbt integration - this is the only place where it is maintained\n b. Airflow integration - here we only keep backwards compatibility but generally speaking starting from Airflow 2.7+ we would like to do all the job in Airflow repo as OL Airflow provider\n2. Airflow repository - there’s only Airflow Openlineage provider compatible (and works best) with Airflow 2.7+\n\nwe have control over releases (obviously) in OL repo - it’s monthly cycle so beginning next week that should happen. 
U02S6F54MAB (ts 1695900690):
It's a bit complex with this split, but needed temporarily.

U05TZE47F2S (ts 1695900707):
Oh, I did the fix in the wrong place! The client is on Airflow 2.7 and is using the provider. Is it syncing?
U02S6F54MAB (ts 1695900748, edited):
It's not - two separate places. ~And we haven't even added the whole thing with converting old lineage objects to OL-specific ones~ - editing, that's not true.

U02S6F54MAB (ts 1695900880):
The code's here:
https://github.com/apache/airflow/blob/main/airflow/providers/openlineage/extractors/manager.py#L154
(GitHub preview: `def extract_inlets_and_outlets(`)

U02S6F54MAB (ts 1695900917, edited):
Sorry I did not mention this earlier. We definitely need to add some guidance on how to proceed with contributions to OL and the Airflow OL provider.

U05TZE47F2S (ts 1695900970):
Anyway, the dbt fix is the blocking issue, so if that part comes next week, there is no real urgency in getting the columns. It is a nice-to-have for our ingest of parquet files.
U02S6F54MAB (ts 1695901032):
May I ask if you use some custom operator / Python operator there?

U05TZE47F2S (ts 1695901053):
Yeah, TaskFlow with inlets/outlets.

U05TZE47F2S (ts 1695901118):
So we extract from sources and use pyarrow to create parquet files in storage that an MSSQL server can use as external tables.

U02S6F54MAB (ts 1695901194):
Awesome 👍 We have plans to integrate more with the Python operator as well, but not earlier than in Airflow 2.8.

U05TZE47F2S (ts 1695901421) [➕ U01HNKK4XAM]:
I guess writing a generic extractor for the Python operator is quite hard, but if you could support some inlet/outlet type for tabular file formats / their Python libraries like pyarrow or maybe even pandas, and document it, I think a lot of people would understand how to use them.
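The TaskFlow-plus-inlets/outlets approach mentioned above looks roughly like this (a minimal sketch with hypothetical table and file names; `Table` and `File` come from `airflow.lineage.entities`, which the provider's `extract_inlets_and_outlets` linked earlier converts into OpenLineage datasets):

```python
import pendulum
from airflow.decorators import dag, task
from airflow.lineage.entities import File, Table


@dag(schedule=None, start_date=pendulum.datetime(2023, 9, 1, tz="UTC"), catchup=False)
def ingest_parquet():
    @task(
        # hypothetical source table and target file URL - replace with real identifiers
        inlets=[Table(cluster="prod", database="sourcedb", name="raw.orders")],
        outlets=[File(url="abfss://lake@account.dfs.core.windows.net/orders/orders.parquet")],
    )
    def extract_and_write():
        # e.g. read from the source and write parquet with pyarrow, as in the
        # external-table scenario described above
        ...

    extract_and_write()


ingest_parquet()
```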
"blocks": [ - { - "type": "rich_text", - "block_id": "0a6uJ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I guess writing a generic extractor for the python operator is quite hard, but if you could support some inlet/outlet type for tabular fileformat / their python libraries like pyarrow or maybe even pandas and document it, I think a lot of people would understand how to use them" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695883240.832669", - "parent_user_id": "U05TZE47F2S", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "cdfcae6a-f1f7-4075-826b-4f526d482042", - "type": "message", - "text": "Hi folks, am I correct in my observations that the Spark integration does not generate inputs and outputs for Kafka-to-Kafka pipelines?\n\n*EDIT:* Removed the crazy wall of text. Relevant GitHub issue is .", - "user": "U05FLJE4GDU", - "ts": "1695831785.042079", - "blocks": [ - { - "type": "rich_text", - "block_id": "3DLCm", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi folks, am I correct in my observations that the Spark integration does not generate inputs and outputs for Kafka-to-Kafka pipelines?\n\n" - }, - { - "type": "text", - "text": "EDIT: ", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "Removed the crazy wall of text. Relevant GitHub issue is " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2137", - "text": "here" - }, - { - "type": "text", - "text": "." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05FLJE4GDU", - "ts": "1695833102.000000" - }, - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695832992, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2137", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2137 [SPARK] The integration fails to produce inputs and outputs in a Kafka-to-Kafka scenario", - "text": "*Problem Statement*\n\nAs the title suggests, when a Spark job is running in a streaming configuration that performs a Kafka-to-Kafka data flow, the integration fails to emit any events.\n\nHere's an example of an event that the connector emits:\n\nThe connector emits the following output:\n\n```\n23/09/27 18:17:22 DEBUG OpenLineageRunEventBuilder: Visiting query plan Optional[== Parsed Logical Plan ==\nWriteToMicroBatchDataSource org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaTable@2e679177, c3df858a-d481-456f-9665-3ff0c2d6d19a, [kafka.bootstrap.servers=localhost:9092,localhost:9093,localhost:9094, topic=target, checkpointLocation=/Users/dhawes/Projects/spark-streaming-openlineage], Append, 0\n+- StreamingDataSourceV2Relation [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13], org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaScan@2095a606, KafkaV2[Subscribe[source]], {\"source\":{\"0\":11}}, {\"source\":{\"0\":11}}\n\n== Analyzed Logical Plan ==\nWriteToMicroBatchDataSource org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaTable@2e679177, c3df858a-d481-456f-9665-3ff0c2d6d19a, [kafka.bootstrap.servers=localhost:9092,localhost:9093,localhost:9094, topic=target, checkpointLocation=/Users/dhawes/Projects/spark-streaming-openlineage], Append, 0\n+- 
StreamingDataSourceV2Relation [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13], org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaScan@2095a606, KafkaV2[Subscribe[source]], {\"source\":{\"0\":11}}, {\"source\":{\"0\":11}}\n\n== Optimized Logical Plan ==\nWriteToDataSourceV2 org.apache.spark.sql.execution.streaming.sources.MicroBatchWrite@6b69d31b\n+- StreamingDataSourceV2Relation [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13], org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaScan@2095a606, KafkaV2[Subscribe[source]], {\"source\":{\"0\":11}}, {\"source\":{\"0\":11}}\n\n== Physical Plan ==\nWriteToDataSourceV2 org.apache.spark.sql.execution.streaming.sources.MicroBatchWrite@6b69d31b, org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy$$Lambda$1945/0x000000c001c246c8@28b23a0a\n+- *(1) Project [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13]\n +- MicroBatchScan[key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13] class org.apache.spark.sql.kafka010.KafkaSourceProvider$KafkaScan\n] with output dataset builders [, , , , , , , ]\n\n23/09/27 18:17:22 INFO ConsoleTransport: {\"eventTime\":\"2023-09-27T16:17:22.263Z\",\"producer\":\"\",\"schemaURL\":\"\",\"eventType\":\"COMPLETE\",\"run\":{\"runId\":\"3468112f-96c3-44b5-bd84-847c2d99db8f\",\"facets\":{\"spark_version\":{\"_producer\":\"\",\"_schemaURL\":\"\",\"spark-version\":\"3.3.0\",\"openlineage-spark-version\":\"1.2.2\"},\"processing_engine\":{\"_producer\":\"\",\"_schemaURL\":\"\",\"version\":\"3.3.0\",\"name\":\"spark\",\"openlineageAdapterVersion\":\"1.2.2\"},\"environment-properties\":{\"_producer\":\"\",\"_schemaURL\":\"\",\"environment-properties\":{}}}},\"job\":{\"namespace\":\"default\",\"name\":\"spark_streaming_example.write_to_data_source_v2\",\"facets\":{}},\"inputs\":[],\"outputs\":[]}\n```\n\nAs you can see, there are _zero_ inputs and outputs.\n\n*Stuff to reproduce*\n*Code*\n\n```\npackage streaming;\n\nimport org.apache.spark.sql.Dataset;\nimport org.apache.spark.sql.Row;\nimport org.apache.spark.sql.SparkSession;\nimport org.apache.spark.sql.streaming.StreamingQuery;\nimport org.apache.spark.sql.streaming.Trigger;\n\nimport java.util.concurrent.TimeUnit;\n\npublic class SparkStreamingExampleApplication {\n public static void main(String[] args) throws Exception {\n SparkSession spark = SparkSession\n .builder()\n .appName(\"spark-streaming-example\")\n .master(\"local\")\n .config(\"spark.ui.enabled\", false)\n .config(\"spark.jars.packages\", \"io.openlineage:openlineage-spark:1.2.2\")\n .config(\"spark.extraListeners\", \"io.openlineage.spark.agent.OpenLineageSparkListener\")\n .config(\"spark.openlineage.transport.type\", \"console\")\n .config(\"spark.openlineage.facets.disabled\", \"[spark_unknown;spark.logicalPlan]\")\n .getOrCreate();\n\n Dataset df = spark.readStream()\n .format(\"kafka\")\n .option(\"kafka.bootstrap.servers\", \"localhost:9092,localhost:9093,localhost:9094\")\n .option(\"subscribe\", \"source\")\n .load();\n\n StreamingQuery kafkaWriteQuery = df.writeStream()\n .outputMode(\"append\")\n .format(\"kafka\")\n .option(\"kafka.bootstrap.servers\", \"localhost:9092,localhost:9093,localhost:9094\")\n .option(\"topic\", \"target\")\n .option(\"checkpointLocation\", \"/Users/dhawes/Projects/spark-streaming-openlineage\")\n .trigger(Trigger.ProcessingTime(10, TimeUnit.SECONDS))\n .start();\n\n kafkaWriteQuery.awaitTermination();\n 
}\n}\n```\n\n*build.gradle.kts*\n\n```\nplugins {\n java\n application\n}\n\njava {\n sourceCompatibility = JavaVersion.VERSION_1_8\n targetCompatibility = JavaVersion.VERSION_1_8\n\n toolchain {\n languageVersion.set(JavaLanguageVersion.of(8))\n }\n}\n\nrepositories {\n mavenLocal()\n mavenCentral()\n}\n\ndependencies {\n implementation(\"org.apache.spark:spark-core_2.12:3.3.0\")\n implementation(\"org.apache.spark:spark-sql_2.12:3.3.0\")\n implementation(\"org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0\")\n implementation(\"org.apache.spark:spark-streaming_2.12:3.3.0\")\n implementation(\"org.apache.spark:spark-streaming-kafka-0-10_2.12:3.3.0\")\n implementation(\"io.openlineage:openlineage-spark:1.2.2\")\n}\n\napplication {\n mainClass = \"streaming.SparkStreamingExampleApplication\"\n}\n```\n\n*docker-compose.yaml*\n\n```\nversion: '3'\n\nservices:\n zookeeper:\n image: confluentinc/cp-zookeeper:latest\n container_name: 'spark-streaming-zookeeper'\n environment:\n ZOOKEEPER_CLIENT_PORT: 2181\n\n kafka1:\n image: 'confluentinc/cp-kafka:latest'\n container_name: 'spark-streaming-kafka-1'\n depends_on:\n - 'zookeeper'\n environment:\n KAFKA_BROKER_ID: 1\n KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181\n KAFKA_ADVERTISED_LISTENERS: \n KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT\n KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE\n KAFKA_LISTENERS: \n ports:\n - \"9092:9094\"\n\n kafka2:\n image: 'confluentinc/cp-kafka:latest'\n container_name: 'spark-streaming-kafka-2'\n depends_on:\n - 'zookeeper'\n environment:\n KAFKA_BROKER_ID: 2\n KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181\n KAFKA_ADVERTISED_LISTENERS: \n KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT\n KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE\n KAFKA_LISTENERS: \n ports:\n - \"9093:9095\"\n\n kafka3:\n image: 'confluentinc/cp-kafka:latest'\n container_name: 'spark-streaming-kafka-3'\n depends_on:\n - 'zookeeper'\n environment:\n KAFKA_BROKER_ID: 3\n KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181\n KAFKA_ADVERTISED_LISTENERS: ", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1695831785.042079", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1695883338.816999", - "reply_users": [ - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "eyes", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "605d51fd-23f6-4f14-bd7b-8ec638b7ecac", - "type": "message", - "text": "responded within the issue", - "user": "U02MK6YNAQ5", - "ts": "1695883338.816999", - "blocks": [ - { - "type": "rich_text", - "block_id": "4EBnj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "responded within the issue" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695831785.042079", - "parent_user_id": "U05FLJE4GDU" - } - ] - }, - { - "type": "message", - "text": "*Meetup recap: Toronto Meetup @ Airflow Summit, September 18, 2023*\nIt was great to see so many members of our community at this event! 
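For context on what is missing in those events: OpenLineage's dataset naming convention for Kafka uses the bootstrap server as the dataset namespace (`kafka://host:port`) and the topic as the dataset name, so a working integration would be expected to emit roughly the following for the job above (an illustrative fragment under that naming scheme, not output taken from the issue):

```json
{
  "inputs":  [{ "namespace": "kafka://localhost:9092", "name": "source" }],
  "outputs": [{ "namespace": "kafka://localhost:9092", "name": "target" }]
}
```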
U02LXF3HUN7 (ts 1695827956) [🙌 3 · ❤️ 6 · 🚀 2 · 😅 1 · ✅ 1]:
*Meetup recap: Toronto Meetup @ Airflow Summit, September 18, 2023*
It was great to see so many members of our community at this event! I counted 32 total attendees, with all but a handful being first-timers.
*Topics included:*
• Presentation on the history, architecture and roadmap of the project by @U01DCLP0GU9 and @U01HNKK4XAM
• Discussion of OpenLineage support in Marquez (https://marquezproject.ai/) by @U01DCMDFHBK
• Presentation by Ye Liu and Ivan Perepelitca from Metaphor (https://metaphor.io/), the social platform for data, about their integration
• Presentation by @U02MK6YNAQ5 about the Spark integration
• Presentation by @U01RA9B5GG2 about the Apache Airflow Provider
Thanks to all the presenters and attendees, with a shout-out to @U01HNKK4XAM for the help with organizing and day-of logistics, @U02S6F54MAB for the help with setup/cleanup, and @U0323HG8C8H for the crucial assist with the signup sheet.
This was our first meetup in Toronto, and we learned some valuable lessons about planning events in new cities - the first and foremost being to ask for a pic of the building! 🙂 But it seemed like folks were undeterred, and the space itself lived up to expectations.
For a recording and clips from the meetup, head over to our YouTube channel: https://www.youtube.com/channel/UCRMLy4AaSw_ka-gNV9nl7VQ/
*Upcoming events:*
• October 5th in San Francisco: Marquez Meetup @ Astronomer (sign up here: https://www.meetup.com/meetup-group-bnfqymxe/events/295444209/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link)
• November: Warsaw meetup (details, date TBA)
• January: London meetup (details, date TBA)
Are you interested in hosting or co-hosting an OpenLineage or Marquez meetup? DM me!
(Five photos attached: IMG_5456.jpg, IMG_5452.jpg, 20230918_171849.jpg, 20230918_192522.jpg, 20230918_185612.jpg.)

U02LXF3HUN7 (ts 1695830147, in thread):
A few more pics:
(Three photos attached: IMG_5462.jpg, 20230918_172756.jpg, 20230918_172935.jpg.)
"text": "A few more pics:" - } - ] - } - ] - } - ], - "client_msg_id": "53cb1e19-536e-417d-8356-97b45bc1b35d", - "thread_ts": "1695827956.140429", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "client_msg_id": "b0d8792e-684f-4d87-810a-a9f024db51f8", - "type": "message", - "text": " In Airflow Integration we send across a lineage Event for Dag start and complete, but that is not the case with spark integration…we don’t receive any event for the application start and complete in spark…is this expected behaviour or am i missing something?", - "user": "U05QL7LN2GH", - "ts": "1695703890.171789", - "blocks": [ - { - "type": "rich_text", - "block_id": "VWppi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " In Airflow Integration we send across a lineage Event for Dag start and complete, but that is not the case with spark integration…we don’t receive any event for the application start and complete in spark…is this expected behaviour or am i missing something?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "reply_count": 10, - "reply_users_count": 3, - "latest_reply": "1695822848.078309", - "reply_users": [ - "U02MK6YNAQ5", - "U05QL7LN2GH", - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1695822848.078309", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U05A1D80QKF" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "8fc1de0f-13c7-45d4-a92e-9336dfe16d9e", - "type": "message", - "text": "For spark we do send `start` and `complete` for each spark action being run (single operation that causes spark processing being run). However, it is difficult for us to know if we're dealing with the last action within spark job or a spark script.", - "user": "U02MK6YNAQ5", - "ts": "1695822459.280709", - "blocks": [ - { - "type": "rich_text", - "block_id": "NnLoz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "For spark we do send " - }, - { - "type": "text", - "text": "start", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and " - }, - { - "type": "text", - "text": "complete", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " for each spark action being run (single operation that causes spark processing being run). However, it is difficult for us to know if we're dealing with the last action within spark job or a spark script." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "e4b03d27-5abf-4e74-a947-0ec863d48b6e", - "type": "message", - "text": "I think we need to look deeper into that as there is reoccuring need to capture such information", - "user": "U02MK6YNAQ5", - "ts": "1695822575.499259", - "blocks": [ - { - "type": "rich_text", - "block_id": "Uge+Y", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I think we need to look deeper into that as there is reoccuring need to capture such information" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "c7181c17-e689-4a52-8d94-6adf7af1abad", - "type": "message", - "text": "and spark listener event has methods like `onApplicationStart` and `onApplicationEnd`", - "user": "U02MK6YNAQ5", - "ts": "1695822597.247609", - "blocks": [ - { - "type": "rich_text", - "block_id": "KzOiX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and spark listener event has methods like " - }, - { - "type": "text", - "text": "onApplicationStart", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and " - }, - { - "type": "text", - "text": "onApplicationEnd", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "b862a210-4843-4a38-8765-5f8289d97b5a", - "type": "message", - "text": "We are using the SparkListener, which has a function called OnApplicationStart which gets called whenever a spark application starts, so i was thinking why cant we send one at start and simlarly at end as well", - "user": "U05QL7LN2GH", - "ts": "1695822613.085959", - "blocks": [ - { - "type": "rich_text", - "block_id": "z28yv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "We are using the SparkListener, which has a function called OnApplicationStart which gets called whenever a spark application starts, so i was thinking why cant we send one at start and simlarly at end as well" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "0564b6a2-8ee8-4943-a953-6ab6362a23c2", - "type": "message", - "text": "additionally, we would like to have a concept of a parent run for a spark job which aggregates all actions run within a single spark job context", - "user": "U02MK6YNAQ5", - "ts": "1695822633.475229", - "blocks": [ - { - "type": "rich_text", - "block_id": "B7f8D", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "additionally, we would like to have a concept of a parent run for a spark job which aggregates all actions run within a single spark job context" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "9c2cf2f2-80b8-4333-8590-d58f2e1b68b0", - "type": "message", - "text": "yeah exactly. the way that it works with airflow integration", - "user": "U05QL7LN2GH", - "ts": "1695822671.102329", - "blocks": [ - { - "type": "rich_text", - "block_id": "8Cgct", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah exactly. 
the way that it works with airflow integration" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "1a09efa6-9259-4c58-8755-0d45fb51bf91", - "type": "message", - "text": "we do have an issue for that ", - "user": "U02MK6YNAQ5", - "ts": "1695822686.662719", - "blocks": [ - { - "type": "rich_text", - "block_id": "eVJKn", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we do have an issue for that " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2105" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1694633153, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2105", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2105 [PROPOSAL] Spark jobs should have a parent Job describing the Run State of Spark pipeline.", - "text": "*Purpose:* \nCurrently there is no way to group spark jobs running within the same SparkContext since there is no concept of parent Jobs in Spark. In airflow we support Parent Runs and this way we can group Tasks together within a single DAG.\n\n*Use cases:*\n\n• For data catalog based OL backends, a user might want to search for Spark ETL pipelines first and then dive deeper into a specific pipeline to view the Spark Jobs. This is similar to how a user can search for airflow DAG assets and then look into the Tasks for that DAG.\n• Also this will help display operational metadata for Spark pipelines. For Airflow we have DAG run status (parent job status), but for Spark we don't have any such concept to display the completion state of a pipeline run. 
We can only track it at SparkJob level.\n\n*Proposed implementation* \nThis should be similar to the airflow implementation.", - "title": "#2105 [PROPOSAL] Spark jobs should have a parent Job describing the Run State of Spark pipeline.", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2105", - "footer": "", - "fields": [ - { - "value": "proposal", - "title": "Labels", - "short": true - }, - { - "value": "2", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "9400483a-7331-4225-b6fc-ec10a8512e76", - "type": "message", - "text": "what you can is: come to our monthly Openlineage open meetings and raise that issue and convince the community about its importance", - "user": "U02MK6YNAQ5", - "ts": "1695822728.304729", - "blocks": [ - { - "type": "rich_text", - "block_id": "VDW1M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "what you can is: come to our monthly Openlineage open meetings and raise that issue and convince the community about its importance" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "de5ac6ca-b957-4e76-93dd-25d7c2e0e2b2", - "type": "message", - "text": "yeah sure would love to do that…how can i join them, will that be posted here in this slack channel?", - "user": "U05QL7LN2GH", - "ts": "1695822812.568049", - "blocks": [ - { - "type": "rich_text", - "block_id": "hrUZs", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah sure would love to do that…how can i join them, will that be posted here in this slack channel?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "08fddf36-1931-427e-b16b-cd95cae49c05", - "type": "message", - "text": "Hi, you can see the schedule and RSVP here: ", - "user": "U02LXF3HUN7", - "ts": "1695822848.078309", - "blocks": [ - { - "type": "rich_text", - "block_id": "ASLvc", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi, you can see the schedule and RSVP here: " - }, - { - "type": "link", - "url": "https://openlineage.io/community" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/community", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/community", - "fallback": "Community | OpenLineage", - "text": "Data lineage is the foundation for a new generation of powerful, context-aware data tools and best practices. OpenLineage enables consistent collection of lineage metadata, creating a deeper understanding of how data is produced and used.", - "title": "Community | OpenLineage", - "title_link": "https://openlineage.io/community", - "service_name": "openlineage.io" - } - ], - "thread_ts": "1695703890.171789", - "parent_user_id": "U05QL7LN2GH", - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - }, - { - "name": "gratitude-thank-you", - "users": [ - "U05QL7LN2GH" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "8a4ccf7e-640f-44e0-9d4d-92bcbd341376", - "type": "message", - "text": "I'm using the Spark OpenLineage integration. 
In the `outputStatistics` output dataset facet we receive `rowCount` and `size`.\nThe Job performs a SQL insert into a MySQL table and I'm receiving the `size` as 0.\n```{\n \"outputStatistics\":\n {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"rowCount\": 1,\n \"size\": 0\n }\n}```\nI'm not sure what the size means here. Does this mean number of bytes inserted/updated?\nAlso, do we have any documentation for Spark specific Job and Run facets?", - "user": "U05A1D80QKF", - "ts": "1695663385.834539", - "blocks": [ - { - "type": "rich_text", - "block_id": "JoEAD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'm using the Spark OpenLineage integration. In the " - }, - { - "type": "text", - "text": "outputStatistics", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " output dataset facet we receive " - }, - { - "type": "text", - "text": "rowCount", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and " - }, - { - "type": "text", - "text": "size", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ".\nThe Job performs a SQL insert into a MySQL table and I'm receiving the " - }, - { - "type": "text", - "text": "size", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " as 0.\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "{\n \"outputStatistics\":\n {\n \"_producer\": \"" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark" - }, - { - "type": "text", - "text": "\",\n \"_schemaURL\": \"" - }, - { - "type": "link", - "url": "https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet" - }, - { - "type": "text", - "text": "\",\n \"rowCount\": 1,\n \"size\": 0\n }\n}" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'm not sure what the size means here. Does this mean number of bytes inserted/updated?\nAlso, do we have any documentation for Spark specific Job and Run facets?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05A1D80QKF", - "ts": "1695663417.000000" - }, - "thread_ts": "1695663385.834539", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1695822960.410559", - "reply_users": [ - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "15851c75-f1f1-4575-b4ba-0444b0ae475b", - "type": "message", - "text": "I am not sure it's stated in the doc. Here's the list of spark facets schemas: ", - "user": "U02MK6YNAQ5", - "ts": "1695822960.410559", - "blocks": [ - { - "type": "rich_text", - "block_id": "WlBQw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am not sure it's stated in the doc. 
Here's the list of spark facets schemas: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/shared/facets/spark/v1" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695663385.834539", - "parent_user_id": "U05A1D80QKF" - } - ] - }, - { - "client_msg_id": "05bcc77f-ca23-4be3-ae32-e73df9d4cce9", - "type": "message", - "text": " I'm presently addressing a particular scenario that pertains to Openlineage authentication, specifically involving the use of an access key and secret.\n\nI've implemented a custom token provider called AccessKeySecretKeyTokenProvider, which extends the TokenProvider class. This token provider communicates with another service, obtaining a token and an expiration time based on the provided access key, secret, and client ID.\n\nMy goal is to retain this token in a cache prior to its expiration, thereby eliminating the need for network calls to the third-party service. Is it possible without relying on an external caching system.", - "user": "U05SMTVPPL3", - "ts": "1695633110.066819", - "blocks": [ - { - "type": "rich_text", - "block_id": "Y8xCm", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " I'm presently addressing a particular scenario that pertains to Openlineage authentication, specifically involving the use of an access key and secret.\n\nI've implemented a custom token provider called AccessKeySecretKeyTokenProvider, which extends the TokenProvider class. This token provider communicates with another service, obtaining a token and an expiration time based on the provided access key, secret, and client ID.\n\nMy goal is to retain this token in a cache prior to its expiration, thereby eliminating the need for network calls to the third-party service. Is it possible without relying on an external caching system." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05SMTVPPL3", - "ts": "1695642342.000000" - }, - "thread_ts": "1695633110.066819", - "reply_count": 11, - "reply_users_count": 3, - "latest_reply": "1695716393.265469", - "reply_users": [ - "U01HNKK4XAM", - "U05SMTVPPL3", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "910899d6-33f1-49ac-bd72-82521a106ad5", - "type": "message", - "text": "Hey <@U05SMTVPPL3>, I’m not sure that I fully understand your question here. What do you mean by OpenLineage authentication?\nWhat are you using to generate OL events? What’s your OL receiving backend?", - "user": "U01HNKK4XAM", - "ts": "1695646613.043569", - "blocks": [ - { - "type": "rich_text", - "block_id": "Mf6c0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey " - }, - { - "type": "user", - "user_id": "U05SMTVPPL3" - }, - { - "type": "text", - "text": ", I’m not sure that I fully understand your question here. What do you mean by OpenLineage authentication?\nWhat are you using to generate OL events? What’s your OL receiving backend?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "d42d6001-6201-4fab-810d-5bd57f54ffd2", - "type": "message", - "text": "Hey <@U01HNKK4XAM>,\nI wanted to clarify the previous message. I apologize for any confusion. 
When I mentioned \"_OpenLineage authentication_,\" I was actually referring to the authentication process for the OpenLineage backend, specifically using HTTP transport. This involves using my custom token provider, which utilizes access keys and secrets for authentication. The OL backend is http based backend . I hope this clears things up!", - "user": "U05SMTVPPL3", - "ts": "1695647073.016419", - "blocks": [ - { - "type": "rich_text", - "block_id": "usb+A", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey " - }, - { - "type": "user", - "user_id": "U01HNKK4XAM" - }, - { - "type": "text", - "text": ",\nI wanted to clarify the previous message. I apologize for any confusion. When I mentioned \"" - }, - { - "type": "text", - "text": "OpenLineage authentication", - "style": { - "italic": true - } - }, - { - "type": "text", - "text": ",\" I was actually referring to the authentication process for the OpenLineage backend, specifically using HTTP transport. This involves using my custom token provider, which utilizes access keys and secrets for authentication. The OL backend is http based backend . I hope this clears things up!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05SMTVPPL3", - "ts": "1695647102.000000" - }, - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "c6059924-e40d-490b-800e-2d91d960af05", - "type": "message", - "text": "Are you using Marquez?", - "user": "U01HNKK4XAM", - "ts": "1695647112.882869", - "blocks": [ - { - "type": "rich_text", - "block_id": "OByZ5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Are you using Marquez?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "abffa9ed-8e4d-448b-91e7-36c2f86e8d06", - "type": "message", - "text": "We are trying to leverage our own backend here.", - "user": "U05SMTVPPL3", - "ts": "1695647155.937279", - "blocks": [ - { - "type": "rich_text", - "block_id": "sukxf", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "We are trying to leverage our own backend here." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "280b8e75-9356-4f23-bd2d-a897f6a2d41e", - "type": "message", - "text": "I see.. I’m not sure the OpenLineage community could help here. Which webserver framework are you using?", - "user": "U01HNKK4XAM", - "ts": "1695647223.932189", - "blocks": [ - { - "type": "rich_text", - "block_id": "5a6eW", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I see.. I’m not sure the OpenLineage community could help here. Which webserver framework are you using?" 
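The asker confirms a few messages below that they are on the Python client. Under that assumption, the token can be cached inside the provider itself, since the HTTP transport holds one provider instance for the lifetime of the client. A hedged sketch follows; `fetch_token_from_auth_service` is a hypothetical stand-in for the third-party call described above, and the `TokenProvider` base-class location reflects recent openlineage-python releases.

```python
# Hedged sketch: an in-process caching token provider for the Python client's
# HTTP transport. fetch_token_from_auth_service is a hypothetical stand-in.
import time
from typing import Optional, Tuple

from openlineage.client.transport.http import TokenProvider

def fetch_token_from_auth_service(access_key: str, secret: str, client_id: str) -> Tuple[str, float]:
    """Hypothetical network call returning (token, expiry as epoch seconds)."""
    raise NotImplementedError

class CachingAccessKeySecretKeyTokenProvider(TokenProvider):
    def __init__(self, config: dict):
        super().__init__(config)
        self._config = config
        self._token: Optional[str] = None
        self._expires_at: float = 0.0

    def get_bearer(self) -> Optional[str]:
        # Refresh only when the cached token is missing or within 60s of expiry;
        # otherwise reuse it and skip the network round trip entirely.
        if self._token is None or time.time() >= self._expires_at - 60:
            self._token, self._expires_at = fetch_token_from_auth_service(
                self._config["access_key"],
                self._config["secret"],
                self._config["client_id"],
            )
        return f"Bearer {self._token}"
```

Because the provider object lives as long as the transport, this needs no external caching system, which is what the asker wanted.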
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "d0948a0c-3d8a-47ba-bc3f-09e8322635fc", - "type": "message", - "text": "KTOR framework", - "user": "U05SMTVPPL3", - "ts": "1695647336.172679", - "blocks": [ - { - "type": "rich_text", - "block_id": "2+7aa", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "KTOR framework" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "bdf944a5-2fc7-4728-9cc3-23558c2acf34", - "type": "message", - "text": "Our backend authentication operates based on either a pair of keys or a single bearer token, with a limited time of expiry. Hence, wanted to cache this information inside the token provider.", - "user": "U05SMTVPPL3", - "ts": "1695647733.493489", - "blocks": [ - { - "type": "rich_text", - "block_id": "Vs7hy", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Our backend authentication operates based on either a pair of keys or a single bearer token, with a limited time of expiry. Hence, wanted to cache this information inside the token provider." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "4cb9780f-caa0-48ec-a7c6-276b37dd3966", - "type": "message", - "text": "I see, I would ask this question here ", - "user": "U01HNKK4XAM", - "ts": "1695648417.273519", - "blocks": [ - { - "type": "rich_text", - "block_id": "QebU1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I see, I would ask this question here " - }, - { - "type": "link", - "url": "https://ktor.io/support/" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "image_url": "https://ktor.io/static/preview-bcb70be2e64dd985ee1d3e7044b41726.png", - "image_width": 1280, - "image_height": 800, - "image_bytes": 313772, - "from_url": "https://ktor.io/support/", - "service_icon": "https://ktor.io/icons/icon-48x48.png?v=1c13dbdf035874784ecf822f992c1e4b", - "id": 1, - "original_url": "https://ktor.io/support/", - "fallback": "Ktor Framework: Support", - "text": "Kotlin Server and Client Framework for microservices, HTTP APIs, and RESTful services", - "title": "Support", - "title_link": "https://ktor.io/support/", - "service_name": "Ktor Framework" - } - ], - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "b0df4659-713a-428a-9a80-80d8b11fa670", - "type": "message", - "text": "Thank you", - "user": "U05SMTVPPL3", - "ts": "1695651172.772709", - "blocks": [ - { - "type": "rich_text", - "block_id": "ztrU8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thank you" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "48ff099b-571d-4942-bcf6-18283df58da7", - "type": "message", - "text": "<@U05SMTVPPL3> which openlineage client are you using: java or python?", - "user": "U02MK6YNAQ5", - "ts": "1695716000.284039", - "blocks": [ - { - "type": "rich_text", - "block_id": "+TXCP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05SMTVPPL3" - }, - { - "type": "text", - "text": " which openlineage client are 
you using: java or python?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - }, - { - "client_msg_id": "70e67eba-ff5a-4653-bde6-13cc71bff642", - "type": "message", - "text": "<@U02MK6YNAQ5> I am using python client", - "user": "U05SMTVPPL3", - "ts": "1695716393.265469", - "blocks": [ - { - "type": "rich_text", - "block_id": "BID0d", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " I am using python client" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695633110.066819", - "parent_user_id": "U05SMTVPPL3" - } - ] - }, - { - "type": "message", - "text": "I am attaching the log4j, there is no openlineagecontext", - "files": [ - { - "id": "F05TJ6PA3NG", - "created": 1695352480, - "timestamp": 1695352480, - "name": "log4j-active (3).txt", - "title": "log4j-active (3).txt", - "mimetype": "text/plain", - "filetype": "text", - "pretty_type": "Plain Text", - "user": "U05T8BJD4DU", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 229901, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05TJ6PA3NG/log4j-active__3_.txt", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05TJ6PA3NG/download/log4j-active__3_.txt", - "permalink": "https://openlineage.slack.com/files/U05T8BJD4DU/F05TJ6PA3NG/log4j-active__3_.txt", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05TJ6PA3NG-030be021d8", - "edit_link": "https://openlineage.slack.com/files/U05T8BJD4DU/F05TJ6PA3NG/log4j-active__3_.txt/edit", - "preview": "23/09/22 03:12:03 INFO DriverDaemon$: Started Log4j2\n23/09/22 03:12:06 INFO DatabricksMain$$anon$1: Configured feature flag data source LaunchDarkly\n23/09/22 03:12:06 INFO DatabricksMain$$anon$1: Load feature flag from LaunchDarkly\n23/09/22 03:12:06 WARN DatabricksMain$$anon$1: REGION environment variable is not defined. getConfForCurrentRegion will always return default value\n23/09/22 03:12:06 INFO DriverDaemon$: Current JVM Version 1.8.0_362", - "preview_highlight": "
\n", - "lines": 1327, - "lines_more": 1322, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05T8BJD4DU", - "display_as_bot": false, - "ts": "1695352570.560639", - "blocks": [ - { - "type": "rich_text", - "block_id": "+0tED", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am attaching the log4j, there is no openlineagecontext" - } - ] - } - ] - } - ], - "edited": { - "user": "U05T8BJD4DU", - "ts": "1695352579.000000" - }, - "client_msg_id": "a2763503-de63-4097-81ca-cb8eb43dace3", - "thread_ts": "1695352570.560639", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1695646750.596629", - "reply_users": [ - "U05T8BJD4DU", - "U01HNKK4XAM" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "white_check_mark", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "6684ada9-9870-4370-95a1-0d17db24a2e9", - "type": "message", - "text": "this issue is resolved, solution can be found here: ", - "user": "U05T8BJD4DU", - "ts": "1695354442.264569", - "blocks": [ - { - "type": "rich_text", - "block_id": "nHzgi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this issue is resolved, solution can be found here: " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1691592987038929" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1691592987038929", - "ts": "1691592987.038929", - "author_id": "U05KNSP01TR", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "is_thread_root_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1691592987.038929", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "oRms", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey,\nI’m running Spark application (spark version 3.4) with OL integration.\nI changed spark to use “debug” level, and I see the OL events with the below message:\n“Emitting lineage completed successfully:”\n\nWith all the above, I can’t see the event in Marquez.\n\nAttaching the OL configurations.\nWhen changing the OL-spark version to 0.6.+, I do see event created in Marquez with only “Start” status (attached below).\n\nThe OL-spark version is matching the Spark version? Is there a known issues with the Spark / OL versions ?" 
- } - ] - } - ] - } - ] - } - } - ], - "files": [ - { - "id": "F05M1RLTMKM", - "created": 1691592859, - "timestamp": 1691592859, - "user": "U05KNSP01TR", - "is_hidden_by_limit": 1 - }, - { - "id": "F05M1UZA5EW", - "created": 1691592943, - "timestamp": 1691592943, - "user": "U05KNSP01TR", - "is_hidden_by_limit": 1 - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1691592987038929", - "fallback": "[August 9th, 2023 7:56 AM] zahi.fail: Hey,\nI’m running Spark application (spark version 3.4) with OL integration.\nI changed spark to use “debug” level, and I see the OL events with the below message:\n“Emitting lineage completed successfully:”\n\nWith all the above, I can’t see the event in Marquez.\n\nAttaching the OL configurations.\nWhen changing the OL-spark version to 0.6.+, I do see event created in Marquez with only “Start” status (attached below).\n\nThe OL-spark version is matching the Spark version? Is there a known issues with the Spark / OL versions ?", - "text": "Hey,\nI’m running Spark application (spark version 3.4) with OL integration.\nI changed spark to use “debug” level, and I see the OL events with the below message:\n“Emitting lineage completed successfully:”\n\nWith all the above, I can’t see the event in Marquez.\n\nAttaching the OL configurations.\nWhen changing the OL-spark version to 0.6.+, I do see event created in Marquez with only “Start” status (attached below).\n\nThe OL-spark version is matching the Spark version? Is there a known issues with the Spark / OL versions ?", - "author_name": "Zahi Fail", - "author_link": "https://openlineage.slack.com/team/U05KNSP01TR", - "author_icon": "https://avatars.slack-edge.com/2023-08-03/5668907056247_deaa92eded7196343d84_48.png", - "author_subname": "Zahi Fail", - "mrkdwn_in": [ - "text" - ], - "footer": "Thread in Slack Conversation" - } - ], - "thread_ts": "1695352570.560639", - "parent_user_id": "U05T8BJD4DU" - }, - { - "client_msg_id": "8f1c101c-fcc5-40be-8178-e9254c71ed5f", - "type": "message", - "text": "We were all out at Airflow Summit last week, so apologies for the delayed response. Glad you were able to resolve the issue!", - "user": "U01HNKK4XAM", - "ts": "1695646750.596629", - "blocks": [ - { - "type": "rich_text", - "block_id": "i6p6k", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "We were all out at Airflow Summit last week, so apologies for the delayed response. Glad you were able to resolve the issue!" 
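Several threads above involve moving between 0.x and 1.x versions of the Spark integration (and the Databricks report just below pairs 0.x-era `spark.openlineage.url`-style keys with 1.2.2). In 1.x releases the documented configuration style moved under `spark.openlineage.transport.*`, which is a common reason an upgraded job emits nothing while a 0.18 setup worked. A hedged PySpark sketch, with URL, endpoint, and namespace as illustrative assumptions; check the docs for your exact version:

```python
# Hedged sketch of 1.x-style OpenLineage Spark configuration (PySpark).
# URL, endpoint, and namespace values are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("openlineage-demo")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")
    .config("spark.openlineage.transport.endpoint", "/api/v1/lineage")
    .config("spark.openlineage.namespace", "my_namespace")
    .getOrCreate()
)
```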
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695352570.560639", - "parent_user_id": "U05T8BJD4DU" - } - ] - }, - { - "client_msg_id": "00cbf211-1bbc-4bd3-8275-8cf7bea8013e", - "type": "message", - "text": "I installed 1.2.2 on Databricks, followed the below init script: \n\nmy cluster config looks like this:\n\nspark.openlineage.version v1\nspark.openlineage.namespace adb-5445974573286168.8#default\nspark.openlineage.endpoint v1/lineage\nspark.openlineage.url.param.code 8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==\nspark.openlineage.url \n\nBut it is not calling the API, it works fine with 0.18 version", - "user": "U05T8BJD4DU", - "ts": "1695347501.889769", - "blocks": [ - { - "type": "rich_text", - "block_id": "6uik+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I installed 1.2.2 on Databricks, followed the below init script: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/databricks/open-lineage-init-script.sh" - }, - { - "type": "text", - "text": "\n\nmy cluster config looks like this:\n\nspark.openlineage.version v1\nspark.openlineage.namespace adb-5445974573286168.8#default\nspark.openlineage.endpoint v1/lineage\nspark.openlineage.url.param.code 8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==\nspark.openlineage.url " - }, - { - "type": "link", - "url": "https://f77b-50-35-69-138.ngrok-free.app" - }, - { - "type": "text", - "text": "\n\nBut it is not calling the API, it works fine with 0.18 version" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/databricks/open-lineage-init-script.sh", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n#!/bin/bash\n#\n# Copyright 2018-2023 contributors to the OpenLineage project\n# SPDX-License-Identifier: Apache-2.0\n\nSTAGE_DIR=\"/dbfs/databricks/openlineage\"\n\necho \"BEGIN: Upload Spark Listener JARs\"\ncp -f $STAGE_DIR/openlineage-spark-*.jar /mnt/driver-daemon/jars || { echo \"Error copying Spark Listener library file\"; exit 1;}\necho \"END: Upload Spark Listener JARs\"\n\necho \"BEGIN: Modify Spark config settings\"\ncat << 'EOF' > /databricks/driver/conf/openlineage-spark-driver-defaults.conf\n[driver] {\n \"spark.extraListeners\" = \"io.openlineage.spark.agent.OpenLineageSparkListener\"\n}\nEOF\necho \"END: Modify Spark config settings\"\n\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "reactions": [ - { - "name": "white_check_mark", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "ceb38ae7-a2ab-4d48-a920-f6e6d6c51b93", - "type": "message", - "text": "I am using this accelerator that leverages OpenLineage on Databricks to publish lineage info to Purview, but it's using a rather old version of OpenLineage aka 0.18, anybody has tried it on a newer version of OpenLineage? 
I am facing some issues with the inputs and outputs for the same object is having different json\n", - "user": "U05T8BJD4DU", - "ts": "1695335777.852519", - "blocks": [ - { - "type": "rich_text", - "block_id": "6JEze", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am using this accelerator that leverages OpenLineage on Databricks to publish lineage info to Purview, but it's using a rather old version of OpenLineage aka 0.18, anybody has tried it on a newer version of OpenLineage? I am facing some issues with the inputs and outputs for the same object is having different json\n" - }, - { - "type": "link", - "url": "https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/", - "text": "https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05T8BJD4DU", - "ts": "1695335813.000000" - }, - "attachments": [ - { - "id": 1, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "microsoft/Purview-ADB-Lineage-Solution-Accelerator", - "text": "A connector to ingest Azure Databricks lineage into Microsoft Purview", - "title": "microsoft/Purview-ADB-Lineage-Solution-Accelerator", - "fields": [ - { - "value": "77", - "title": "Stars", - "short": true - }, - { - "value": "C#", - "title": "Language", - "short": true - } - ] - } - ], - "reactions": [ - { - "name": "white_check_mark", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "6fdd1e8f-8f13-4100-9ea4-1af90dd34bc8", - "type": "message", - "text": "Hi, we're using Custom Operators in airflow(2.5) and are planning to expose lineage via default extractors: \n*Question*: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?\nSince OpenLineage has now moved inside airflow and I think there is no concept of extractors in the latest version.", - "user": "U05A1D80QKF", - "ts": "1695276670.439269", - "blocks": [ - { - "type": "rich_text", - "block_id": "bqXzF", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi, we're using Custom Operators in airflow(2.5) and are planning to expose lineage via default extractors: " - }, - { - "type": "link", - "url": "https://openlineage.io/docs/integrations/airflow/default-extractors/" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Question", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": ": Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?\nSince OpenLineage has now moved inside airflow and I think there is no concept of extractors in the latest version." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695276670.439269", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1695823449.348469", - "reply_users": [ - "U05A1D80QKF", - "U02S6F54MAB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "d85590cd-9f1f-44e7-a019-192a58b617a8", - "type": "message", - "text": "Also, do we have any docs on how OL works with the latest airflow version? Few questions:\n• How is it replacing the concept of custom extractors and Manually Annotated Lineage in the latest version? 
\n• Do we have any examples of setting up the integration to emit input/output datasets for non supported Operators like PythonOperator?", - "user": "U05A1D80QKF", - "ts": "1695276900.832599", - "blocks": [ - { - "type": "rich_text", - "block_id": "up+gD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Also, do we have any docs on how OL works with the latest airflow version? Few questions:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "How is it replacing the concept of custom extractors and Manually Annotated Lineage in the latest version? " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Do we have any examples of setting up the integration to emit input/output datasets for non supported Operators like PythonOperator?" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695276670.439269", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "d47f79ef-74de-45fa-b03c-49eef46ecf40", - "type": "message", - "text": "> Question: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?\nIt will be compatible, “default extractors” is generally the same concept as we’re using in the 2.7 integration.\nOne thing that might be good to update is import paths, from `openlineage.airflow` to `airflow.providers.openlineage` but should work both ways\n\n> • Do we have any code samples/docs of setting up the integration to emit input/output datasets for non supported Operators like PythonOperator?\nOur experience with that is currently lacking - this means, it works like in bare airflow, if you annotate your PythonOperator tasks with old Airflow lineage like \n\nWe want to make this experience better - by doing few things\n• instrumenting hooks, then collecting lineage from them\n• integration with AIP-48 datasets\n• allowing to emit lineage collected inside Airflow task by other means, by providing core Airflow API for that\nAll those things require changing core Airflow in a couple of ways:\n• tracking which hooks were used during PythonOperator execution\n• just being able to emit datasets (airflow inlets/outlets) from inside of a task - they are now a static thing, so if you try that it does not work\n• providing better API for emitting that lineage, preferably based on OpenLineage itself rather than us having to convert that later.\nAs this requires core Airflow changes, it won’t be live until Airflow 2.8 at the earliest.\n\nthanks to <@U01RA9B5GG2> for this response", - "user": "U02S6F54MAB", - "ts": "1695823449.348469", - "blocks": [ - { - "type": "rich_text", - "block_id": "1nu2w", - "elements": [ - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "Question: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?" 
- } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It will be compatible, “default extractors” is generally the same concept as we’re using in the 2.7 integration.\nOne thing that might be good to update is import paths, from " - }, - { - "type": "text", - "text": "openlineage.airflow", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " to " - }, - { - "type": "text", - "text": "airflow.providers.openlineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " but should work both ways\n\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Do we have any code samples/docs of setting up the integration to emit input/output datasets for non supported Operators like PythonOperator?" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 1 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Our experience with that is currently lacking - this means, it works like in bare airflow, if you annotate your PythonOperator tasks with old Airflow lineage like " - }, - { - "type": "link", - "url": "https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/lineage.html", - "text": "in this doc." - }, - { - "type": "text", - "text": "\n\nWe want to make this experience better - by doing few things\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "instrumenting hooks, then collecting lineage from them" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "integration with AIP-48 datasets" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "allowing to emit lineage collected inside Airflow task by other means, by providing core Airflow API for that" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nAll those things require changing core Airflow in a couple of ways:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "tracking which hooks were used during PythonOperator execution" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "just being able to emit datasets (airflow inlets/outlets) from inside of a task - they are now a static thing, so if you try that it does not work" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "providing better API for emitting that lineage, preferably based on OpenLineage itself rather than us having to convert that later." 
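The answer above says default extractors remain compatible with the 2.7 provider, with only the import path changing. A hedged sketch of that pattern, with hypothetical dataset names; the try/except mirrors the "should work both ways" note:

```python
# Hedged sketch of the default-extractor pattern: the operator itself exposes
# lineage via get_openlineage_facets_on_complete. Dataset names are
# illustrative assumptions.
from airflow.models import BaseOperator
from openlineage.client.run import Dataset

try:
    # Airflow 2.7+ provider path
    from airflow.providers.openlineage.extractors import OperatorLineage
except ImportError:
    # standalone openlineage-airflow integration path
    from openlineage.airflow.extractors.base import OperatorLineage

class MyCustomOperator(BaseOperator):
    def execute(self, context):
        ...  # the operator's actual work

    def get_openlineage_facets_on_complete(self, task_instance) -> OperatorLineage:
        return OperatorLineage(
            inputs=[Dataset(namespace="postgres://host:5432", name="db.schema.source_table")],
            outputs=[Dataset(namespace="postgres://host:5432", name="db.schema.target_table")],
        )
```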
- } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nAs this requires core Airflow changes, it won’t be live until Airflow 2.8 at the earliest.\n\nthanks to " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " for this response" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695276670.439269", - "parent_user_id": "U05A1D80QKF" - } - ] - }, - { - "client_msg_id": "f6b7340b-e7c9-4126-8001-1ac903fc62bc", - "type": "message", - "text": "\nWe released OpenLineage 1.2.2!\nAdded\n• Spark: publish the `ProcessingEngineRunFacet` as part of the normal operation of the `OpenLineageSparkEventListener` `#2089` \n• Spark: capture and emit `spark.databricks.clusterUsageTags.clusterAllTags` variable from databricks environment `#2099` \nFixed\n• Common: support parsing dbt_project.yml without target-path `#2106` \n• Proxy: fix Proxy chart `#2091` \n• Python: fix serde filtering `#2044` \n• Python: use non-deprecated `apiKey` if loading it from env variables `@2029` \n• Spark: Improve RDDs on S3 integration. `#2039` \n• Flink: prevent sending `running` events after job completes `#2075` \n• Spark & Flink: Unify dataset naming from URI objects `#2083` \n• Spark: Databricks improvements `#2076` \nRemoved\n• SQL: remove sqlparser dependency from iface-java and iface-py `#2090` \nThanks to all the contributors, including new contributors <@U055N2GRT4P>, , and !\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ", - "user": "U02LXF3HUN7", - "ts": "1695244138.650089", - "blocks": [ - { - "type": "rich_text", - "block_id": "BzgIj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nWe released OpenLineage 1.2.2!\nAdded\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: publish the " - }, - { - "type": "text", - "text": "ProcessingEngineRunFacet", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " as part of the normal operation of the " - }, - { - "type": "text", - "text": "OpenLineageSparkEventListener", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "text", - "text": "#2089", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/d-m-h", - "text": "@d-m-h", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: capture and emit " - }, - { - "type": "text", - "text": "spark.databricks.clusterUsageTags.clusterAllTags", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " variable from databricks environment " - }, - { - "type": "text", - "text": "#2099", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/Anirudh181001", - "text": "@Anirudh181001", - "unsafe": true - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Fixed\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Common: support parsing dbt_project.yml without target-path " - 
}, - { - "type": "text", - "text": "#2106", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/tatiana", - "text": "@tatiana", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Proxy: fix Proxy chart " - }, - { - "type": "text", - "text": "#2091", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/harels", - "text": "@harels", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Python: fix serde filtering " - }, - { - "type": "text", - "text": "#2044", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/xli-1026", - "text": "@xli-1026", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Python: use non-deprecated " - }, - { - "type": "text", - "text": "apiKey", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " if loading it from env variables " - }, - { - "type": "text", - "text": "@2029", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/mobuchowski", - "text": "@mobuchowski", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: Improve RDDs on S3 integration. " - }, - { - "type": "text", - "text": "#2039", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Flink: prevent sending " - }, - { - "type": "text", - "text": "running", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " events after job completes " - }, - { - "type": "text", - "text": "#2075", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark & Flink: Unify dataset naming from URI objects " - }, - { - "type": "text", - "text": "#2083", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: Databricks improvements " - }, - { - "type": "text", - "text": "#2076", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Removed\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "SQL: remove sqlparser dependency from iface-java and iface-py " - }, - { - "type": "text", - "text": "#2090", - "style": { - "code": true 
- } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/JDarDagran", - "text": "@JDarDagran", - "unsafe": true - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks to all the contributors, including new contributors " - }, - { - "type": "user", - "user_id": "U055N2GRT4P" - }, - { - "type": "text", - "text": ", " - }, - { - "type": "link", - "url": "https://github.com/xli-1026", - "text": "@xli-1026", - "unsafe": true - }, - { - "type": "text", - "text": ", and " - }, - { - "type": "link", - "url": "https://github.com/d-m-h", - "text": "@d-m-h", - "unsafe": true - }, - { - "type": "text", - "text": "!\n" - }, - { - "type": "text", - "text": "Release:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/releases/tag/1.2.2" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Changelog: ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Commit history:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/compare/1.1.0...1.2.0", - "text": "https://github.com/OpenLineage/OpenLineage/compare/1.1.0...1.2.2" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Maven:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://oss.sonatype.org/#nexus-search;quick~openlineage" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "PyPI:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://pypi.org/project/openlineage-python/" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695244138.650089", - "reply_count": 3, - "reply_users_count": 3, - "latest_reply": "1695471077.136259", - "reply_users": [ - "U05ST398BHT", - "U02LXF3HUN7", - "U01RA9B5GG2" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1695471077.136259", - "reactions": [ - { - "name": "fire", - "users": [ - "U01RA9B5GG2", - "U01HNKK4XAM", - "U04EZ2LPDV4" - ], - "count": 3 - }, - { - "name": "+1", - "users": [ - "U05QL7LN2GH", - "U05GY00DY6A", - "U05SMTVPPL3" - ], - "count": 3 - } - ], - "replies": [ - { - "client_msg_id": "3472acf2-f11c-4e4c-8d6b-24ce1c40d93f", - "type": "message", - "text": "Hi <@U02LXF3HUN7> Thank you! I love the job that you’ve done. If you have a few seconds, please hint at how I can push lineage gathered from Airflow and Spark jobs into DataHub for visualization? I didn’t find any solutions or official support neither at Openlineage nor at DataHub, but I still want to continue using Openlineage", - "user": "U05ST398BHT", - "ts": "1695431120.413409", - "blocks": [ - { - "type": "rich_text", - "block_id": "eZxY8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U02LXF3HUN7" - }, - { - "type": "text", - "text": " Thank you! I love the job that you’ve done. 
If you have a few seconds, please hint at how I can push lineage gathered from Airflow and Spark jobs into DataHub for visualization? I didn’t find any solutions or official support neither at Openlineage nor at DataHub, but I still want to continue using Openlineage" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695244138.650089", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "7fc32230-ea04-4080-9cd1-7a4c616bed49", - "type": "message", - "text": "Hi Yevhenii, thank you for using OpenLineage. The DataHub integration is new to us, but perhaps the experts on Spark and Airflow know more. <@U02MK6YNAQ5> <@U01RA9B5GG2> <@U02S6F54MAB>", - "user": "U02LXF3HUN7", - "ts": "1695432622.381059", - "blocks": [ - { - "type": "rich_text", - "block_id": "M5Py7", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi Yevhenii, thank you for using OpenLineage. The DataHub integration is new to us, but perhaps the experts on Spark and Airflow know more. " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U02S6F54MAB" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02LXF3HUN7", - "ts": "1695489116.000000" - }, - "thread_ts": "1695244138.650089", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "2b9073be-55e3-41dc-8200-4182e30164ab", - "type": "message", - "text": "<@U05ST398BHT> at Airflow Summit, Shirshanka Das from DataHub mentioned this as upcoming feature.", - "user": "U01RA9B5GG2", - "ts": "1695471077.136259", - "blocks": [ - { - "type": "rich_text", - "block_id": "EdLPn", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05ST398BHT" - }, - { - "type": "text", - "text": " at Airflow Summit, Shirshanka Das from DataHub mentioned this as upcoming feature." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695244138.650089", - "parent_user_id": "U02LXF3HUN7", - "reactions": [ - { - "name": "+1", - "users": [ - "U05ST398BHT" - ], - "count": 1 - }, - { - "name": "dart", - "users": [ - "U05ST398BHT" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "6bc2e8a7-85c8-45cd-808d-351adc7188de", - "type": "message", - "text": "congrats folks :partying_face: ", - "user": "U05HFGKEYVB", - "ts": "1695217014.549799", - "blocks": [ - { - "type": "rich_text", - "block_id": "D8SPh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "congrats folks " - }, - { - "type": "emoji", - "name": "partying_face", - "unicode": "1f973" - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://lfaidata.foundation/blog/2023/09/20/lf-ai-data-foundation-announces-graduation-of-openlineage-project" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "tada", - "users": [ - "U02S6F54MAB", - "U01HVNU6A4C", - "U053LLVTHRN", - "U05QL7LN2GH", - "U01SW738WCF", - "U01RA9B5GG2", - "U05JBHLPY8K", - "U04AZ7992SU", - "U01HNKK4XAM", - "U02K353H2KF", - "U01DPTNCGU8" - ], - "count": 11 - }, - { - "name": "+1", - "users": [ - "U05JBHLPY8K" - ], - "count": 1 - }, - { - "name": "heart", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "18d20727-c8c6-4490-aea2-b1db03cd54c8", - "type": "message", - "text": "Hi I need help in extracting OpenLineage for PostgresOperator in json format.\nany suggestions or comments would be greatly appreciated", - "user": "U05SQGH8DV4", - "ts": "1695106067.665469", - "blocks": [ - { - "type": "rich_text", - "block_id": "+DToP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi I need help in extracting OpenLineage for PostgresOperator in json format.\nany suggestions or comments would be greatly appreciated" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695106067.665469", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1695205616.828789", - "reply_users": [ - "U01RA9B5GG2", - "U05SQGH8DV4" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "cbb8f343-a8b8-4734-85d6-a53beb93c9f7", - "type": "message", - "text": "If you're using Airflow 2.7, take a look at ", - "user": "U01RA9B5GG2", - "ts": "1695156006.370219", - "blocks": [ - { - "type": "rich_text", - "block_id": "7mV17", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "If you're using Airflow 2.7, take a look at " - }, - { - "type": "link", - "url": "https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html", - "text": "https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695106067.665469", - "parent_user_id": "U05SQGH8DV4", - "reactions": [ - { - "name": "heart", - "users": [ - "U05SQGH8DV4" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "accd82a6-a042-4f83-af6a-05efe3b0b460", - "type": "message", - "text": "If you use one of the lower versions, take a look here ", - "user": "U01RA9B5GG2", - "ts": "1695156054.864799", - "blocks": [ - { - "type": "rich_text", - "block_id": "HT53+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "If you use one of the lower versions, take a 
look here " - }, - { - "type": "link", - "url": "https://openlineage.io/docs/integrations/airflow/usage", - "text": "https://openlineage.io/docs/integrations/airflow/usage" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/docs/integrations/airflow/usage", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/docs/integrations/airflow/usage", - "fallback": "Using the Airflow integration | OpenLineage", - "text": "PREREQUISITES", - "title": "Using the Airflow integration | OpenLineage", - "title_link": "https://openlineage.io/docs/integrations/airflow/usage", - "service_name": "openlineage.io" - } - ], - "thread_ts": "1695106067.665469", - "parent_user_id": "U05SQGH8DV4" - }, - { - "client_msg_id": "24180d5a-cca8-4d65-a29e-300ff58b6059", - "type": "message", - "text": "Maciej,\nThanks for sharing the link \nthis should address the issue", - "user": "U05SQGH8DV4", - "ts": "1695205616.828789", - "blocks": [ - { - "type": "rich_text", - "block_id": "EWnYW", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Maciej,\nThanks for sharing the link " - }, - { - "type": "link", - "url": "https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html" - }, - { - "type": "text", - "text": "\nthis should address the issue" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695106067.665469", - "parent_user_id": "U05SQGH8DV4" - } - ] - }, - { - "client_msg_id": "a02784a5-f80b-4209-bc92-dd5ae76b87d9", - "type": "message", - "text": "Hi! I'm in need of help with wrapping my head around OpenLineage. My team have the goal of collecting metadata from the Airflow operators GreatExpectationsOperator, PythonOperator, MsSqlOperator and BashOperator (for dbt). Where can I see the sourcecode for what is collected for each operator, and is there support for these in the new provider *apache-airflow-providers-openlineage*? I am super confused and feel lost in the docs. :exploding_head: We are using MSSQL/ODBC to connect to our db, and this data does not seem to appear as datasets in Marquez, do I need to configure this? If so, HOW and WHERE? :smiling_face_with_tear:\n\nHappy for any help, big or small! :pray:", - "user": "U05K8F1T887", - "ts": "1695039754.591479", - "blocks": [ - { - "type": "rich_text", - "block_id": "iTsci", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi! I'm in need of help with wrapping my head around OpenLineage. My team have the goal of collecting metadata from the Airflow operators GreatExpectationsOperator, PythonOperator, MsSqlOperator and BashOperator (for dbt). Where can I see the sourcecode for what is collected for each operator, and is there support for these in the new provider" - }, - { - "type": "text", - "text": " apache-airflow-providers-openlineage", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "? I am super confused and feel lost in the docs. " - }, - { - "type": "emoji", - "name": "exploding_head", - "unicode": "1f92f" - }, - { - "type": "text", - "text": " We are using MSSQL/ODBC to connect to our db, and this data does not seem to appear as datasets in Marquez, do I need to configure this? If so, HOW and WHERE? " - }, - { - "type": "emoji", - "name": "smiling_face_with_tear", - "unicode": "1f972" - }, - { - "type": "text", - "text": "\n\nHappy for any help, big or small! 
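A minimal sketch of the setup the PostgresOperator thread above points to, assuming the `[openlineage]` configuration section documented in the Airflow 2.7+ provider user guide linked in the replies; the namespace value is illustrative. The console transport prints each OpenLineage event as JSON into the task logs, which is a simple way to inspect what a PostgresOperator run emits:

```python
# Sketch, not the only option: route OpenLineage events from the Airflow 2.7+
# provider (apache-airflow-providers-openlineage) to the console transport,
# which logs every event as JSON. Env var names follow the provider docs;
# "my_namespace" is an illustrative value.
import os

os.environ["AIRFLOW__OPENLINEAGE__NAMESPACE"] = "my_namespace"          # any label
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = '{"type": "console"}'   # JSON to task logs
```

With this in place, each task's start and completion events should appear in the task log as JSON documents, including any datasets the provider parses out of the SQL.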
" - }, - { - "type": "emoji", - "name": "pray", - "unicode": "1f64f" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695039754.591479", - "reply_count": 3, - "reply_users_count": 1, - "latest_reply": "1695068818.187449", - "reply_users": [ - "U02S6F54MAB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "985ce8ec-0abb-4dcb-a0bc-eb897bf403a2", - "type": "message", - "text": "there’s no actual single source of what integrations are currently implemented in openlineage Airflow provider. That’s something we should work on so it’s more visible", - "user": "U02S6F54MAB", - "ts": "1695068767.883209", - "blocks": [ - { - "type": "rich_text", - "block_id": "c5rkV", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "there’s no actual single source of what integrations are currently implemented in openlineage Airflow provider. That’s something we should work on so it’s more visible" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695039754.591479", - "parent_user_id": "U05K8F1T887" - }, - { - "client_msg_id": "c5d88209-70c9-4f31-b4fd-eed402f485d3", - "type": "message", - "text": "answering this quickly - GE & MS SQL are not currently implemented yet in the provider", - "user": "U02S6F54MAB", - "ts": "1695068806.046349", - "blocks": [ - { - "type": "rich_text", - "block_id": "3pyGD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "answering this quickly - GE & MS SQL are not currently implemented yet in the provider" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1695039754.591479", - "parent_user_id": "U05K8F1T887" - }, - { - "client_msg_id": "9acf490a-9219-4325-a53c-8b68b69c4200", - "type": "message", - "text": "but I also invite you to contribute if you’re interested! :slightly_smiling_face:", - "user": "U02S6F54MAB", - "ts": "1695068818.187449", - "blocks": [ - { - "type": "rich_text", - "block_id": "NkKHD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "but I also invite you to contribute if you’re interested! " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02S6F54MAB", - "ts": "1695068825.000000" - }, - "thread_ts": "1695039754.591479", - "parent_user_id": "U05K8F1T887" - } - ] - }, - { - "client_msg_id": "00f92226-f107-42e8-bb34-7adddf13bc0b", - "type": "message", - "text": "It doesn't seem like there's a way to override the OL endpoint from the default (`/api/v1/lineage`) in Airflow? I tried setting the `OPENLINEAGE_ENDPOINT` environment to no avail. Based on , it seems that only `OPENLINEAGE_URL` was used to construct `HttpConfig` ?", - "user": "U01HVNU6A4C", - "ts": "1694956061.909169", - "blocks": [ - { - "type": "rich_text", - "block_id": "1FrxC", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It doesn't seem like there's a way to override the OL endpoint from the default (" - }, - { - "type": "text", - "text": "/api/v1/lineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ") in Airflow? I tried setting the " - }, - { - "type": "text", - "text": "OPENLINEAGE_ENDPOINT", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " environment to no avail. 
Based on " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/client/python/openlineage/client/transport/factory.py#L80-L87", - "text": "this statement" - }, - { - "type": "text", - "text": ", it seems that only " - }, - { - "type": "text", - "text": "OPENLINEAGE_URL", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " was used to construct " - }, - { - "type": "text", - "text": "HttpConfig", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " ?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/client/python/openlineage/client/transport/factory.py#L80-L87", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n config = HttpConfig(\n url=os.environ[\"OPENLINEAGE_URL\"],\n auth=create_token_provider(\n {\n \"type\": \"api_key\",\n \"apiKey\": os.environ.get(\"OPENLINEAGE_API_KEY\", \"\"),\n },\n ),\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1694956061.909169", - "reply_count": 4, - "reply_users_count": 2, - "latest_reply": "1696337536.498149", - "reply_users": [ - "U02S6F54MAB", - "U01HVNU6A4C" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "2a1c7601-9128-4280-a927-65d6f9f23578", - "type": "message", - "text": "That’s correct. For now there’s no way to configure the endpoint via env var. You can do that by using config file", - "user": "U02S6F54MAB", - "ts": "1695068711.558639", - "blocks": [ - { - "type": "rich_text", - "block_id": "nsSnE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "That’s correct. For now there’s no way to configure the endpoint via env var. You can do that by using config file" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694956061.909169", - "parent_user_id": "U01HVNU6A4C" - }, - { - "client_msg_id": "a0a9275d-9cf8-4582-acda-0b8c7b7d058b", - "type": "message", - "text": "How do you do that in Airflow? Any particular reason for excluding endpoint override via env var? Happy to create a PR to fix that.", - "user": "U01HVNU6A4C", - "ts": "1695069039.150789", - "blocks": [ - { - "type": "rich_text", - "block_id": "gjGUd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "How do you do that in Airflow? Any particular reason for excluding endpoint override via env var? Happy to create a PR to fix that." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694956061.909169", - "parent_user_id": "U01HVNU6A4C" - }, - { - "client_msg_id": "8113b26a-ebec-4395-86d6-697483fa2e29", - "type": "message", - "text": "historical I guess? go for the PR, of course :rocket:", - "user": "U02S6F54MAB", - "ts": "1695070368.703459", - "blocks": [ - { - "type": "rich_text", - "block_id": "Vnr4T", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "historical I guess? 
go for the PR, of course " - }, - { - "type": "emoji", - "name": "rocket", - "unicode": "1f680" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694956061.909169", - "parent_user_id": "U01HVNU6A4C" - }, - { - "client_msg_id": "9b181669-3aad-4e22-8202-3bc76c3567cd", - "type": "message", - "text": "", - "user": "U01HVNU6A4C", - "ts": "1696337536.498149", - "blocks": [ - { - "type": "rich_text", - "block_id": "iw279", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2151" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1696337382, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/pull/2151", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2151 Allow setting client's endpoint via environment variable", - "text": "*Problem*\n\nCurrently, it's not possible to set the OpenLineage endpoint (hard-coded to `/api/v1/lineage`) using an environment variable when running the Airflow integration.\n\n*Solution*\n\nGiven that it's not possible to create the client manually in Airflow, especially now that OpenLineage has become an official Airflow provider, this change seems like the only feasible solution.\n\n☐ Your change modifies the OpenLineage model\n☐ Your change modifies one or more OpenLineage \n\n*One-line summary:*\n\nAllow setting client's endpoint via environment variable.\n\n*Checklist*\n\n☑︎ You've your work\n☑︎ Your pull request title follows our \n☑︎ Your changes are accompanied by tests (_if relevant_)\n☑︎ Your change contains a and is self-contained\n☑︎ You've updated any relevant documentation (_if relevant_)\n☑︎ Your comment includes a one-liner for the changelog about the specific purpose of the change (_if necessary_)\n☐ You've versioned the core OpenLineage model or facets according to (_if relevant_)\n☐ You've added a to source files (_if relevant_)\n\n* * *\n\nSPDX-License-Identifier: Apache-2.0 \nCopyright 2018-2023 contributors to the OpenLineage project", - "title": "#2151 Allow setting client's endpoint via environment variable", - "title_link": "https://github.com/OpenLineage/OpenLineage/pull/2151", - "footer": "", - "fields": [ - { - "value": "documentation, client/python", - "title": "Labels", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1694956061.909169", - "parent_user_id": "U01HVNU6A4C" - } - ] - }, - { - "client_msg_id": "2288c4d1-0b22-4a10-90fb-914438bbfc50", - "type": "message", - "text": " is there a way by which we could add custom headers to openlineage client in airflow, i see that provision is there for spark integration via these properties --> abcdef", - "user": "U05QL7LN2GH", - "ts": "1694907627.974239", - "blocks": [ - { - "type": "rich_text", - "block_id": "ZvNmS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " is there a way by which we could add custom headers to openlineage client in airflow, i see that provision is there for spark integration via these properties spark.openlineage.transport.headers.xyz --> abcdef" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694907627.974239", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1695156055.877069", - "reply_users": [ - 
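A sketch of the config-file/manual-client workaround discussed in this thread, assuming the `HttpConfig`/`HttpTransport` API visible in the factory.py excerpt above; the URL and endpoint values are illustrative. (PR #2151 above adds the env-var override for the same field.)

```python
# Sketch: build the client directly with a non-default endpoint instead of
# relying on OPENLINEAGE_URL. URL and endpoint values are illustrative.
from openlineage.client import OpenLineageClient
from openlineage.client.transport.http import HttpConfig, HttpTransport

config = HttpConfig(
    url="http://lineage-backend:5000",  # base URL (what OPENLINEAGE_URL normally supplies)
    endpoint="custom/v1/lineage",       # replaces the default api/v1/lineage
)
client = OpenLineageClient(transport=HttpTransport(config))
```

In Airflow itself, where the client is not constructed by hand, the same fields can be supplied through a config file pointed at by the client's config mechanism, which is the route the reply above suggests.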
"U02S6F54MAB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "15a8ab15-edb6-47ed-a3dd-ea593601fb80", - "type": "message", - "text": "there’s no out-of-the-box possibility to do that yet, you’re very welcome to create an issue in GitHub and _maybe_ contribute as well! :slightly_smiling_face:", - "user": "U02S6F54MAB", - "ts": "1695156055.877069", - "blocks": [ - { - "type": "rich_text", - "block_id": "y/Qm5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "there’s no out-of-the-box possibility to do that yet, you’re very welcome to create an issue in GitHub and " - }, - { - "type": "text", - "text": "maybe", - "style": { - "italic": true - } - }, - { - "type": "text", - "text": " contribute as well! " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694907627.974239", - "parent_user_id": "U05QL7LN2GH" - } - ] - }, - { - "client_msg_id": "8fdf10a3-fcbd-45ca-aea9-e5213e874e13", - "type": "message", - "text": " we have dataproc operator getting called from a dag which submits a spark job, we wanted to maintain that continuity of parent job in the spark job so according to the documentation we can acheive that by using a macro called lineage_run_id that requires task and task_instance as the parameters. The problem we are facing is that our client’s have 1000's of dags, so asking them to change this everywhere it is used is not feasible, so we thought of using the task_policy feature in the airflow…but the problem is that task_policy gives you access to only the task/operator, but we don’t have the access to the task instance..that is required as a parameter to the lineage_run_id function. Can anyone kindly help us on how should we go about this one\n```t1 = DataProcPySparkOperator(\n task_id=job_name,\n #required pyspark configuration,\n job_name=job_name,\n dataproc_pyspark_properties={\n 'spark.driver.extraJavaOptions':\n f\"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}\"\n dag=dag)```", - "user": "U05QL7LN2GH", - "ts": "1694849427.228709", - "blocks": [ - { - "type": "rich_text", - "block_id": "AHZek", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " we have dataproc operator getting called from a dag which submits a spark job, we wanted to maintain that continuity of parent job in the spark job so according to the documentation we can acheive that by using a macro called lineage_run_id that requires task and task_instance as the parameters. The problem we are facing is that our client’s have 1000's of dags, so asking them to change this everywhere it is used is not feasible, so we thought of using the task_policy feature in the airflow…but the problem is that task_policy gives you access to only the task/operator, but we don’t have the access to the task instance..that is required as a parameter to the lineage_run_id function. 
Can anyone kindly help us on how should we go about this one\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "t1 = DataProcPySparkOperator(\n task_id=job_name,\n #required pyspark configuration,\n job_name=job_name,\n dataproc_pyspark_properties={\n 'spark.driver.extraJavaOptions':\n f\"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}\"\n dag=dag)" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05QL7LN2GH", - "ts": "1694849476.000000" - }, - "thread_ts": "1694849427.228709", - "reply_count": 6, - "reply_users_count": 3, - "latest_reply": "1694854824.785559", - "reply_users": [ - "U02S6F54MAB", - "U05QL7LN2GH", - "U01RA9B5GG2" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U05HBLE7YPL" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "c74584ec-debc-402b-b867-4cb91564bc1a", - "type": "message", - "text": "you don't need actual task instance to do that. you only should set additional argument as jinja template, same as above", - "user": "U02S6F54MAB", - "ts": "1694852567.185479", - "blocks": [ - { - "type": "rich_text", - "block_id": "YJIIN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "you don't need actual task instance to do that. you only should set additional argument as jinja template, same as above" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694849427.228709", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "739d43fe-6091-4a33-9e38-49c251a2092f", - "type": "message", - "text": "task_instance in this case is just part of string which is evaluated when jinja render happens", - "user": "U02S6F54MAB", - "ts": "1694852728.566109", - "blocks": [ - { - "type": "rich_text", - "block_id": "JReNS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "task_instance in this case is just part of string which is evaluated when jinja render happens" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02S6F54MAB", - "ts": "1694852748.000000" - }, - "thread_ts": "1694849427.228709", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "fd9ff2ac-d30d-44fe-be46-9fc6bd92cbec", - "type": "message", - "text": "ohh…then we could use the same example as above inside the task_policy to intercept the Operator and add the openlineage specific additions properties?", - "user": "U05QL7LN2GH", - "ts": "1694852830.195709", - "blocks": [ - { - "type": "rich_text", - "block_id": "k9v8D", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "ohh…then we could use the same example as above inside the task_policy to intercept the Operator and add the openlineage specific additions properties?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694849427.228709", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "7016edd8-d837-4f45-9c0d-baf91cac0447", - "type": "message", - "text": "correct, just remember not to override all properties, just add ol specific", - "user": "U02S6F54MAB", - "ts": "1694853059.395789", - "blocks": [ - { - "type": "rich_text", - "block_id": "4HvJl", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "correct, just remember not to override all properties, just add ol specific" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694849427.228709", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "b03d0357-c247-4634-b6f8-e3456fd93bc5", - "type": "message", - "text": "yeah sure…thank you so much <@U02S6F54MAB>, will try this out and keep you posted", - "user": "U05QL7LN2GH", - "ts": "1694853122.539399", - "blocks": [ - { - "type": "rich_text", - "block_id": "Xsbel", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah sure…thank you so much " - }, - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": ", will try this out and keep you posted" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694849427.228709", - "parent_user_id": "U05QL7LN2GH", - "reactions": [ - { - "name": "+1", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "9e632ff3-3d24-48e1-9b39-dc353db41164", - "type": "message", - "text": "We want to automate setting those options at some point inside the operator itself", - "user": "U01RA9B5GG2", - "ts": "1694854824.785559", - "blocks": [ - { - "type": "rich_text", - "block_id": "pXEbp", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "We want to automate setting those options at some point inside the operator itself" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694849427.228709", - "parent_user_id": "U05QL7LN2GH", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U05QL7LN2GH" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "f8a7a53a-d531-490e-a231-7e9f83a9f7a3", - "type": "message", - "text": "\nFriendly reminder: the next OpenLineage meetup, our first in Toronto, is happening this coming Monday at 5 PM ET ", - "user": "U02LXF3HUN7", - "ts": "1694793807.376729", - "blocks": [ - { - "type": "rich_text", - "block_id": "liTY7", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nFriendly reminder: the next OpenLineage meetup, our first in Toronto, is happening this coming Monday at 5 PM ET " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694441261486759" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694441261486759", - "ts": "1694441261.486759", - "author_id": "U02LXF3HUN7", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1694441261.486759", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "t94g1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": 
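A sketch of the cluster-policy approach this thread converges on: a `task_policy` in `airflow_local_settings.py` that injects the OpenLineage javaagent property into every `DataProcPySparkOperator`, leaving the macro as a literal Jinja string so it is rendered per task instance at runtime (no `TaskInstance` object is needed inside the policy). This assumes the properties dict is template-rendered, as the replies imply; the jar path and attribute name are taken from the example in the question and should be treated as assumptions.

```python
# Sketch of airflow_local_settings.py: add only the OL-specific property,
# as advised above, without overwriting the operator's other properties.
import os

def task_policy(task):
    if task.task_type != "DataProcPySparkOperator":
        return
    agent = (
        "-javaagent:/opt/jars/openlineage-spark.jar="            # hypothetical jar path
        f"{os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/"
        f"{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{task.task_id}/runs/"
        "{{ macros.OpenLineagePlugin.lineage_run_id(task, task_instance) }}"  # rendered later
        f"?api_key={os.environ.get('OPENLINEAGE_API_KEY')}"
    )
    props = dict(getattr(task, "dataproc_pyspark_properties", None) or {})
    props["spark.driver.extraJavaOptions"] = agent  # add the OL property only
    task.dataproc_pyspark_properties = props
```

Because the macro stays a plain `{{ ... }}` string until Jinja renders the task, this avoids touching the thousands of existing DAGs mentioned in the question.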
"text", - "text": "\nThe first Toronto OpenLineage Meetup, featuring a presentation by recent adopter " - }, - { - "type": "link", - "url": "https://metaphor.io/", - "text": "Metaphor" - }, - { - "type": "text", - "text": ", is just one week away. On the agenda:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Evolution of spec presentation/discussion (project background/history)", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "State of the community", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Integrating OpenLineage with ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://metaphor.io/", - "text": "Metaphor", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " (by special guests ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://www.linkedin.com/in/yeliu84/", - "text": "Ye", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " & ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://www.linkedin.com/in/ivanperepelitca/", - "text": "Ivan", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": ")", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark/Column lineage update", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow Provider update", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Roadmap Discussion", - "style": { - "bold": true - } - } - ] - } - ], - "style": "ordered", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Find more details and RSVP ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "text": "here", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "." - } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694441261486759", - "fallback": "[September 11th, 2023 7:07 AM] michael282: \nThe first Toronto OpenLineage Meetup, featuring a presentation by recent adopter , is just one week away. On the agenda:\n1. *Evolution of spec presentation/discussion (project background/history)*\n2. *State of the community*\n3. *Integrating OpenLineage with (by special guests & )*\n4. *Spark/Column lineage update*\n5. *Airflow Provider update*\n6. *Roadmap Discussion*\n*Find more details and RSVP *.", - "text": "\nThe first Toronto OpenLineage Meetup, featuring a presentation by recent adopter , is just one week away. On the agenda:\n1. *Evolution of spec presentation/discussion (project background/history)*\n2. *State of the community*\n3. *Integrating OpenLineage with (by special guests & )*\n4. *Spark/Column lineage update*\n5. *Airflow Provider update*\n6. 
*Roadmap Discussion*\n*Find more details and RSVP *.", - "author_name": "Michael Robinson", - "author_link": "https://openlineage.slack.com/team/U02LXF3HUN7", - "author_icon": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_48.jpg", - "author_subname": "Michael Robinson", - "mrkdwn_in": [ - "text" - ], - "footer": "Slack Conversation" - } - ], - "reactions": [ - { - "name": "+1", - "users": [ - "U01RA9B5GG2" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "32e40b58-9b35-45a3-99aa-e37404cd6329", - "type": "message", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github", - "user": "U01DCLP0GU9", - "ts": "1694737381.437569", - "blocks": [ - { - "type": "rich_text", - "block_id": "KKjtL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n" - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit" - }, - { - "type": "text", - "text": "\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694737381.437569", - "reply_count": 2, - "reply_users_count": 1, - "latest_reply": "1696541652.452819", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "+1", - "users": [ - "U01RA9B5GG2" - ], - "count": 1 - } - ], - "replies": [ - { - "type": "message", - "subtype": "thread_broadcast", - "text": "I have cleaned up the registry proposal.\n\nIn particular:\n• I clarified that option 2 is preferred at this point.\n• I moved discussion notes to the bottom. they will go away at some point\n• Once it is stable, I’ll create a with the preferred option.\n• we need a good proposal for the core facets prefix. My suggestion is to move core facets to `core` in the registry. 
The drawback is prefix would be inconsistent.\n", - "user": "U01DCLP0GU9", - "ts": "1696379615.265919", - "thread_ts": "1694737381.437569", - "root": { - "client_msg_id": "32e40b58-9b35-45a3-99aa-e37404cd6329", - "type": "message", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github", - "user": "U01DCLP0GU9", - "ts": "1694737381.437569", - "blocks": [ - { - "type": "rich_text", - "block_id": "KKjtL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n" - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit" - }, - { - "type": "text", - "text": "\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694737381.437569", - "reply_count": 2, - "reply_users_count": 1, - "latest_reply": "1696541652.452819", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "UD6d9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I have cleaned up the registry proposal.\n" - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit" - }, - { - "type": "text", - "text": "\nIn particular:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I clarified that option 2 is preferred at this point." - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I moved discussion notes to the bottom. they will go away at some point" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Once it is stable, I’ll create a " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/tree/main/proposals", - "text": "proposal" - }, - { - "type": "text", - "text": " with the preferred option." - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we need a good proposal for the core facets prefix. My suggestion is to move core facets to " - }, - { - "type": "text", - "text": "core", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in the registry. The drawback is prefix would be inconsistent." - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [] - } - ] - } - ], - "client_msg_id": "4f7a5bb9-269f-4e6d-98b4-669b2760c1bf" - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "I have created a ticket to make this easier to find. 
Once I get more feedback I’ll turn it into a md file in the repo: \n", - "user": "U01DCLP0GU9", - "ts": "1696541652.452819", - "thread_ts": "1694737381.437569", - "root": { - "client_msg_id": "32e40b58-9b35-45a3-99aa-e37404cd6329", - "type": "message", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github", - "user": "U01DCLP0GU9", - "ts": "1694737381.437569", - "blocks": [ - { - "type": "rich_text", - "block_id": "KKjtL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n" - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit" - }, - { - "type": "text", - "text": "\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694737381.437569", - "reply_count": 2, - "reply_users_count": 1, - "latest_reply": "1696541652.452819", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "EbQGP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I have created a ticket to make this easier to find. Once I get more feedback I’ll turn it into a md file in the repo: " - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit#heading=h.enpbmvu7n8gu" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2161" - } - ] - } - ] - } - ], - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1696541261, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2161", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2161 [PROPOSAL] Add a Registry of Producers and Consumers in OpenLineage", - "text": "*Purpose*\n\nThis is the early stage of an idea to get community feedback on what an OpenLineage registry for producers, custom facets and consumers could be. Once this document is stable enough, I’ll create an official proposal on the OpenLineage repo.\n\n*Goal*\n\nAllow third parties to register their implementations or custom extensions to make them easy to discover. 
\nShorten “Producer” and “schema url” values\n\n*Proposed implementation*\n\nCurrent draft for discussion:\n\n", - "title": "#2161 [PROPOSAL] Add a Registry of Producers and Consumers in OpenLineage", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2161", - "footer": "", - "fields": [ - { - "value": "proposal", - "title": "Labels", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "client_msg_id": "b2b64b49-4c25-427f-b1ae-f1fa92aad028", - "edited": { - "user": "U01DCLP0GU9", - "ts": "1696541673.000000" - } - } - ] - }, - { - "client_msg_id": "64406ba1-f8a0-403e-8819-6a11eddd1ec5", - "type": "message", - "text": "Hey everyone,\nAny chance we could have a *openlineage-integration-common* 1.1.1 release with the following changes..?\n• \n• ", - "user": "U055N2GRT4P", - "ts": "1694700221.242579", - "blocks": [ - { - "type": "rich_text", - "block_id": "KjDBh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey everyone,\nAny chance we could have a " - }, - { - "type": "text", - "text": "openlineage-integration-common", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " 1.1.1 release with the following changes..?\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2106" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2108" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694700221.242579", - "reply_count": 4, - "reply_users_count": 2, - "latest_reply": "1694767212.306929", - "reply_users": [ - "U055N2GRT4P", - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1694767212.306929", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U02LXF3HUN7", - "U01HNKK4XAM", - "U01RA9B5GG2", - "U02S6F54MAB", - "U02MK6YNAQ5", - "U01DCLP0GU9" - ], - "count": 6 - } - ], - "replies": [ - { - "client_msg_id": "8b7ef37c-6527-4a9f-a021-b7603fc1b02a", - "type": "message", - "text": "Specially the first PR is affecting users of the *astronomer-cosmos* library: ", - "user": "U055N2GRT4P", - "ts": "1694700319.444869", - "blocks": [ - { - "type": "rich_text", - "block_id": "oAc8+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Specially the first PR is affecting users of the " - }, - { - "type": "text", - "text": "astronomer-cosmos", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " library: " - }, - { - "type": "link", - "url": "https://github.com/astronomer/astronomer-cosmos/issues/533" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694700221.242579", - "parent_user_id": "U055N2GRT4P" - }, - { - "client_msg_id": "518c0176-d673-4505-b728-c1971a00c897", - "type": "message", - "text": "Thanks <@U055N2GRT4P> for requesting your first OpenLineage release! Three +1s from committers will authorize", - "user": "U02LXF3HUN7", - "ts": "1694700324.134709", - "blocks": [ - { - "type": "rich_text", - "block_id": "g+12P", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks " - }, - { - "type": "user", - "user_id": "U055N2GRT4P" - }, - { - "type": "text", - "text": " for requesting your first OpenLineage release! 
Three +1s from committers will authorize" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694700221.242579", - "parent_user_id": "U055N2GRT4P", - "reactions": [ - { - "name": "gratitude-thank-you", - "users": [ - "U055N2GRT4P" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "cb4da1c3-9374-42ee-a162-5c7937d788d8", - "type": "message", - "text": "The release is authorized and will be initiated within two business days.", - "user": "U02LXF3HUN7", - "ts": "1694707195.166749", - "blocks": [ - { - "type": "rich_text", - "block_id": "iREVS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "The release is authorized and will be initiated within two business days." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694700221.242579", - "parent_user_id": "U055N2GRT4P", - "reactions": [ - { - "name": "tada", - "users": [ - "U055N2GRT4P" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "d076f666-77f2-4e04-9f84-d5a066b8a7ed", - "type": "message", - "text": "Thanks a lot, <@U02LXF3HUN7>!", - "user": "U055N2GRT4P", - "ts": "1694767212.306929", - "blocks": [ - { - "type": "rich_text", - "block_id": "gbUl/", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks a lot, " - }, - { - "type": "user", - "user_id": "U02LXF3HUN7" - }, - { - "type": "text", - "text": "!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694700221.242579", - "parent_user_id": "U055N2GRT4P" - } - ] - }, - { - "client_msg_id": "09f3740d-1573-4bcc-9ef8-d980019a3fcd", - "type": "message", - "text": "Context:\n\nWe use Spark with YARN, running on Hadoop 2.x (I can't remember the exact minor version) with Hive support.\n\nProblem:\n\nI'm noticed that `CreateDataSourceAsSelectCommand` objects are _always_ transformed to an `OutputDataset` with a _namespace_ value set to `file` - which is curious, because the inputs always have a (correct) namespace of `hdfs://<name-node>` - is this a known issue? A flaw with Apache Spark? 
A bug in the resolution logic?\n\nFor reference:\n\n```public class CreateDataSourceTableCommandVisitor\n extends QueryPlanVisitor<CreateDataSourceTableCommand, OpenLineage.OutputDataset> {\n\n public CreateDataSourceTableCommandVisitor(OpenLineageContext context) {\n super(context);\n }\n\n @Override\n public List<OpenLineage.OutputDataset> apply(LogicalPlan x) {\n CreateDataSourceTableCommand command = (CreateDataSourceTableCommand) x;\n CatalogTable catalogTable = command.table();\n\n return Collections.singletonList(\n outputDataset()\n .getDataset(\n PathUtils.fromCatalogTable(catalogTable),\n catalogTable.schema(),\n OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE));\n }\n}```\nRunning this: `cat events.log | jq '{eventTime: .eventTime, eventType: .eventType, runId: .run.runId, jobNamespace: .job.namespace, jobName: .job.name, outputs: .outputs[] | {namespace: .namespace, name: .name}, inputs: .inputs[] | {namespace: .namespace, name: .name}}'`\n\nThis is an output:\n```{\n \"eventTime\": \"2023-09-13T16:01:27.059Z\",\n \"eventType\": \"START\",\n \"runId\": \"bbbb5763-3615-46c0-95ca-1fc398c91d5d\",\n \"jobNamespace\": \"spark.cluster-1\",\n \"jobName\": \"ol_hadoop_test.execute_create_data_source_table_as_select_command.dhawes_db_ol_test_hadoop_tgt\",\n \"outputs\": {\n \"namespace\": \"file\",\n \"name\": \"/user/hive/warehouse/dhawes.db/ol_test_hadoop_tgt\"\n },\n \"inputs\": {\n \"namespace\": \"\",\n \"name\": \"/user/hive/warehouse/dhawes.db/ol_test_hadoop_src\"\n }\n}```", - "user": "U05FLJE4GDU", - "ts": "1694686815.337029", - "blocks": [ - { - "type": "rich_text", - "block_id": "1OX1T", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Context:\n\nWe use Spark with YARN, running on Hadoop 2.x (I can't remember the exact minor version) with Hive support.\n\nProblem:\n\nI'm noticed that " - }, - { - "type": "text", - "text": "CreateDataSourceAsSelectCommand", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " objects are " - }, - { - "type": "text", - "text": "always", - "style": { - "italic": true - } - }, - { - "type": "text", - "text": " transformed to an " - }, - { - "type": "text", - "text": "OutputDataset", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " with a " - }, - { - "type": "text", - "text": "namespace", - "style": { - "italic": true - } - }, - { - "type": "text", - "text": " value set to " - }, - { - "type": "text", - "text": "file", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " - which is curious, because the inputs always have a (correct) namespace of " - }, - { - "type": "text", - "text": "hdfs://", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " - is this a known issue? A flaw with Apache Spark? 
A bug in the resolution logic?\n\nFor reference:\n\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "public class CreateDataSourceTableCommandVisitor\n extends QueryPlanVisitor {\n\n public CreateDataSourceTableCommandVisitor(OpenLineageContext context) {\n super(context);\n }\n\n @Override\n public List apply(LogicalPlan x) {\n CreateDataSourceTableCommand command = (CreateDataSourceTableCommand) x;\n CatalogTable catalogTable = command.table();\n\n return Collections.singletonList(\n outputDataset()\n .getDataset(\n PathUtils.fromCatalogTable(catalogTable),\n catalogTable.schema(),\n OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE));\n }\n}" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nRunning this: " - }, - { - "type": "text", - "text": "cat events.log | jq '{eventTime: .eventTime, eventType: .eventType, runId: .run.runId, jobNamespace: .job.namespace, jobName: .job.name, outputs: .outputs[] | {namespace: .namespace, name: .name}, inputs: .inputs[] | {namespace: .namespace, name: .name}}'", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n\nThis is an output:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "{\n \"eventTime\": \"2023-09-13T16:01:27.059Z\",\n \"eventType\": \"START\",\n \"runId\": \"bbbb5763-3615-46c0-95ca-1fc398c91d5d\",\n \"jobNamespace\": \"spark.cluster-1\",\n \"jobName\": \"ol_hadoop_test.execute_create_data_source_table_as_select_command.dhawes_db_ol_test_hadoop_tgt\",\n \"outputs\": {\n \"namespace\": \"file\",\n \"name\": \"/user/hive/warehouse/dhawes.db/ol_test_hadoop_tgt\"\n },\n \"inputs\": {\n \"namespace\": \"" - }, - { - "type": "link", - "url": "hdfs://nn1" - }, - { - "type": "text", - "text": "\",\n \"name\": \"/user/hive/warehouse/dhawes.db/ol_test_hadoop_src\"\n }\n}" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05FLJE4GDU", - "ts": "1694687550.000000" - }, - "thread_ts": "1694686815.337029", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1694697839.386649", - "reply_users": [ - "U02MK6YNAQ5", - "U05FLJE4GDU" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "eyes", - "users": [ - "U02MK6YNAQ5" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "67530b68-4b7d-42d2-8bd9-f207888169d2", - "type": "message", - "text": "Seems like an issue on our side. Do you know how the source is read? What LogicalPlan leaf is used to read src? Would love to find how is this done differently", - "user": "U02MK6YNAQ5", - "ts": "1694691145.323709", - "blocks": [ - { - "type": "rich_text", - "block_id": "NPu0C", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Seems like an issue on our side. Do you know how the source is read? What LogicalPlan leaf is used to read src? 
Would love to find how is this done differently" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694686815.337029", - "parent_user_id": "U05FLJE4GDU" - }, - { - "client_msg_id": "4f3db9e2-2137-478c-86e0-2a4d9d7dad6f", - "type": "message", - "text": "Hmm, I'll have to do explain plan to see what exactly it is.\n\nHowever my sample job uses `spark.sql(\"SELECT * FROM dhawes.ol_test_hadoop_src\")`\n\nwhich itself is created using\n\n```spark.sql(\"SELECT 1 AS id\").write.format(\"orc\").mode(\"overwrite\").saveAsTable(\"dhawes.ol_test_hadoop_src\")```\n", - "user": "U05FLJE4GDU", - "ts": "1694697418.142389", - "blocks": [ - { - "type": "rich_text", - "block_id": "zWFes", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hmm, I'll have to do explain plan to see what exactly it is.\n\nHowever my sample job uses " - }, - { - "type": "text", - "text": "spark.sql(\"SELECT * FROM dhawes.ol_test_hadoop_src\")", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\n\nwhich itself is created using\n\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "spark.sql(\"SELECT 1 AS id\").write.format(\"orc\").mode(\"overwrite\").saveAsTable(\"dhawes.ol_test_hadoop_src\")" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694686815.337029", - "parent_user_id": "U05FLJE4GDU" - }, - { - "client_msg_id": "bde1a4fd-2425-4d60-abcd-40d6e55044a0", - "type": "message", - "text": "```>>> spark.sql(\"SELECT * FROM dhawes.ol_test_hadoop_src\").explain(True)\n== Parsed Logical Plan ==\n'Project [*]\n+- 'UnresolvedRelation `dhawes`.`ol_test_hadoop_src`\n\n== Analyzed Logical Plan ==\nid: int\nProject [id#3]\n+- SubqueryAlias `dhawes`.`ol_test_hadoop_src`\n +- Relation[id#3] orc\n\n== Optimized Logical Plan ==\nRelation[id#3] orc\n\n== Physical Plan ==\n*(1) FileScan orc dhawes.ol_test_hadoop_src[id#3] Batched: true, Format: ORC, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>```\n", - "user": "U05FLJE4GDU", - "ts": "1694697839.386649", - "blocks": [ - { - "type": "rich_text", - "block_id": "HMpuD", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": ">>> spark.sql(\"SELECT * FROM dhawes.ol_test_hadoop_src\").explain(True)\n== Parsed Logical Plan ==\n'Project [*]\n+- 'UnresolvedRelation `dhawes`.`ol_test_hadoop_src`\n\n== Analyzed Logical Plan ==\nid: int\nProject [id#3]\n+- SubqueryAlias `dhawes`.`ol_test_hadoop_src`\n +- Relation[id#3] orc\n\n== Optimized Logical Plan ==\nRelation[id#3] orc\n\n== Physical Plan ==\n*(1) FileScan orc dhawes.ol_test_hadoop_src[id#3] Batched: true, Format: ORC, Location: InMemoryFileIndex[" - }, - { - "type": "link", - "url": "hdfs://nn1/user/hive/warehouse/dhawes.db/ol_test_hadoop_src" - }, - { - "type": "text", - "text": "], PartitionFilters: [], PushedFilters: [], ReadSchema: struct" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694686815.337029", - "parent_user_id": "U05FLJE4GDU" - } - ] - }, - { - "client_msg_id": "e3ce0cb2-d60c-4e67-b0d1-ca22551c75aa", - "type": "message", - "text": "\nThis month’s TSC meeting, open to all, is tomorrow: ", - "user": "U02LXF3HUN7", - "ts": "1694629232.934029", - "blocks": [ - { - "type": "rich_text", - "block_id": "q4mW7", - "elements": 
[ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThis month’s TSC meeting, open to all, is tomorrow: " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694113940400549" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694113940400549", - "ts": "1694113940.400549", - "author_id": "U02LXF3HUN7", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1694113940.400549", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "Yv9ts", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThis month’s TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "announcements" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "recent releases" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "demo: Spark integration tests in Databricks runtime" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "open discussion" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "more (TBA)" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "More info and the meeting link can be found on the " - }, - { - "type": "link", - "url": "https://openlineage.io/meetings/", - "text": "website" - }, - { - "type": "text", - "text": ". All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc." - } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694113940400549", - "fallback": "[September 7th, 2023 12:12 PM] michael282: \nThis month’s TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:\n• announcements\n• recent releases\n• demo: Spark integration tests in Databricks runtime\n• open discussion\n• more (TBA)\nMore info and the meeting link can be found on the . All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.", - "text": "\nThis month’s TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:\n• announcements\n• recent releases\n• demo: Spark integration tests in Databricks runtime\n• open discussion\n• more (TBA)\nMore info and the meeting link can be found on the . All are welcome! 
Also, feel free to reply or DM me with discussion topics, agenda items, etc.", - "author_name": "Michael Robinson", - "author_link": "https://openlineage.slack.com/team/U02LXF3HUN7", - "author_icon": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_48.jpg", - "author_subname": "Michael Robinson", - "mrkdwn_in": [ - "text" - ], - "footer": "Slack Conversation" - } - ], - "reactions": [ - { - "name": "white_check_mark", - "users": [ - "U0323HG8C8H" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "84e1efcc-8e5c-4334-b18f-5279acee755d", - "type": "message", - "text": "I am exploring Spark - OpenLineage integration (using the latest PySpark and OL versions). I tested a simple pipeline which:\n• Reads JSON data into PySpark DataFrame\n• Apply data transformations\n• Write transformed data to MySQL database\nObserved that we receive 4 events (2 `START` and 2 `COMPLETE`) for the same job name. The events are almost identical with a small diff in the facets. All the events share the same `runId`, and we don't get any `parentRunId`.\nTeam, can you please confirm if this behaviour is expected? Seems to be different from the Airflow integration where we relate jobs to Parent Jobs.", - "user": "U05A1D80QKF", - "ts": "1694583867.900909", - "blocks": [ - { - "type": "rich_text", - "block_id": "p9tSR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am exploring Spark - OpenLineage integration (using the latest PySpark and OL versions). I tested a simple pipeline which:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Reads JSON data into PySpark DataFrame" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Apply data transformations" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Write transformed data to MySQL database" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Observed that we receive 4 events (2 " - }, - { - "type": "text", - "text": "START", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and 2 " - }, - { - "type": "text", - "text": "COMPLETE", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ") for the same job name. The events are almost identical with a small diff in the facets. All the events share the same " - }, - { - "type": "text", - "text": "runId", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ", and we don't get any " - }, - { - "type": "text", - "text": "parentRunId", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ".\nTeam, can you please confirm if this behaviour is expected? Seems to be different from the Airflow integration where we relate jobs to Parent Jobs." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "reply_count": 16, - "reply_users_count": 4, - "latest_reply": "1695717183.779589", - "reply_users": [ - "U05FLJE4GDU", - "U05A1D80QKF", - "U01RA9B5GG2", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "1d737c7a-5a41-4d36-87c1-8aecd19ca7ea", - "type": "message", - "text": "The Spark integration requires that two parameters are passed to it, namely:\n\n```spark.openlineage.parentJobName\nspark.openlineage.parentRunId```\nYou can find the list of parameters here:\n\n", - "user": "U05FLJE4GDU", - "ts": "1694588077.143419", - "blocks": [ - { - "type": "rich_text", - "block_id": "0UiGZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "The Spark integration requires that two parameters are passed to it, namely:\n\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "spark.openlineage.parentJobName\nspark.openlineage.parentRunId" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "You can find the list of parameters here:\n\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/README.md" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05FLJE4GDU", - "ts": "1694588116.000000" - }, - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "ad9ed647-58e2-45af-8139-274c7904e9e6", - "type": "message", - "text": "Thanks, will check this out", - "user": "U05A1D80QKF", - "ts": "1694588151.594469", - "blocks": [ - { - "type": "rich_text", - "block_id": "lbzzq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks, will check this out" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "a8e782ca-76dc-4f36-9d61-37d9eaebaa86", - "type": "message", - "text": "As for double accounting of events - that's a bit harder to diagnose.", - "user": "U05FLJE4GDU", - "ts": "1694588263.483749", - "blocks": [ - { - "type": "rich_text", - "block_id": "LZeql", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "As for double accounting of events - that's a bit harder to diagnose." 
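For context, a session wired up the way the reply above describes might look like the sketch below. The two `spark.openlineage.parent*` keys are the ones named in the reply; every value is a placeholder, and the HTTP transport settings are only one of the options listed in the Spark integration README.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ol-parent-run-demo")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.1.0")
    .config("spark.extraListeners",
            "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")  # placeholder
    .config("spark.openlineage.namespace", "dev")  # placeholder
    # The two parameters the reply above refers to; values are placeholders.
    .config("spark.openlineage.parentJobName", "my_dag.my_task")
    .config("spark.openlineage.parentRunId", "11111111-2222-3333-4444-555555555555")
    .getOrCreate()
)
```

With these set, the emitted run events should carry the parent job and run identifiers that the original question found missing.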
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "4bb93ddd-263b-423e-b260-f2bea978c19f", - "type": "message", - "text": "Can you share the the job and events?\nAlso <@U02MK6YNAQ5>", - "user": "U01RA9B5GG2", - "ts": "1694593983.249679", - "blocks": [ - { - "type": "rich_text", - "block_id": "BrOP5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can you share the the job and events?\nAlso " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "type": "message", - "text": "Sure, sharing Job and events.", - "files": [ - { - "id": "F05S21H1LHK", - "created": 1694599423, - "timestamp": 1694599423, - "name": "spark_events.txt", - "title": "spark_events.txt", - "mimetype": "text/plain", - "filetype": "text", - "pretty_type": "Plain Text", - "user": "U05A1D80QKF", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 49881, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S21H1LHK/spark_events.txt", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S21H1LHK/download/spark_events.txt", - "permalink": "https://openlineage.slack.com/files/U05A1D80QKF/F05S21H1LHK/spark_events.txt", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05S21H1LHK-d330f2f2ea", - "edit_link": "https://openlineage.slack.com/files/U05A1D80QKF/F05S21H1LHK/spark_events.txt/edit", - "preview": "{\"eventTime\": \"2023-09-12T20:44:10.764Z\", \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark\", \"schemaURL\": \"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent\", \"eventType\": \"START\", \"run\": {\"runId\": \"9293fb3d-bbe9-4237-b518-719a7c0f149d\", \"facets\": {\"spark.logicalPlan\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark\", \"_schemaURL\": \"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet\", \"plan\": [{\"class\": \"org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand\", \"num-children\": 0, \"query\": [{\"class\": \"org.apache.spark.sql.catalyst.plans.logical.Project\", \"num-children\": 1, \"projectList\": [[{\"class\": \"org.apache.spark.sql.catalyst.expressions.AttributeReference\", \"num-children\": 0, \"name\": \"customer_id\", \"dataType\": \"integer\", \"nullable\": true, \"metadata\": {}, \"exprId\": {\"product-class\": \"org.apache.spark.sql.catalyst.expressions.ExprId\", \"id\": 497, \"jvmId\": \"bca387c7-9171-4d47-8061-7031cec5e...", - "preview_highlight": "
{"eventTime": "2023-09-12T20:44:10.764Z", "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent", "eventType": "START", "run": {"runId": "9293fb3d-bbe9-4237-b518-719a7c0f149d", "facets": {"spark.logicalPlan": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "plan": [{"class": "org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand", "num-children": 0, "query": [{"class": "org.apache.spark.sql.catalyst.plans.logical.Project", "num-children": 1, "projectList": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e...
\n", - "lines": 5, - "lines_more": 4, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - }, - { - "id": "F05S4UHH4GJ", - "mode": "tombstone" - } - ], - "upload": false, - "user": "U05A1D80QKF", - "ts": "1694599429.520289", - "blocks": [ - { - "type": "rich_text", - "block_id": "Gmvxl", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Sure, sharing Job and events." - } - ] - } - ] - } - ], - "client_msg_id": "6dbc15c5-b1f3-4cf5-8c9f-8e7a9d0e3582", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "type": "message", - "text": "", - "files": [ - { - "id": "F05S4S20MDZ", - "created": 1694599579, - "timestamp": 1694599579, - "name": "etl-mysql.py", - "title": "etl-mysql.py", - "mimetype": "text/plain", - "filetype": "python", - "pretty_type": "Python", - "user": "U05A1D80QKF", - "user_team": "T01CWUYP5AR", - "editable": true, - "size": 2005, - "mode": "snippet", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S4S20MDZ/etl-mysql.py", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S4S20MDZ/download/etl-mysql.py", - "permalink": "https://openlineage.slack.com/files/U05A1D80QKF/F05S4S20MDZ/etl-mysql.py", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05S4S20MDZ-b2be55e90f", - "edit_link": "https://openlineage.slack.com/files/U05A1D80QKF/F05S4S20MDZ/etl-mysql.py/edit", - "preview": "from pyspark.sql import SparkSession\nfrom pyspark.sql.functions import col, concat_ws, year, month, dayofmonth\nfrom pyspark.sql.types import StructType, StructField, StringType, IntegerType\n\n", - "preview_highlight": "
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, year, month, dayofmonth
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
\n", - "lines": 57, - "lines_more": 52, - "preview_is_truncated": true, - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05A1D80QKF", - "display_as_bot": false, - "ts": "1694599581.950579", - "client_msg_id": "50568519-5364-4dff-9fbe-750dd6896192", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "de52d2d9-2547-40f3-b0c9-f1084212f451", - "type": "message", - "text": "Hi <@U05A1D80QKF>,\n\nThanks for providing such a detailed description of the problem.\n\nIt is not expected behaviour, it's an issue. The events correspond to the same logical plan which for some reason lead to sending two OL events. Is it reproducible aka. does it occur each time? If yes, we please feel free to raise an issue for that.\n\nWe have added in recent months several tests to verify amount of OL events being generated but we haven't tested it that way with JDBC. BTW. will the same happen if you write your data `df_transformed` to a file (like parquet file) ?", - "user": "U02MK6YNAQ5", - "ts": "1694601542.208509", - "blocks": [ - { - "type": "rich_text", - "block_id": "Q7/y4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi " - }, - { - "type": "user", - "user_id": "U05A1D80QKF" - }, - { - "type": "text", - "text": ",\n\nThanks for providing such a detailed description of the problem.\n\nIt is not expected behaviour, it's an issue. The events correspond to the same logical plan which for some reason lead to sending two OL events. Is it reproducible aka. does it occur each time? If yes, we please feel free to raise an issue for that.\n\nWe have added in recent months several tests to verify amount of OL events being generated but we haven't tested it that way with JDBC. BTW. will the same happen if you write your data " - }, - { - "type": "text", - "text": "df_transformed", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " to a file (like parquet file) ?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF", - "reactions": [ - { - "name": "gratitude-thank-you", - "users": [ - "U05A1D80QKF" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "31906db3-adc9-461c-bcb5-89d606785db8", - "type": "message", - "text": "Thanks <@U02MK6YNAQ5>, will confirm about writing to file and get back.", - "user": "U05A1D80QKF", - "ts": "1694604483.876959", - "blocks": [ - { - "type": "rich_text", - "block_id": "OLVgV", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": ", will confirm about writing to file and get back." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "c383dd11-9d69-4616-937f-31661d7f4c5d", - "type": "message", - "text": "And yes, the issue is reproducible. Will raise an issue for this.", - "user": "U05A1D80QKF", - "ts": "1694604815.950619", - "blocks": [ - { - "type": "rich_text", - "block_id": "lJ2OI", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "And yes, the issue is reproducible. Will raise an issue for this." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "7571e602-ec66-4c82-b7e3-9f0b31e2ca91", - "type": "message", - "text": "even if you write onto a file?", - "user": "U02MK6YNAQ5", - "ts": "1694604834.201719", - "blocks": [ - { - "type": "rich_text", - "block_id": "1EGL5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "even if you write onto a file?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "037e36e4-0098-4634-afe3-f505f16631f1", - "type": "message", - "text": "Yes, even when I write to a parquet file.", - "user": "U05A1D80QKF", - "ts": "1694605041.849429", - "blocks": [ - { - "type": "rich_text", - "block_id": "hyQSb", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Yes, even when I write to a parquet file." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "4cd3fe1c-1c46-40c1-abc8-5b44bef807ca", - "type": "message", - "text": "ok. i think i was able to reproduce it locally with ", - "user": "U02MK6YNAQ5", - "ts": "1694605768.036149", - "blocks": [ - { - "type": "rich_text", - "block_id": "ihnk0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "ok. i think i was able to reproduce it locally with " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2103/files" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "741bf33c-bdbe-4bb1-b742-360dd6ac5675", - "type": "message", - "text": "Opened an issue: ", - "user": "U05A1D80QKF", - "ts": "1694606171.447199", - "blocks": [ - { - "type": "rich_text", - "block_id": "JNjNs", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Opened an issue: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2104" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "34debb23-2b95-4bf5-bf84-416dd485a755", - "type": "message", - "text": "<@U02MK6YNAQ5> I see that the is work in progress. Any rough estimate on when we can expect this fix to be released?", - "user": "U05A1D80QKF", - "ts": "1695673929.069669", - "blocks": [ - { - "type": "rich_text", - "block_id": "sgyzW", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " I see that the " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2103", - "text": "PR" - }, - { - "type": "text", - "text": " is work in progress. Any rough estimate on when we can expect this fix to be released?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "ec12c784-190b-4cae-a481-14543076ab1d", - "type": "message", - "text": "<@U05A1D80QKF> put a comment within your issue. 
it's a bug we need to solve but I cannot bring any estimates today.", - "user": "U02MK6YNAQ5", - "ts": "1695713523.494129", - "blocks": [ - { - "type": "rich_text", - "block_id": "LkN+M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05A1D80QKF" - }, - { - "type": "text", - "text": " put a comment within your issue. it's a bug we need to solve but I cannot bring any estimates today." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - }, - { - "client_msg_id": "442caa17-2deb-43ee-abe0-7de40a38f803", - "type": "message", - "text": "Thanks for update <@U02MK6YNAQ5>, also please look into comment. It might related and I'm not sure if expected behaviour.", - "user": "U05A1D80QKF", - "ts": "1695717183.779589", - "blocks": [ - { - "type": "rich_text", - "block_id": "smZz1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks for update " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": ", also please look into " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2104#issuecomment-1735065087", - "text": "this" - }, - { - "type": "text", - "text": " comment. It might related and I'm not sure if expected behaviour." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695716708, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2104#issuecomment-1735065087", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "Comment on #2104 [Spark Integration] Receiving duplicate OpenLineage events for a Spark Job", - "text": "Hi I understand your point. I've also come across cases where the duplicate events have the exact same payload with no difference in facets. I'm assuming there should be some diff in the payload for the backend to merge them. I've also seen a single Job emit 7 events and there is no fixed pattern for this. 
\nI also agree that there should be only a single START and a single COMPLETE event for any Job since this what the OL spec says.\n\nThis Job emits 7 events (5 Start and 2 Complete):\n\n```\nfrom pyspark.sql import SparkSession\nfrom pyspark.sql.functions import col\nfrom pyspark.sql.functions import lit\nimport random\n\nspark = (SparkSession.builder.master('local')\n .appName('spark-pipeline-v1')\n .config('spark.jars.packages', \"io.openlineage:openlineage-spark:1.1.0,\"\n \"mysql:mysql-connector-java:8.0.33,\"\n \"net.snowflake:snowflake-jdbc:3.13.14,\"\n \"net.snowflake:spark-snowflake_2.12:2.10.0-spark_3.2\")\n .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')\n .config('spark.openlineage.transport.type', 'http')\n .config('spark.openlineage.transport.url', '')\n .config('spark.openlineage.transport.endpoint', '/events/openlineage/spark/api/v1/lineage')\n .config('spark.openlineage.namespace', 'staging')\n .config('spark.openlineage.transport.auth.type', 'api_key')\n .config('spark.openlineage.transport.auth.apiKey', 'test-key')\n .config('spark.openlineage.parentJobName', 'suraj-test-job')\n .config('spark.openlineage.parentRunId', 'acd-eheh-ththth-wnjwnj')\n .getOrCreate())\n\n\nmysql_connection_properties = {\n \"user\": \"\",\n \"password\": \"\",\n \"driver\": \"com.mysql.cj.jdbc.Driver\",\n}\n\nsnowflake_options = {\n \"sfURL\": \".\",\n \"sfUser\": \"\",\n \"sfPassword\": \"\",\n \"sfDatabase\": \"ANALYTICS\",\n \"sfWarehouse\": \"COMPUTE_WH\",\n \"sfSchema\": \"PUBLIC\",\n \"sfRole\": \"ACCOUNTADMIN\",\n}\n\nmysql_url = \"\"\n\ncats_df = spark.read.jdbc(url=mysql_url, table=\"cats\", properties=mysql_connection_properties)\ndogs_df = spark.read.jdbc(url=mysql_url, table=\"dogs\", properties=mysql_connection_properties)\n\ncats_df = cats_df.withColumnRenamed(\"Country\", \"cat_name\")\n\njoined_df = cats_df.join(dogs_df, on=\"owner\", how=\"inner\")\n\nfiltered_df = joined_df.filter(col(\"cat_name\") == 'Cookie')\n\nagg_df = filtered_df.groupBy(\"cat_name\").count()\n\nagg_df = agg_df.withColumn(\"id\", lit(random.randint(0, 1000)))\n\nagg_df.write \\\n .format(\"snowflake\") \\\n .options(**snowflake_options) \\\n .option(\"dbtable\", \"COUNT_TABLE\") \\\n .mode(\"overwrite\") \\\n .save()\n\nspark.stop()\n```", - "title": "Comment on #2104 [Spark Integration] Receiving duplicate OpenLineage events for a Spark Job", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2104#issuecomment-1735065087", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1694583867.900909", - "parent_user_id": "U05A1D80QKF" - } - ] - }, - { - "client_msg_id": "45c0b47a-699b-47d1-bc76-b4028a7d2d13", - "type": "message", - "text": " has anyone succeded in getting a custom extractor to work in GCP Cloud Composer or AWS MWAA, seems like there is no way", - "user": "U05QL7LN2GH", - "ts": "1694553961.188719", - "blocks": [ - { - "type": "rich_text", - "block_id": "og69G", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " has anyone succeded in getting a custom extractor to work in GCP Cloud Composer or AWS MWAA, seems like there is no way" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694553961.188719", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1694554469.757559", - "reply_users": [ - "U01HVNU6A4C" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": 
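Until the duplication is fixed, a consumer can collapse the extra events by keying on `runId`, which is what the issue comment above assumes a backend would do. A rough sketch for a newline-delimited event dump such as the `spark_events.txt` shared earlier in this thread (the file name and the merge policy are assumptions):

```python
import json
from collections import defaultdict

def merge_runs(path):
    """Group raw OpenLineage run events by runId and fold their facets together."""
    runs = defaultdict(lambda: {"eventTypes": [], "facets": {}})
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            run = runs[event["run"]["runId"]]
            run["eventTypes"].append(event["eventType"])
            # Treat events as cumulative: later facets overwrite earlier ones.
            run["facets"].update(event["run"].get("facets", {}))
    return runs

if __name__ == "__main__":
    for run_id, run in merge_runs("spark_events.txt").items():
        print(run_id, run["eventTypes"])
```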
"2cbfffbe-1026-4f2e-a8a7-8091b79301aa", - "type": "message", - "text": "I'm getting quite close with MWAA. See .", - "user": "U01HVNU6A4C", - "ts": "1694554469.757559", - "blocks": [ - { - "type": "rich_text", - "block_id": "cYLI+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'm getting quite close with MWAA. See " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1692743745585879" - }, - { - "type": "text", - "text": "." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01HVNU6A4C", - "ts": "1694554486.000000" - }, - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1692743745585879", - "ts": "1692743745.585879", - "author_id": "U01HVNU6A4C", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "is_thread_root_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1692743745.585879", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "swU0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Has anyone managed to get the OL Airflow integration to work on AWS MWAA? We've tried pretty much every trick but still ended up with the following error:\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "Broken plugin: [openlineage.airflow.plugin] No module named 'openlineage.airflow'; 'openlineage' is not a package" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1692743745585879", - "fallback": "[August 22nd, 2023 3:35 PM] mars: Has anyone managed to get the OL Airflow integration to work on AWS MWAA? We've tried pretty much every trick but still ended up with the following error:\n```Broken plugin: [openlineage.airflow.plugin] No module named 'openlineage.airflow'; 'openlineage' is not a package```", - "text": "Has anyone managed to get the OL Airflow integration to work on AWS MWAA? We've tried pretty much every trick but still ended up with the following error:\n```Broken plugin: [openlineage.airflow.plugin] No module named 'openlineage.airflow'; 'openlineage' is not a package```", - "author_name": "Mars Lan", - "author_link": "https://openlineage.slack.com/team/U01HVNU6A4C", - "author_icon": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_48.jpg", - "author_subname": "Mars Lan", - "mrkdwn_in": [ - "text" - ], - "footer": "Thread in Slack Conversation" - } - ], - "thread_ts": "1694553961.188719", - "parent_user_id": "U05QL7LN2GH" - } - ] - }, - { - "type": "message", - "text": "I am trying to run Google Cloud Composer where i have added the openlineage-airflow pypi packagae as a dependency and have added the env OPENLINEAGE_EXTRACTORS to point to my custom extractor. I have added a folder by name dependencies and inside that i have placed my extractor file, and the path given to OPENLINEAGE_EXTRACTORS is dependencies.<file_name>.<extractor_class_name>…still it fails with the exception saying No module named ‘dependencies’. 
Can anyone kindly help me out on correcting my mistake", - "files": [ - { - "id": "F05RM6EV6DV", - "created": 1694545739, - "timestamp": 1694545739, - "name": "Screenshot 2023-09-13 at 12.38.55 AM.png", - "title": "Screenshot 2023-09-13 at 12.38.55 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 951188, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05RM6EV6DV/screenshot_2023-09-13_at_12.38.55_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05RM6EV6DV/download/screenshot_2023-09-13_at_12.38.55_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 122, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 162, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 243, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 270, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 324, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 346, - "original_w": 3024, - "original_h": 1022, - "thumb_tiny": "AwAQADDRx2pefWg5zRQAtFJ+FHNAC0YpMmjmgD//2Q==", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05RM6EV6DV/screenshot_2023-09-13_at_12.38.55_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05RM6EV6DV-62656c8fb4", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694545905.974339", - "blocks": [ - { - "type": "rich_text", - "block_id": "M37Ut", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am trying to run Google Cloud Composer where i have added the openlineage-airflow pypi packagae as a dependency and have added the env OPENLINEAGE_EXTRACTORS to point to my custom extractor. I have added a folder by name dependencies and inside that i have placed my extractor file, and the path given to OPENLINEAGE_EXTRACTORS is dependencies..…still it fails with the exception saying No module named ‘dependencies’. 
Can anyone kindly help me out on correcting my mistake" - } - ] - } - ] - } - ], - "client_msg_id": "96b4074c-8175-4e0c-a4d2-480c8ad53912", - "thread_ts": "1694545905.974339", - "reply_count": 56, - "reply_users_count": 4, - "latest_reply": "1694849569.693149", - "reply_users": [ - "U01HNKK4XAM", - "U05QL7LN2GH", - "U01RA9B5GG2", - "U02S6F54MAB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "6699d029-37b4-46a5-a617-a3596fa43622", - "type": "message", - "text": "Hey <@U05QL7LN2GH>, can you share some details on which versions of airflow and openlineage you’re using?", - "user": "U01HNKK4XAM", - "ts": "1694553336.948439", - "blocks": [ - { - "type": "rich_text", - "block_id": "3T9v9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey " - }, - { - "type": "user", - "user_id": "U05QL7LN2GH" - }, - { - "type": "text", - "text": ", can you share some details on which versions of airflow and openlineage you’re using?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "2983444d-e79f-47e1-b84f-52c510893b14", - "type": "message", - "text": "airflow ---> 2.5.3, openlinegae-airflow ---> 1.1.0", - "user": "U05QL7LN2GH", - "ts": "1694553386.763359", - "blocks": [ - { - "type": "rich_text", - "block_id": "zROwd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "airflow ---> 2.5.3, openlinegae-airflow ---> 1.1.0" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "abe58931-c3fd-4e56-87de-4c9f035cfdbe", - "type": "message", - "text": "```import traceback\nimport uuid\nfrom typing import List, Optional\n\nfrom openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata\nfrom openlineage.airflow.utils import get_job_name\n\n\nclass BigQueryInsertJobExtractor(BaseExtractor):\n def __init__(self, operator):\n super().__init__(operator)\n\n @classmethod\n def get_operator_classnames(cls) -> List[str]:\n return ['BigQueryInsertJobOperator']\n\n def extract(self) -> Optional[TaskMetadata]:\n return None\n\n def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:\n self.log.debug(f\"JEEVAN ---> extract_on_complete({task_instance})\")\n random_uuid = str(uuid.uuid4())\n self.log.debug(f\"JEEVAN ---> Randomly Generated UUID --> {random_uuid}\")\n\n self.operator.job_id = random_uuid\n\n return TaskMetadata(\n name=get_job_name(task=self.operator)\n )```", - "user": "U05QL7LN2GH", - "ts": "1694555108.611609", - "blocks": [ - { - "type": "rich_text", - "block_id": "yJdTD", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "import traceback\nimport uuid\nfrom typing import List, Optional\n\nfrom openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata\nfrom openlineage.airflow.utils import get_job_name\n\n\nclass BigQueryInsertJobExtractor(BaseExtractor):\n def __init__(self, operator):\n super().__init__(operator)\n\n @classmethod\n def get_operator_classnames(cls) -> List[str]:\n return ['BigQueryInsertJobOperator']\n\n def extract(self) -> Optional[TaskMetadata]:\n return None\n\n def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:\n self.log.debug(f\"JEEVAN ---> extract_on_complete({task_instance})\")\n random_uuid = str(uuid.uuid4())\n self.log.debug(f\"JEEVAN ---> 
Randomly Generated UUID --> {random_uuid}\")\n\n self.operator.job_id = random_uuid\n\n return TaskMetadata(\n name=get_job_name(task=self.operator)\n )" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "e3536e9c-8378-41f5-b2d4-9cc9914646c8", - "type": "message", - "text": "this is the custom extractor code that im trying with", - "user": "U05QL7LN2GH", - "ts": "1694555124.052409", - "blocks": [ - { - "type": "rich_text", - "block_id": "+dZNQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this is the custom extractor code that im trying with" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "56fb30af-3e45-469f-a4d7-f5f377ae8fa9", - "type": "message", - "text": "thanks <@U05QL7LN2GH>, will try to take a deeper look tomorrow", - "user": "U01HNKK4XAM", - "ts": "1694567402.289539", - "blocks": [ - { - "type": "rich_text", - "block_id": "chu5h", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "thanks " - }, - { - "type": "user", - "user_id": "U05QL7LN2GH" - }, - { - "type": "text", - "text": ", will try to take a deeper look tomorrow" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "2dc50680-0b06-466b-8b43-f4faec6fe389", - "type": "message", - "text": "`No module named 'dependencies'.`\nThis sounds like general Python problem", - "user": "U01RA9B5GG2", - "ts": "1694606066.296109", - "blocks": [ - { - "type": "rich_text", - "block_id": "jSK0Q", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "No module named 'dependencies'.", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nThis sounds like general Python problem" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "488b814e-8149-44fa-a4eb-47578a90fc2d", - "type": "message", - "text": "", - "user": "U01RA9B5GG2", - "ts": "1694606112.308969", - "blocks": [ - { - "type": "rich_text", - "block_id": "wNpXH", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://stackoverflow.com/questions/69991553/how-to-import-custom-modules-in-cloud-composer" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://stackoverflow.com/questions/69991553/how-to-import-custom-modules-in-cloud-composer", - "thumb_url": "https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=73d79a89bded", - "thumb_width": 316, - "thumb_height": 316, - "service_icon": "https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a", - "id": 1, - "original_url": "https://stackoverflow.com/questions/69991553/how-to-import-custom-modules-in-cloud-composer", - "fallback": "Stack Overflow: how to import custom modules in Cloud Composer", - "text": "I created a local project with apache Airflow and i want to run it in cloud composer. 
My project contains custom modules and a main file that calls them.\nExample : from src.kuzzle import KuzzleQuery", - "title": "how to import custom modules in Cloud Composer", - "title_link": "https://stackoverflow.com/questions/69991553/how-to-import-custom-modules-in-cloud-composer", - "service_name": "Stack Overflow" - } - ], - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "38a47c30-f268-4ad9-9462-97970aa94cb7", - "type": "message", - "text": "basically, if you're able to import the file from your dag code, OL should be able too", - "user": "U01RA9B5GG2", - "ts": "1694606188.664139", - "blocks": [ - { - "type": "rich_text", - "block_id": "NmblC", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "basically, if you're able to import the file from your dag code, OL should be able too" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "61ec8716-1531-4611-be70-5520e7f8c613", - "type": "message", - "text": "The Problem is in the GCS Composer there is a component called Triggerer, which they say is used for deferrable operators…i have logged into that pod and i could see that the GCS Bucket is not mounted on this, but i am unable to understand why is the initialisation happening inside the triggerer pod", - "user": "U05QL7LN2GH", - "ts": "1694606472.982729", - "blocks": [ - { - "type": "rich_text", - "block_id": "8kk5D", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "The Problem is in the GCS Composer there is a component called Triggerer, which they say is used for deferrable operators…i have logged into that pod and i could see that the GCS Bucket is not mounted on this, but i am unable to understand why is the initialisation happening inside the triggerer pod" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "", - "files": [ - { - "id": "F05SUDUQEDN", - "created": 1694606486, - "timestamp": 1694606486, - "name": "Screenshot 2023-09-13 at 5.31.22 PM.png", - "title": "Screenshot 2023-09-13 at 5.31.22 PM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 1542818, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05SUDUQEDN/screenshot_2023-09-13_at_5.31.22_pm.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05SUDUQEDN/download/screenshot_2023-09-13_at_5.31.22_pm.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_360.png", - "thumb_360_w": 360, - "thumb_360_h": 161, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_480.png", - "thumb_480_w": 480, - "thumb_480_h": 214, - "thumb_160": 
"https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_720.png", - "thumb_720_w": 720, - "thumb_720_h": 321, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_800.png", - "thumb_800_w": 800, - "thumb_800_h": 357, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_960.png", - "thumb_960_w": 960, - "thumb_960_h": 428, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 457, - "original_w": 3010, - "original_h": 1342, - "thumb_tiny": "AwAVADDS74xQRR/FQaAEpaTtS0AFFLRQAY5pGOKWmv2oATOaXNNHWnUxC5ozSUUDP//Z", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05SUDUQEDN/screenshot_2023-09-13_at_5.31.22_pm.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05SUDUQEDN-edf19dc4d6", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694606492.739709", - "client_msg_id": "d19589eb-ea2b-4503-851a-1bff3ab03683", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "63649a04-b4df-4ffe-ad4f-bc6857c3de86", - "type": "message", - "text": "> The Problem is in the GCS Composer there is a component called Triggerer, which they say is used for deferrable operators…i have logged into that pod and i could see that the GCS Bucket is not mounted on this, but i am unable to understand why is the initialisation happening inside the triggerer pod\nOL integration is not running on triggerer, only on worker and scheduler pods", - "user": "U01RA9B5GG2", - "ts": "1694606507.351309", - "blocks": [ - { - "type": "rich_text", - "block_id": "Dd/xA", - "elements": [ - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "The Problem is in the GCS Composer there is a component called Triggerer, which they say is used for deferrable operators…i have logged into that pod and i could see that the GCS Bucket is not mounted on this, but i am unable to understand why is the initialisation happening inside the triggerer pod" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nOL integration is not running on triggerer, only on worker and scheduler pods" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "", - "files": [ - { - "id": "F05SJ3DJ5CH", - "created": 1694606507, - "timestamp": 1694606507, - "name": "Screenshot 2023-09-13 at 5.31.44 PM.png", - "title": "Screenshot 2023-09-13 at 5.31.44 PM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 331156, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05SJ3DJ5CH/screenshot_2023-09-13_at_5.31.44_pm.png", - "url_private_download": 
"https://files.slack.com/files-pri/T01CWUYP5AR-F05SJ3DJ5CH/download/screenshot_2023-09-13_at_5.31.44_pm.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_360.png", - "thumb_360_w": 360, - "thumb_360_h": 125, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_480.png", - "thumb_480_w": 480, - "thumb_480_h": 167, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_720.png", - "thumb_720_w": 720, - "thumb_720_h": 251, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_800.png", - "thumb_800_w": 800, - "thumb_800_h": 279, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_960.png", - "thumb_960_w": 960, - "thumb_960_h": 335, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SJ3DJ5CH-d898f6fdcc/screenshot_2023-09-13_at_5.31.44_pm_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 357, - "original_w": 3024, - "original_h": 1054, - "thumb_tiny": "AwAQADCrub1P50hZx/EfzNIaQ0ALvb+835mje395vzNNooAdvf8AvN+dJvf+8350lFAH/9k=", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05SJ3DJ5CH/screenshot_2023-09-13_at_5.31.44_pm.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05SJ3DJ5CH-c6f8f73a8d", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694606513.120669", - "client_msg_id": "81c34c2f-282f-4b5a-9bc1-698ff96254ee", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "396e095e-dded-49d2-8aaa-b0845d67f992", - "type": "message", - "text": "As you can see in this screenshot i am seeing the logs of the triggerer and it says clearly unable to import plugin openlineage", - "user": "U05QL7LN2GH", - "ts": "1694606606.171889", - "blocks": [ - { - "type": "rich_text", - "block_id": "vBkL+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "As you can see in this screenshot i am seeing the logs of the triggerer and it says clearly unable to import plugin openlineage" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "", - "files": [ - { - "id": "F05SUDUQEDN", - "created": 1694606486, - "timestamp": 1694606486, - "name": "Screenshot 2023-09-13 at 5.31.22 PM.png", - "title": "Screenshot 2023-09-13 at 5.31.22 PM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 1542818, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": 
"https://files.slack.com/files-pri/T01CWUYP5AR-F05SUDUQEDN/screenshot_2023-09-13_at_5.31.22_pm.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05SUDUQEDN/download/screenshot_2023-09-13_at_5.31.22_pm.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_360.png", - "thumb_360_w": 360, - "thumb_360_h": 161, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_480.png", - "thumb_480_w": 480, - "thumb_480_h": 214, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_720.png", - "thumb_720_w": 720, - "thumb_720_h": 321, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_800.png", - "thumb_800_w": 800, - "thumb_800_h": 357, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_960.png", - "thumb_960_w": 960, - "thumb_960_h": 428, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05SUDUQEDN-b8a4896bfd/screenshot_2023-09-13_at_5.31.22_pm_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 457, - "original_w": 3010, - "original_h": 1342, - "thumb_tiny": "AwAVADDS74xQRR/FQaAEpaTtS0AFFLRQAY5pGOKWmv2oATOaXNNHWnUxC5ozSUUDP//Z", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05SUDUQEDN/screenshot_2023-09-13_at_5.31.22_pm.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05SUDUQEDN-edf19dc4d6", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "x_files": [ - "F05SUDUQEDN" - ], - "ts": "1694606609.510489", - "blocks": [ - { - "type": "rich_text", - "block_id": "SPBs9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://openlineage.slack.com/files/U05QL7LN2GH/F05SUDUQEDN/screenshot_2023-09-13_at_5.31.22_pm.png" - } - ] - } - ] - } - ], - "client_msg_id": "ff1b8b2d-ca2f-4acc-bfdc-a5f76ac0e043", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "eb91ae90-5582-4966-bb4a-b9f875bf6c02", - "type": "message", - "text": "I see. There are few possible things to do here - composer could mount the user files, Airflow could not start plugins on triggerer, or we could detect we're on triggerer and not import anything there. However, does it impact OL or Airflow operation in other way than this log?", - "user": "U01RA9B5GG2", - "ts": "1694607032.263479", - "blocks": [ - { - "type": "rich_text", - "block_id": "YE+8k", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I see. There are few possible things to do here - composer could mount the user files, Airflow could not start plugins on triggerer, or we could detect we're on triggerer and not import anything there. 
However, does it impact OL or Airflow operation in other way than this log?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "05bafc1e-0f4b-42bb-919e-24dca030f7af", - "type": "message", - "text": "Probably we'd have to do something if that really bothers you as there won't be further changes to Airflow 2.5", - "user": "U01RA9B5GG2", - "ts": "1694607126.289379", - "blocks": [ - { - "type": "rich_text", - "block_id": "4YD1+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Probably we'd have to do something if that really bothers you as there won't be further changes to Airflow 2.5" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "48836a79-3a5e-4376-aebe-55d30bc8413b", - "type": "message", - "text": "The Problem is it is actually not registering this custom extractor written by me, henceforth i am just receiving the DefaultExtractor things and my piece of extractor code is not even getting triggered", - "user": "U05QL7LN2GH", - "ts": "1694607494.517249", - "blocks": [ - { - "type": "rich_text", - "block_id": "2ufg6", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "The Problem is it is actually not registering this custom extractor written by me, henceforth i am just receiving the DefaultExtractor things and my piece of extractor code is not even getting triggered" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "c3b668ca-b149-453b-ac13-992613f90f61", - "type": "message", - "text": "any suggestions to try <@U01RA9B5GG2>", - "user": "U05QL7LN2GH", - "ts": "1694607769.601109", - "blocks": [ - { - "type": "rich_text", - "block_id": "PM4tm", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "any suggestions to try " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "4435ba4e-2446-4d1d-9426-8fd5035e332d", - "type": "message", - "text": "Could you share worker logs?", - "user": "U01RA9B5GG2", - "ts": "1694608068.429439", - "blocks": [ - { - "type": "rich_text", - "block_id": "k/5hQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Could you share worker logs?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "ae8653c1-f153-447d-b791-aa4fe703d5b9", - "type": "message", - "text": "and check if module is importable from your dag code?", - "user": "U01RA9B5GG2", - "ts": "1694608076.879469", - "blocks": [ - { - "type": "rich_text", - "block_id": "B+JX3", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and check if module is importable from your dag code?" 
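The importability check suggested in the reply above can be done directly, before involving OpenLineage at all. A small sketch, assuming the `dependencies/` folder layout from the original question; the module and class names are the hypothetical ones used in this thread:

```python
import importlib

# Assumed layout from the question above: a dependencies/ folder next to the
# DAG files, containing the extractor module.
MODULE_PATH = "dependencies.big_query_insert_job_extractor"
CLASS_NAME = "BigQueryInsertJobExtractor"

module = importlib.import_module(MODULE_PATH)
extractor_cls = getattr(module, CLASS_NAME)
print(f"OK: loaded {extractor_cls!r} from {MODULE_PATH}")
```

If this import fails when run from a DAG, then `OPENLINEAGE_EXTRACTORS` set to `dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor` will fail the same way, which matches the `No module named 'dependencies'` symptom in the worker logs above.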
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "these are the worker pod logs…where there is no log of openlineageplugin", - "files": [ - { - "id": "F05S5J5AY2E", - "created": 1694608245, - "timestamp": 1694608245, - "name": "downloaded-logs-20230913-180017.json", - "title": "downloaded-logs-20230913-180017.json", - "mimetype": "text/plain", - "filetype": "json", - "pretty_type": "JSON", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 2886952, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S5J5AY2E/downloaded-logs-20230913-180017.json", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S5J5AY2E/download/downloaded-logs-20230913-180017.json", - "media_display_type": "unknown", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05S5J5AY2E/downloaded-logs-20230913-180017.json", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05S5J5AY2E-8c5402e3d8", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694608285.567279", - "blocks": [ - { - "type": "rich_text", - "block_id": "nA/mA", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "these are the worker pod logs…where there is no log of openlineageplugin" - } - ] - } - ] - } - ], - "client_msg_id": "31f0c42a-f1a4-4064-851d-09d3da4a1c00", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "4332faa1-bbb5-4073-bd60-7e9865482513", - "type": "message", - "text": " --> sure will check now on this one", - "user": "U05QL7LN2GH", - "ts": "1694608312.873729", - "blocks": [ - { - "type": "rich_text", - "block_id": "jM2YJ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694608076879469?thread_ts=1694545905.974339&cid=C01CK9T7HKR" - }, - { - "type": "text", - "text": " --> sure will check now on this one" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694608076879469?thread_ts=1694545905.974339&cid=C01CK9T7HKR", - "ts": "1694608076.879469", - "author_id": "U01RA9B5GG2", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "is_reply_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1694608076.879469", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "B+JX3", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and check if module is importable from your dag code?" 
- } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694608076879469?thread_ts=1694545905.974339&cid=C01CK9T7HKR", - "fallback": "[September 13th, 2023 5:27 AM] maciej.obuchowski: and check if module is importable from your dag code?", - "text": "and check if module is importable from your dag code?", - "author_name": "Maciej Obuchowski", - "author_link": "https://openlineage.slack.com/team/U01RA9B5GG2", - "author_icon": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_48.jpg", - "author_subname": "Maciej Obuchowski", - "mrkdwn_in": [ - "text" - ], - "footer": "Thread in Slack Conversation" - } - ], - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "ba6d946f-3279-44dc-963e-63f8d426691e", - "type": "message", - "text": "``` {\n \"textPayload\": \"Traceback (most recent call last): File \\\"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\\\", line 427, in import_from_string module = importlib.import_module(module_path) File \\\"/opt/python3.8/lib/python3.8/importlib/__init__.py\\\", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File \\\"<frozen importlib._bootstrap>\\\", line 1014, in _gcd_import File \\\"<frozen importlib._bootstrap>\\\", line 991, in _find_and_load File \\\"<frozen importlib._bootstrap>\\\", line 961, in _find_and_load_unlocked File \\\"<frozen importlib._bootstrap>\\\", line 219, in _call_with_frames_removed File \\\"<frozen importlib._bootstrap>\\\", line 1014, in _gcd_import File \\\"<frozen importlib._bootstrap>\\\", line 991, in _find_and_load File \\\"<frozen importlib._bootstrap>\\\", line 961, in _find_and_load_unlocked File \\\"<frozen importlib._bootstrap>\\\", line 219, in _call_with_frames_removed File \\\"<frozen importlib._bootstrap>\\\", line 1014, in _gcd_import File \\\"<frozen importlib._bootstrap>\\\", line 991, in _find_and_load File \\\"<frozen importlib._bootstrap>\\\", line 973, in _find_and_load_unlockedModuleNotFoundError: No module named 'airflow.gcs'\",\n \"insertId\": \"pt2eu6fl9z5vw\",\n \"resource\": {\n \"type\": \"cloud_composer_environment\",\n \"labels\": {\n \"environment_name\": \"openlineage\",\n \"location\": \"us-west1\",\n \"project_id\": \"acceldata-acm\"\n }\n },\n \"timestamp\": \"2023-09-13T06:20:44.131577764Z\",\n \"severity\": \"ERROR\",\n \"labels\": {\n \"worker_id\": \"airflow-worker-xttt8\"\n },\n \"logName\": \"projects/acceldata-acm/logs/airflow-worker\",\n \"receiveTimestamp\": \"2023-09-13T06:20:48.847319607Z\"\n },```\nit doesn't see `No module named 'airflow.gcs'` that is part of your extractor path `airflow.gcs.dags.big_query_insert_job_extractor.BigQueryInsertJobExtractor`\nhowever, is it necessary? 
I generally see people using imports directly from dags folder", - "user": "U01RA9B5GG2", - "ts": "1694608712.776939", - "blocks": [ - { - "type": "rich_text", - "block_id": "SNTlV", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": " {\n \"textPayload\": \"Traceback (most recent call last): File \\\"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\\\", line 427, in import_from_string module = importlib.import_module(module_path) File \\\"/opt/python3.8/lib/python3.8/importlib/__init__.py\\\", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File \\\"\\\", line 1014, in _gcd_import File \\\"\\\", line 991, in _find_and_load File \\\"\\\", line 961, in _find_and_load_unlocked File \\\"\\\", line 219, in _call_with_frames_removed File \\\"\\\", line 1014, in _gcd_import File \\\"\\\", line 991, in _find_and_load File \\\"\\\", line 961, in _find_and_load_unlocked File \\\"\\\", line 219, in _call_with_frames_removed File \\\"\\\", line 1014, in _gcd_import File \\\"\\\", line 991, in _find_and_load File \\\"\\\", line 973, in _find_and_load_unlockedModuleNotFoundError: No module named 'airflow.gcs'\",\n \"insertId\": \"pt2eu6fl9z5vw\",\n \"resource\": {\n \"type\": \"cloud_composer_environment\",\n \"labels\": {\n \"environment_name\": \"openlineage\",\n \"location\": \"us-west1\",\n \"project_id\": \"acceldata-acm\"\n }\n },\n \"timestamp\": \"2023-09-13T06:20:44.131577764Z\",\n \"severity\": \"ERROR\",\n \"labels\": {\n \"worker_id\": \"airflow-worker-xttt8\"\n },\n \"logName\": \"projects/acceldata-acm/logs/airflow-worker\",\n \"receiveTimestamp\": \"2023-09-13T06:20:48.847319607Z\"\n }," - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nit doesn't see " - }, - { - "type": "text", - "text": "No module named 'airflow.gcs'", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " that is part of your extractor path " - }, - { - "type": "text", - "text": "airflow.gcs.dags.big_query_insert_job_extractor.BigQueryInsertJobExtractor", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "\nhowever, is it necessary? 
I generally see people using imports directly from dags folder" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01RA9B5GG2", - "ts": "1694608729.000000" - }, - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "this is one of the experimentation that i have did, but then i reverted it back to keeping it to dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor…where dependencies is a module i have created inside my dags folder", - "files": [ - { - "id": "F05S2NCLM9T", - "created": 1694609020, - "timestamp": 1694609020, - "name": "Screenshot 2023-09-13 at 6.13.36 PM.png", - "title": "Screenshot 2023-09-13 at 6.13.36 PM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 55433, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S2NCLM9T/screenshot_2023-09-13_at_6.13.36_pm.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S2NCLM9T/download/screenshot_2023-09-13_at_6.13.36_pm.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_360.png", - "thumb_360_w": 360, - "thumb_360_h": 79, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_480.png", - "thumb_480_w": 480, - "thumb_480_h": 105, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_720.png", - "thumb_720_w": 720, - "thumb_720_h": 158, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_800.png", - "thumb_800_w": 800, - "thumb_800_h": 175, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_960.png", - "thumb_960_w": 960, - "thumb_960_h": 210, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NCLM9T-09572876f7/screenshot_2023-09-13_at_6.13.36_pm_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 224, - "original_w": 1270, - "original_h": 278, - "thumb_tiny": "AwAKADDRyc460vNLRQA0McUuT6UtFACc+1HPtS0UAf/Z", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05S2NCLM9T/screenshot_2023-09-13_at_6.13.36_pm.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05S2NCLM9T-af73e2f6b1", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694609051.132379", - "blocks": [ - { - "type": "rich_text", - "block_id": "lfYf2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this is one of the experimentation that i have did, but then i reverted it back to keeping it to 
dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor…where dependencies is a module i have created inside my dags folder" - } - ] - } - ] - } - ], - "client_msg_id": "5815ad59-e69c-4c15-8610-a8117ceb31c2", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "", - "files": [ - { - "id": "F05RM6EV6DV", - "created": 1694545739, - "timestamp": 1694545739, - "name": "Screenshot 2023-09-13 at 12.38.55 AM.png", - "title": "Screenshot 2023-09-13 at 12.38.55 AM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 951188, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05RM6EV6DV/screenshot_2023-09-13_at_12.38.55_am.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05RM6EV6DV/download/screenshot_2023-09-13_at_12.38.55_am.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_360.png", - "thumb_360_w": 360, - "thumb_360_h": 122, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_480.png", - "thumb_480_w": 480, - "thumb_480_h": 162, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_720.png", - "thumb_720_w": 720, - "thumb_720_h": 243, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_800.png", - "thumb_800_w": 800, - "thumb_800_h": 270, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_960.png", - "thumb_960_w": 960, - "thumb_960_h": 324, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RM6EV6DV-e20cfb50c7/screenshot_2023-09-13_at_12.38.55_am_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 346, - "original_w": 3024, - "original_h": 1022, - "thumb_tiny": "AwAQADDRx2pefWg5zRQAtFJ+FHNAC0YpMmjmgD//2Q==", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05RM6EV6DV/screenshot_2023-09-13_at_12.38.55_am.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05RM6EV6DV-62656c8fb4", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "x_files": [ - "F05RM6EV6DV" - ], - "ts": "1694609073.130239", - "blocks": [ - { - "type": "rich_text", - "block_id": "BGYlB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://openlineage.slack.com/files/U05QL7LN2GH/F05RM6EV6DV/screenshot_2023-09-13_at_12.38.55_am.png" - } - ] - } - ] - } - ], - "client_msg_id": "457a61b4-706d-433a-b375-bf640843c913", - "thread_ts": "1694545905.974339", - "parent_user_id": 
"U05QL7LN2GH" - }, - { - "type": "message", - "text": "these are the logs of the triggerer pod specifically", - "files": [ - { - "id": "F05S2NNQ1UM", - "created": 1694609135, - "timestamp": 1694609135, - "name": "Screenshot 2023-09-13 at 6.15.30 PM.png", - "title": "Screenshot 2023-09-13 at 6.15.30 PM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 1349379, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S2NNQ1UM/screenshot_2023-09-13_at_6.15.30_pm.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S2NNQ1UM/download/screenshot_2023-09-13_at_6.15.30_pm.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_360.png", - "thumb_360_w": 360, - "thumb_360_h": 162, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_480.png", - "thumb_480_w": 480, - "thumb_480_h": 217, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_720.png", - "thumb_720_w": 720, - "thumb_720_h": 325, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_800.png", - "thumb_800_w": 800, - "thumb_800_h": 361, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_960.png", - "thumb_960_w": 960, - "thumb_960_h": 433, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S2NNQ1UM-7870a10644/screenshot_2023-09-13_at_6.15.30_pm_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 462, - "original_w": 3018, - "original_h": 1362, - "thumb_tiny": "AwAVADCpz60hHvS0Z9qAG496SlNJigAooooAcDRmkpaAENJSmkPWgBM0ZoooA//Z", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05S2NNQ1UM/screenshot_2023-09-13_at_6.15.30_pm.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05S2NNQ1UM-9a548d0a76", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694609146.705369", - "blocks": [ - { - "type": "rich_text", - "block_id": "SHNkd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "these are the logs of the triggerer pod specifically" - } - ] - } - ] - } - ], - "client_msg_id": "47c532c4-9680-4dff-9073-84fd361f08cc", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "3032c52c-7420-4f63-9c06-9b33633e03a3", - "type": "message", - "text": "yeah it would be expected to have this in triggerer where it's not mounted, but will it behave the same for worker where it's mounted?", - "user": "U01RA9B5GG2", - "ts": "1694609191.704139", - 
"blocks": [ - { - "type": "rich_text", - "block_id": "iH3ag", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah it would be expected to have this in triggerer where it's not mounted, but will it behave the same for worker where it's mounted?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "db50bbab-6b72-43a1-b68f-db0eb7052cf0", - "type": "message", - "text": "maybe `___init___.py` is missing for top-level dag path?", - "user": "U01RA9B5GG2", - "ts": "1694609229.577469", - "blocks": [ - { - "type": "rich_text", - "block_id": "7+IIh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "maybe " - }, - { - "type": "text", - "text": "__", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "init", - "style": { - "italic": true, - "code": true - } - }, - { - "type": "text", - "text": "__.py", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " is missing for top-level dag path?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "these are the logs of the worker pod at startup, where it does not complain of the plugin like in triggerer, but when tasks are run on this worker…somehow it is not picking up the extractor for the operator that i have written it for", - "files": [ - { - "id": "F05RR4YHH1V", - "created": 1694609270, - "timestamp": 1694609270, - "name": "Screenshot 2023-09-13 at 6.17.46 PM.png", - "title": "Screenshot 2023-09-13 at 6.17.46 PM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 1106398, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05RR4YHH1V/screenshot_2023-09-13_at_6.17.46_pm.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05RR4YHH1V/download/screenshot_2023-09-13_at_6.17.46_pm.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_360.png", - "thumb_360_w": 360, - "thumb_360_h": 175, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_480.png", - "thumb_480_w": 480, - "thumb_480_h": 234, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_720.png", - "thumb_720_w": 720, - "thumb_720_h": 350, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_800.png", - "thumb_800_w": 800, - "thumb_800_h": 389, - "thumb_960": 
"https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_960.png", - "thumb_960_w": 960, - "thumb_960_h": 467, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05RR4YHH1V-63b8fc9fb8/screenshot_2023-09-13_at_6.17.46_pm_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 498, - "original_w": 3020, - "original_h": 1470, - "thumb_tiny": "AwAXADCpTT160+oz1oAM0UUUAFFFFAC0vFIOho7UABx70lB60UAFFJRQB//Z", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05RR4YHH1V/screenshot_2023-09-13_at_6.17.46_pm.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05RR4YHH1V-7d6ea17c5a", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694609341.476869", - "blocks": [ - { - "type": "rich_text", - "block_id": "BVyLS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "these are the logs of the worker pod at startup, where it does not complain of the plugin like in triggerer, but when tasks are run on this worker…somehow it is not picking up the extractor for the operator that i have written it for" - } - ] - } - ] - } - ], - "client_msg_id": "b2c18b33-ea4b-4010-b787-3e371fc708c5", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "46021901-2df0-4d3e-9c30-a09253aa8dbd", - "type": "message", - "text": " --> you mean to make the dags folder as well like a module by adding the init.py?", - "user": "U05QL7LN2GH", - "ts": "1694609394.103649", - "blocks": [ - { - "type": "rich_text", - "block_id": "ErKg8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694609229577469?thread_ts=1694545905.974339&cid=C01CK9T7HKR" - }, - { - "type": "text", - "text": " --> you mean to make the dags folder as well like a module by adding the init.py?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05QL7LN2GH", - "ts": "1694609406.000000" - }, - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694609229577469?thread_ts=1694545905.974339&cid=C01CK9T7HKR", - "ts": "1694609229.577469", - "author_id": "U01RA9B5GG2", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "is_reply_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1694609229.577469", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "7+IIh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "maybe " - }, - { - "type": "text", - "text": "__", - "style": { - "code": true - } - }, - { - "type": "text", - "text": "init", - "style": { - "italic": true, - "code": true - } - }, - { - "type": "text", - "text": "__.py", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " is missing for top-level dag path?" 
- } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694609229577469?thread_ts=1694545905.974339&cid=C01CK9T7HKR", - "fallback": "[September 13th, 2023 5:47 AM] maciej.obuchowski: maybe `___init___.py` is missing for top-level dag path?", - "text": "maybe `___init___.py` is missing for top-level dag path?", - "author_name": "Maciej Obuchowski", - "author_link": "https://openlineage.slack.com/team/U01RA9B5GG2", - "author_icon": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_48.jpg", - "author_subname": "Maciej Obuchowski", - "mrkdwn_in": [ - "text" - ], - "footer": "Thread in Slack Conversation" - } - ], - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "d35288ee-7c82-48a5-b263-7d727e78e062", - "type": "message", - "text": "yes, I would put whole custom code directly in dags folder, to make sure import paths are the problem", - "user": "U01RA9B5GG2", - "ts": "1694609724.897809", - "blocks": [ - { - "type": "rich_text", - "block_id": "fbJhf", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, I would put whole custom code directly in dags folder, to make sure import paths are the problem" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "14066c89-79fc-48d5-816c-c915928e4c27", - "type": "message", - "text": "and would be nice if you could set\n```AIRFLOW__LOGGING__LOGGING_LEVEL=\"DEBUG\"```", - "user": "U01RA9B5GG2", - "ts": "1694609748.555169", - "blocks": [ - { - "type": "rich_text", - "block_id": "FE5kO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and would be nice if you could set\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "AIRFLOW__LOGGING__LOGGING_LEVEL=\"DEBUG\"" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "0b0d7e3c-1c26-44cf-ba66-c1527f6c7308", - "type": "message", - "text": "```Starting the process, got command: triggerer\nInitializing airflow.cfg.\nairflow.cfg initialization is done.\n[2023-09-13T13:11:46.620+0000] {settings.py:267} DEBUG - Setting up DB connection pool (PID 8)\n[2023-09-13T13:11:46.622+0000] {settings.py:372} DEBUG - settings.prepare_engine_args(): Using pool settings. 
pool_size=5, max_overflow=10, pool_recycle=570, pid=8\n[2023-09-13T13:11:46.742+0000] {cli_action_loggers.py:39} DEBUG - Adding <function default_action_log at 0x7ff39ca1d3a0> to pre execution callback\n[2023-09-13T13:11:47.638+0000] {cli_action_loggers.py:65} DEBUG - Calling callbacks: [<function default_action_log at 0x7ff39ca1d3a0>]\n ____________ _____________\n ____ |__( )_________ __/__ /________ __\n____ /| |_ /__ ___/_ /_ __ /_ __ \\_ | /| / /\n___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /\n _/_/ |_/_/ /_/ /_/ /_/ \\____/____/|__/\n[2023-09-13T13:11:50.527+0000] {plugins_manager.py:300} DEBUG - Loading plugins\n[2023-09-13T13:11:50.580+0000] {plugins_manager.py:244} DEBUG - Loading plugins from directory: /home/airflow/gcs/plugins\n[2023-09-13T13:11:50.581+0000] {plugins_manager.py:224} DEBUG - Loading plugins from entrypoints\n[2023-09-13T13:11:50.587+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin OpenLineagePlugin\n[2023-09-13T13:11:50.740+0000] {utils.py:430} WARNING - No module named 'boto3'\n[2023-09-13T13:11:50.743+0000] {utils.py:430} WARNING - No module named 'botocore'\n[2023-09-13T13:11:50.833+0000] {utils.py:430} WARNING - No module named 'airflow.providers.sftp'\n[2023-09-13T13:11:51.144+0000] {utils.py:430} WARNING - No module named 'big_query_insert_job_extractor'\n[2023-09-13T13:11:51.145+0000] {plugins_manager.py:237} ERROR - Failed to import plugin OpenLineagePlugin\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 427, in import_from_string\n module = importlib.import_module(module_path)\n File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import\n File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load\n File \"<frozen importlib._bootstrap>\", line 973, in _find_and_load_unlocked\nModuleNotFoundError: No module named 'big_query_insert_job_extractor'\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/plugins_manager.py\", line 229, in load_entrypoint_plugins\n plugin_class = entry_point.load()\n File \"/opt/python3.8/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/__init__.py\", line 194, in load\n module = import_module(match.group('module'))\n File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import\n File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load\n File \"<frozen importlib._bootstrap>\", line 975, in _find_and_load_unlocked\n File \"<frozen importlib._bootstrap>\", line 671, in _load_unlocked\n File \"<frozen importlib._bootstrap_external>\", line 843, in exec_module\n File \"<frozen importlib._bootstrap>\", line 219, in _call_with_frames_removed\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/plugin.py\", line 32, in <module>\n from openlineage.airflow import listener\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/listener.py\", line 75, in <module>\n extractor_manager = ExtractorManager()\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py\", line 16, in __init__\n self.task_to_extractor = 
Extractors()\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py\", line 122, in __init__\n extractor = import_from_string(extractor.strip())\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 431, in import_from_string\n raise ImportError(f\"Failed to import {path}\") from e\nImportError: Failed to import big_query_insert_job_extractor.BigQueryInsertJobExtractor\n[2023-09-13T13:11:51.235+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin composer_menu_plugin\n[2023-09-13T13:11:51.719+0000] {plugins_manager.py:316} DEBUG - Loading 1 plugin(s) took 1.14 seconds\n[2023-09-13T13:11:51.733+0000] {triggerer_job.py:101} INFO - Starting the triggerer\n[2023-09-13T13:11:51.734+0000] {selector_events.py:59} DEBUG - Using selector: EpollSelector\n[2023-09-13T13:11:56.118+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:01.359+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:06.665+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:11.880+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:17.098+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:22.323+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:27.597+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:32.826+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:38.049+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:43.275+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:48.509+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:53.867+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:59.087+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:04.300+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:09.539+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:14.785+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:20.007+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:25.274+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:30.510+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:35.729+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:40.960+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:46.444+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:51.751+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:57.084+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:02.310+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:07.535+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:12.754+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:17.967+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:23.185+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:28.406+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:33.661+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:38.883+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:44.247+0000] {base_job.py:240} DEBUG - [heartbeat]```", - "user": "U05QL7LN2GH", - "ts": "1694610898.877169", - "blocks": [ - { - "type": "rich_text", - "block_id": "K1RCj", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "Starting the process, got command: triggerer\nInitializing airflow.cfg.\nairflow.cfg initialization is done.\n[2023-09-13T13:11:46.620+0000] {settings.py:267} DEBUG - 
Setting up DB connection pool (PID 8)\n[2023-09-13T13:11:46.622+0000] {settings.py:372} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=570, pid=8\n[2023-09-13T13:11:46.742+0000] {cli_action_loggers.py:39} DEBUG - Adding to pre execution callback\n[2023-09-13T13:11:47.638+0000] {cli_action_loggers.py:65} DEBUG - Calling callbacks: []\n ____________ _____________\n ____ |__( )_________ __/__ /________ __\n____ /| |_ /__ ___/_ /_ __ /_ __ \\_ | /| / /\n___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /\n _/_/ |_/_/ /_/ /_/ /_/ \\____/____/|__/\n[2023-09-13T13:11:50.527+0000] {plugins_manager.py:300} DEBUG - Loading plugins\n[2023-09-13T13:11:50.580+0000] {plugins_manager.py:244} DEBUG - Loading plugins from directory: /home/airflow/gcs/plugins\n[2023-09-13T13:11:50.581+0000] {plugins_manager.py:224} DEBUG - Loading plugins from entrypoints\n[2023-09-13T13:11:50.587+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin OpenLineagePlugin\n[2023-09-13T13:11:50.740+0000] {utils.py:430} WARNING - No module named 'boto3'\n[2023-09-13T13:11:50.743+0000] {utils.py:430} WARNING - No module named 'botocore'\n[2023-09-13T13:11:50.833+0000] {utils.py:430} WARNING - No module named 'airflow.providers.sftp'\n[2023-09-13T13:11:51.144+0000] {utils.py:430} WARNING - No module named 'big_query_insert_job_extractor'\n[2023-09-13T13:11:51.145+0000] {plugins_manager.py:237} ERROR - Failed to import plugin OpenLineagePlugin\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 427, in import_from_string\n module = importlib.import_module(module_path)\n File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File \"\", line 1014, in _gcd_import\n File \"\", line 991, in _find_and_load\n File \"\", line 973, in _find_and_load_unlocked\nModuleNotFoundError: No module named 'big_query_insert_job_extractor'\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/plugins_manager.py\", line 229, in load_entrypoint_plugins\n plugin_class = entry_point.load()\n File \"/opt/python3.8/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/__init__.py\", line 194, in load\n module = import_module(match.group('module'))\n File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File \"\", line 1014, in _gcd_import\n File \"\", line 991, in _find_and_load\n File \"\", line 975, in _find_and_load_unlocked\n File \"\", line 671, in _load_unlocked\n File \"\", line 843, in exec_module\n File \"\", line 219, in _call_with_frames_removed\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/plugin.py\", line 32, in \n from openlineage.airflow import listener\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/listener.py\", line 75, in \n extractor_manager = ExtractorManager()\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py\", line 16, in __init__\n self.task_to_extractor = Extractors()\n File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py\", line 122, in __init__\n extractor = import_from_string(extractor.strip())\n File 
\"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 431, in import_from_string\n raise ImportError(f\"Failed to import {path}\") from e\nImportError: Failed to import big_query_insert_job_extractor.BigQueryInsertJobExtractor\n[2023-09-13T13:11:51.235+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin composer_menu_plugin\n[2023-09-13T13:11:51.719+0000] {plugins_manager.py:316} DEBUG - Loading 1 plugin(s) took 1.14 seconds\n[2023-09-13T13:11:51.733+0000] {triggerer_job.py:101} INFO - Starting the triggerer\n[2023-09-13T13:11:51.734+0000] {selector_events.py:59} DEBUG - Using selector: EpollSelector\n[2023-09-13T13:11:56.118+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:01.359+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:06.665+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:11.880+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:17.098+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:22.323+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:27.597+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:32.826+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:38.049+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:43.275+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:48.509+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:53.867+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:12:59.087+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:04.300+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:09.539+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:14.785+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:20.007+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:25.274+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:30.510+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:35.729+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:40.960+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:46.444+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:51.751+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:13:57.084+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:02.310+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:07.535+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:12.754+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:17.967+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:23.185+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:28.406+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:33.661+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:38.883+0000] {base_job.py:240} DEBUG - [heartbeat]\n[2023-09-13T13:14:44.247+0000] {base_job.py:240} DEBUG - [heartbeat]" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "233df8a1-29ac-45fc-9e9b-dcb1926b9628", - "type": "message", - "text": "still the same error in the triggerer pod", - "user": "U05QL7LN2GH", - "ts": "1694610910.432769", - "blocks": [ - { - "type": "rich_text", - "block_id": "6w/Pi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "still the same error in the triggerer pod" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - 
"thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "type": "message", - "text": "have changed the dags folder where i have added the init file as you suggested and then have updated the OPENLINEAGE_EXTRACTORS to big_query_insert_job_extractor.BigQueryInsertJobExtractor…still the same thing", - "files": [ - { - "id": "F05S5NP9N7M", - "created": 1694610929, - "timestamp": 1694610929, - "name": "Screenshot 2023-09-13 at 6.45.26 PM.png", - "title": "Screenshot 2023-09-13 at 6.45.26 PM.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QL7LN2GH", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 997510, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S5NP9N7M/screenshot_2023-09-13_at_6.45.26_pm.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05S5NP9N7M/download/screenshot_2023-09-13_at_6.45.26_pm.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_360.png", - "thumb_360_w": 360, - "thumb_360_h": 126, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_480.png", - "thumb_480_w": 480, - "thumb_480_h": 169, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_720.png", - "thumb_720_w": 720, - "thumb_720_h": 253, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_800.png", - "thumb_800_w": 800, - "thumb_800_h": 281, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_960.png", - "thumb_960_w": 960, - "thumb_960_h": 337, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05S5NP9N7M-a5b84be03b/screenshot_2023-09-13_at_6.45.26_pm_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 360, - "original_w": 2892, - "original_h": 1016, - "thumb_tiny": "AwAQADDRx/nNA4oBpaAFzRSd+tHPrQAtFJ+NJigD/9k=", - "permalink": "https://openlineage.slack.com/files/U05QL7LN2GH/F05S5NP9N7M/screenshot_2023-09-13_at_6.45.26_pm.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05S5NP9N7M-436a8b87bc", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QL7LN2GH", - "display_as_bot": false, - "ts": "1694610983.761409", - "blocks": [ - { - "type": "rich_text", - "block_id": "CFHx4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "have changed the dags folder where i have added the init file as you suggested and then have updated the OPENLINEAGE_EXTRACTORS to big_query_insert_job_extractor.BigQueryInsertJobExtractor…still the same thing" - } - ] - } - ] - } - ], - "client_msg_id": "8031872d-2c92-4071-98a5-8eef81bbdf58", - "thread_ts": 
"1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "73d99e91-e4ad-4a9a-90ce-50cf16106e0b", - "type": "message", - "text": "> still the same error in the triggerer pod\nit won't change, we're not trying to fix the triggerer import but worker, and should look only at worker pod at this point", - "user": "U01RA9B5GG2", - "ts": "1694612187.192779", - "blocks": [ - { - "type": "rich_text", - "block_id": "UEEwt", - "elements": [ - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "still the same error in the triggerer pod" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "it won't change, we're not trying to fix the triggerer import but worker, and should look only at worker pod at this point" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01RA9B5GG2", - "ts": "1694612229.000000" - }, - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "c930bd41-af54-463f-b011-37125a0ec49c", - "type": "message", - "text": "```extractor for <class 'airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator'> is <class 'big_query_insert_job_extractor.BigQueryInsertJobExtractor'\n\nUsing extractor BigQueryInsertJobExtractor task_type=BigQueryInsertJobOperator airflow_dag_id=data_analytics_dag task_id=join_bq_datasets.bq_join_holidays_weather_data_2021 airflow_run_id=manual__2023-09-13T13:24:08.946947+00:00 \n\nfatal: not a git repository (or any parent up to mount point /home/airflow)\nStopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).\nfatal: not a git repository (or any parent up to mount point /home/airflow)\nStopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).```", - "user": "U05QL7LN2GH", - "ts": "1694612614.195119", - "blocks": [ - { - "type": "rich_text", - "block_id": "xTXcg", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "extractor for is is ", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "707e745e-34c7-41e1-ad73-199ea31abe09", - "type": "message", - "text": "no `__init__.py` in base `dags` folder", - "user": "U02S6F54MAB", - "ts": "1694614562.336989", - "blocks": [ - { - "type": "rich_text", - "block_id": "X8CD7", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "no " - }, - { - "type": "text", - "text": "__init__.py", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in base " - }, - { - "type": "text", - "text": "dags", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " folder" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "d83ea232-7514-4559-b8aa-afc6e49eb685", - "type": "message", - "text": "I also checked that triggerer pod indeed has no gcsfuse set up, tbh no idea why, maybe some kind of optimization\nthe only effect is that when loading plugins in triggerer it throws some errors in logs, we don’t do anything at the moment there", - "user": "U02S6F54MAB", - "ts": "1694614622.751639", - "blocks": [ - { - "type": "rich_text", - "block_id": "kGlCl", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I also checked that triggerer pod 
indeed has no gcsfuse set up, tbh no idea why, maybe some kind of optimization\nthe only effect is that when loading plugins in triggerer it throws some errors in logs, we don’t do anything at the moment there" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "4a20640a-4af5-4964-98cf-1fcfa16885a9", - "type": "message", - "text": "okk…got it <@U02S6F54MAB>…so the init at the top level of dags is as well not reqd, got it. Just one more doubt, there is a requirement where i want to change the operators property in the extractor inside the extract function, will that be taken into account and the operator’s execute be called with the property that i have populated in my extractor?", - "user": "U05QL7LN2GH", - "ts": "1694614766.540349", - "blocks": [ - { - "type": "rich_text", - "block_id": "u4Kyz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "okk…got it " - }, - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": "…so the init at the top level of dags is as well not reqd, got it. Just one more doubt, there is a requirement where i want to change the operators property in the extractor inside the extract function, will that be taken into account and the operator’s execute be called with the property that i have populated in my extractor?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05QL7LN2GH", - "ts": "1694614794.000000" - }, - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "5d797ae3-ddd8-4cc4-9d0c-05da684c2c4c", - "type": "message", - "text": "for example i want to add a custom job_id to the BigQueryInsertJobOperator, so wheneerv someone uses the BigQueryInsertJobOperator operator i want to intercept that and add this job_id property to the operator…will that work?", - "user": "U05QL7LN2GH", - "ts": "1694614888.624699", - "blocks": [ - { - "type": "rich_text", - "block_id": "Qo9jq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "for example i want to add a custom job_id to the BigQueryInsertJobOperator, so wheneerv someone uses the BigQueryInsertJobOperator operator i want to intercept that and add this job_id property to the operator…will that work?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "63c25b13-9833-4adc-a4f3-b5429d113d7b", - "type": "message", - "text": "I’m not sure if using OL for such thing is best choice. Wouldn’t it be better to subclass the operator?", - "user": "U02S6F54MAB", - "ts": "1694615086.037269", - "blocks": [ - { - "type": "rich_text", - "block_id": "9kpzj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I’m not sure if using OL for such thing is best choice. Wouldn’t it be better to subclass the operator?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "599fa46f-72bb-4aec-8130-ed0ef0ccd585", - "type": "message", - "text": "but the answer is: it dependes on the airflow version, in 2.3+ I’m pretty sure the changed property stays in execute method", - "user": "U02S6F54MAB", - "ts": "1694615137.622649", - "blocks": [ - { - "type": "rich_text", - "block_id": "lDV/q", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "but the answer is: it dependes on the airflow version, in 2.3+ I’m pretty sure the changed property stays in execute method" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "c272da2c-cb40-4bbc-884a-23d96e678d05", - "type": "message", - "text": "yeah ideally that is how we should have done this but the problem is our client is having around 1000+ Dag’s in different google cloud projects, which are owned by multiple teams…so they are not willing to change anything in their dag. Thankfully they are using airflow 2.4.3", - "user": "U05QL7LN2GH", - "ts": "1694615269.140809", - "blocks": [ - { - "type": "rich_text", - "block_id": "RPkDF", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yeah ideally that is how we should have done this but the problem is our client is having around 1000+ Dag’s in different google cloud projects, which are owned by multiple teams…so they are not willing to change anything in their dag. Thankfully they are using airflow 2.4.3" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "258df1ca-7e1a-4a4c-8b80-60ad502d1faf", - "type": "message", - "text": "task_policy might be better tool for that: ", - "user": "U01RA9B5GG2", - "ts": "1694615475.647459", - "blocks": [ - { - "type": "rich_text", - "block_id": "76umU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "task_policy might be better tool for that: " - }, - { - "type": "link", - "url": "https://airflow.apache.org/docs/apache-airflow/2.6.0/administration-and-deployment/cluster-policies.html" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "5b7efe5d-f026-40af-9b28-721f4f06a322", - "type": "message", - "text": "btw I double-checked - execute method is in different process so this would not change task’s attribute there", - "user": "U02S6F54MAB", - "ts": "1694615730.010879", - "blocks": [ - { - "type": "rich_text", - "block_id": "dseTu", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "btw I double-checked - execute method is in different process so this would not change task’s attribute there" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - }, - { - "client_msg_id": "0fc11194-8888-436b-a4c7-ce13fe6392f5", - "type": "message", - "text": "<@U02S6F54MAB> any idea how can we achieve this one. 
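A sketch of the cluster-policy route suggested above, assuming an `airflow_local_settings.py` on the scheduler/worker `PYTHONPATH` per the linked docs; the injected attribute is purely illustrative. Note the caveat raised just above: `execute()` runs in a separate process, so mutations need to happen at DAG-parse time (as here), not later from an extractor:
```
# airflow_local_settings.py -- picked up by Airflow if importable (see the
# cluster policies docs linked above). Mutates every BigQueryInsertJobOperator
# at parse time instead of editing 1000+ client DAGs. Sketch only.
from airflow.models.baseoperator import BaseOperator


def task_policy(task: BaseOperator) -> None:
    if task.task_type == "BigQueryInsertJobOperator":
        # `configuration` is the operator's BigQuery job config; the label is
        # illustrative -- set whatever property is actually needed.
        config = getattr(task, "configuration", None)
        if isinstance(config, dict):
            config.setdefault("labels", {})["managed-by"] = "lineage-policy"
```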
---> ", - "user": "U05QL7LN2GH", - "ts": "1694849569.693149", - "blocks": [ - { - "type": "rich_text", - "block_id": "wzGyQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " any idea how can we achieve this one. ---> " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694849427228709" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694849427228709", - "ts": "1694849427.228709", - "author_id": "U05QL7LN2GH", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1694849427.228709", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "AHZek", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "here" - }, - { - "type": "text", - "text": " we have dataproc operator getting called from a dag which submits a spark job, we wanted to maintain that continuity of parent job in the spark job so according to the documentation we can acheive that by using a macro called lineage_run_id that requires task and task_instance as the parameters. The problem we are facing is that our client’s have 1000's of dags, so asking them to change this everywhere it is used is not feasible, so we thought of using the task_policy feature in the airflow…but the problem is that task_policy gives you access to only the task/operator, but we don’t have the access to the task instance..that is required as a parameter to the lineage_run_id function. Can anyone kindly help us on how should we go about this one\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "t1 = DataProcPySparkOperator(\n task_id=job_name,\n #required pyspark configuration,\n job_name=job_name,\n dataproc_pyspark_properties={\n 'spark.driver.extraJavaOptions':\n f\"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}\"\n dag=dag)" - } - ], - "border": 0 - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1694849427228709", - "fallback": "[September 16th, 2023 12:30 AM] jeevan: we have dataproc operator getting called from a dag which submits a spark job, we wanted to maintain that continuity of parent job in the spark job so according to the documentation we can acheive that by using a macro called lineage_run_id that requires task and task_instance as the parameters. The problem we are facing is that our client’s have 1000's of dags, so asking them to change this everywhere it is used is not feasible, so we thought of using the task_policy feature in the airflow…but the problem is that task_policy gives you access to only the task/operator, but we don’t have the access to the task instance..that is required as a parameter to the lineage_run_id function. 
Can anyone kindly help us on how should we go about this one\n```t1 = DataProcPySparkOperator(\n task_id=job_name,\n #required pyspark configuration,\n job_name=job_name,\n dataproc_pyspark_properties={\n 'spark.driver.extraJavaOptions':\n f\"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}\"\n dag=dag)```", - "text": " we have dataproc operator getting called from a dag which submits a spark job, we wanted to maintain that continuity of parent job in the spark job so according to the documentation we can acheive that by using a macro called lineage_run_id that requires task and task_instance as the parameters. The problem we are facing is that our client’s have 1000's of dags, so asking them to change this everywhere it is used is not feasible, so we thought of using the task_policy feature in the airflow…but the problem is that task_policy gives you access to only the task/operator, but we don’t have the access to the task instance..that is required as a parameter to the lineage_run_id function. Can anyone kindly help us on how should we go about this one\n```t1 = DataProcPySparkOperator(\n task_id=job_name,\n #required pyspark configuration,\n job_name=job_name,\n dataproc_pyspark_properties={\n 'spark.driver.extraJavaOptions':\n f\"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}\"\n dag=dag)```", - "author_name": "Guntaka Jeevan Paul", - "author_link": "https://openlineage.slack.com/team/U05QL7LN2GH", - "author_icon": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_48.png", - "author_subname": "Guntaka Jeevan Paul", - "mrkdwn_in": [ - "text" - ], - "footer": "Slack Conversation" - } - ], - "thread_ts": "1694545905.974339", - "parent_user_id": "U05QL7LN2GH" - } - ] - }, - { - "client_msg_id": "80f8a117-68eb-4bc7-9352-9d5254dab39c", - "type": "message", - "text": "This particular code in docker-compose exits with code 1 because it is unable to find wait-for-it.sh, file in the container. I have checked the mounting path from the local machine, It is correct and the path on the container for Marquez is also correct i.e. /usr/src/app but it is unable to mount the wait-for-it.sh. Does anyone know why is this? This code exists in the open lineage repository as well \n```# Marquez as an OpenLineage Client\n api:\n image: marquezproject/marquez\n container_name: marquez-api\n ports:\n - \"5000:5000\"\n - \"5001:5001\"\n volumes:\n - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh\n links:\n - \"db:postgres\"\n depends_on:\n - db\n entrypoint: [ \"./wait-for-it.sh\", \"db:5432\", \"--\", \"./entrypoint.sh\" ]```", - "user": "U05QNRSQW1E", - "ts": "1694520846.519609", - "blocks": [ - { - "type": "rich_text", - "block_id": "DEp/x", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "This particular code in docker-compose exits with code 1 because it is unable to find wait-for-it.sh, file in the container. I have checked the mounting path from the local machine, It is correct and the path on the container for Marquez is also correct i.e. /usr/src/app but it is unable to mount the wait-for-it.sh. 
[2023-09-12 12:14:06 UTC] U05QNRSQW1E:
This particular code in docker-compose exits with code 1 because it is unable to find the wait-for-it.sh file in the container. I have checked the mounting path from the local machine; it is correct, and the path in the container for Marquez is also correct, i.e. /usr/src/app, but it is unable to mount wait-for-it.sh. Does anyone know why this is? This code exists in the OpenLineage repository as well: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/docker-compose.yml
```
# Marquez as an OpenLineage Client
  api:
    image: marquezproject/marquez
    container_name: marquez-api
    ports:
      - "5000:5000"
      - "5001:5001"
    volumes:
      - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh
    links:
      - "db:postgres"
    depends_on:
      - db
    entrypoint: [ "./wait-for-it.sh", "db:5432", "--", "./entrypoint.sh" ]
```

  ↳ [2023-09-12 12:15:19 UTC] U05QNRSQW1E: This is the error message:
    [image attachment: image.png]
  ↳ [2023-09-12 14:38:41 UTC] U01RA9B5GG2: no permissions?
[2023-09-11 21:07:26 UTC] U04AZ7992SU (edited):
I'm seeing some odd behavior with my http transport when upgrading airflow/openlineage-airflow from 2.3.2 -> 2.6.3 and 0.24.0 -> 0.28.0. Previously I had a config like this that let me provide my own auth tokens. However, after upgrading I'm getting a 401 from the endpoint, and further debugging seems to reveal that we're not using the token provided in my TokenProvider. Does anyone know if something changed between these versions that could be causing this? (more details in 🧵)
```
transport:
  type: http
  url: https://my.fake-marquez-endpoint.com
  auth:
    type: some.fully.qualified.classpath
```

  ↳ [2023-09-11 21:09:40 UTC] U04AZ7992SU: If I log this line, I can tell the TokenProvider is the class instance I would expect: https://github.com/OpenLineage/OpenLineage/blob/45d94fb73b5488d34b8ca544b58317382ceb3980/client/python/openlineage/client/transport/http.py#L55 (`subclass = try_import_from_string(of_type)`)
  ↳ [2023-09-11 21:11:14 UTC] U04AZ7992SU: However, if I log the `token_provider` here, I get the original TokenProvider: https://github.com/OpenLineage/OpenLineage/blob/45d94fb73b5488d34b8ca544b58317382ceb3980/client/python/openlineage/client/transport/http.py#L154 (`bearer = token_provider.get_bearer()`)
  ↳ [2023-09-11 21:18:56 UTC] U04AZ7992SU: Ah, I think I see the issue. Looks like this was introduced here; we are instantiating the base token provider when we should be using the subclass: https://github.com/OpenLineage/OpenLineage/pull/1869/files#diff-2f8ea6f9a22b5567de8ab56c6a63da8e7adf40cb436ee5e7e6b16e70a82afe05R57
  ↳ [2023-09-11 21:37:42 UTC] U04AZ7992SU (also sent to the channel): Opened a PR for this here: https://github.com/OpenLineage/OpenLineage/pull/2100 ("python: fix custom http transport TokenProvider". Problem: a bug was introduced where the base `TokenProvider` class was always used instead of a given custom token provider, even if the validation conditions passed. Solution: revert this to instantiate the subclass.) (❤️ 1)
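To make the thread's diagnosis concrete: the client resolves `auth.type` to a class via `try_import_from_string`, but the regression instantiated the base `TokenProvider` instead of the resolved subclass, so the custom bearer token never reached the request (hence the 401). A custom provider matching the config above would look roughly like the sketch below; the class name, config key, and constructor handling are assumptions, and the exact interface may differ between client versions:
```python
# some/fully/qualified/classpath.py: hypothetical module matching
# the `auth.type` value in the config above.
from openlineage.client.transport.http import TokenProvider

class MyTokenProvider(TokenProvider):  # hypothetical name
    def __init__(self, config: dict) -> None:
        self.token = config.get("token", "")  # hypothetical config key

    def get_bearer(self) -> str:
        # Called by the HTTP transport when it builds the Authorization
        # header (the http.py#L154 line linked in the thread).
        return f"Bearer {self.token}"
```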
[2023-09-11 14:07:41 UTC] U02LXF3HUN7:
@channel The first Toronto OpenLineage Meetup, featuring a presentation by recent adopter Metaphor (https://metaphor.io/), is just one week away. On the agenda:
1. Evolution of spec presentation/discussion (project background/history)
2. State of the community
3. Integrating OpenLineage with Metaphor (by special guests Ye, https://www.linkedin.com/in/yeliu84/, and Ivan, https://www.linkedin.com/in/ivanperepelitca/)
4. Spark/column lineage update
5. Airflow Provider update
6. Roadmap discussion
Find more details and RSVP here: https://www.meetup.com/openlineage/events/295488014/ (🙌 7)

[2023-09-07 19:12:20 UTC] U02LXF3HUN7 (edited):
@channel This month's TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:
• announcements
• recent releases
• demo: Spark integration tests in Databricks runtime
• open discussion
• more (TBA)
More info and the meeting link can be found on the website (https://openlineage.io/meetings/). All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc. (👍 1)
[2023-09-06 21:44:12 UTC] U0595Q78HUG:
Has there been any conversation on the extensibility of facets/concepts? E.g.:
• how does one extend the list of run states (https://openlineage.io/docs/spec/run-cycle) to add a paused/resumed state?
• how does one extend the nominal time facet (https://openlineage.io/docs/spec/facets/run-facets/nominal_time) to add a created-at field?

  ↳ [2023-09-06 22:28:17 UTC] U01DCLP0GU9: Hello Bernat,
The primary mechanism to extend the model is through facets. You can either:
• create new standard facets in the spec: https://github.com/OpenLineage/OpenLineage/tree/main/spec/facets
• create custom facets defined somewhere else, with a prefix in their name: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#custom-facet-naming
• update existing facets with a backward-compatible change (for example, adding an optional field).
The core spec can also be modified. Here is an example of adding a state: https://github.com/OpenLineage/OpenLineage/commit/7243d916f5400dbbceaece5cda89da961ad005d3
That being said, I think more granular states like pause/resume are probably better suited to a run facet. There was an issue opened for that particular one a while ago (https://github.com/OpenLineage/OpenLineage/issues/9); maybe that particular discussion can continue there.
For the nominal time facet, you could open an issue describing the use case and, on community agreement, follow up with a PR on the facet itself: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/NominalTimeRunFacet.json (adding an optional field is backwards compatible). (👀 1)
  ↳ [2023-09-06 22:31:12 UTC] U0595Q78HUG: I see, so in general one is best off copying a standard facet and maintaining it under a different name. That way it can be made mandatory 🙂 and one does not need to be blocked for a long time until there's a community agreement 🤔
  ↳ [2023-09-06 22:35:43 UTC] U01DCLP0GU9: Yes. The goal of custom facets is to allow you to experiment and extend the spec however you want without having to wait for approval. If the custom facet is very specific to a third-party project/product, then it makes sense for it to stay a custom facet. If it is more generic, then it makes sense to add it to the core facets as part of the spec. Hopefully community agreement can be achieved relatively quickly; unless someone is strongly against something, it can be added without too much red tape, typically with support in at least one of the integrations to validate the model.
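To illustrate the second option (a custom facet with a prefixed name) for the exact fields asked about: a run facet carrying a paused/resumed state and a created-at timestamp might look like the sketch below in the Python client. The class and field names are invented for the example; only the prefixed-key convention comes from the spec, and the attrs-based style mirrors the client's built-in facets:
```python
import attr
from openlineage.client.facet import BaseFacet

@attr.s
class LifecycleRunFacet(BaseFacet):  # hypothetical custom facet
    state: str = attr.ib()      # e.g. "PAUSED" or "RESUMED"
    createdAt: str = attr.ib()  # ISO-8601 creation time of the run

# Attached under a vendor-prefixed key, per the custom facet naming rule:
# run_facets = {
#     "myVendor_lifecycle": LifecycleRunFacet(
#         state="PAUSED", createdAt="2023-09-06T21:44:12Z")
# }
```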
[2023-09-06 20:43:07 UTC] U05HBLE7YPL (edited):
Hello everyone,
I've been diving into the Marquez codebase and found a performance bottleneck in `JobDao.java` for the query related to `namespaceName=MyNameSpace` with `limit=10`, and 12s with `limit=25`. I managed to optimize it using CTEs, and the execution times dropped dramatically to 300ms (for `limit=100`) and under 100ms (for `limit=25`) on the same cluster.
Issue link: https://github.com/MarquezProject/marquez/issues/2608 ("Performance Issue with Query on Large Data Sets in JobDao.java")
I believe there's even more room for optimization, especially if we adjust the `job_facets_view` to include the `namespace_name` column.
Would the team be open to a PR where I share the optimized query and discuss potential further refinements? I believe these changes could significantly enhance the Marquez web UI experience.
PR link: https://github.com/MarquezProject/marquez/pull/2609
Looking forward to your feedback. (🔥 4)

  ↳ [2023-09-06 22:03:01 UTC] U02S6F54MAB: @U01DCMDFHBK wdyt?

[2023-09-05 01:35:05 UTC] U05NMJ0NBUK:
It looks like my dynamic task mapping in Airflow has the same run ID in Marquez, so even if I am processing 100 files, there is only one version of the data. Is there a way to have a separate version for each dynamic task so I can track the filename, etc.?
  ↳ [2023-09-05 12:54:57 UTC] U02S6F54MAB: `map_index` should indeed be included when calculating the run ID (it's deterministic in the Airflow integration). What version of Airflow are you using, btw?
  ↳ [2023-09-05 13:04:14 UTC] U05NMJ0NBUK: 2.7.0. I do see this error log in all of my dynamic tasks, which might explain it:
```
[2023-09-05, 00:31:57 UTC] {manager.py:200} ERROR - Extractor returns non-valid metadata: None
[2023-09-05, 00:31:57 UTC] {utils.py:401} ERROR - cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 399, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/listener.py", line 93, in on_running
    **get_custom_facets(task_instance),
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 148, in get_custom_facets
    custom_facets["airflow_mappedTask"] = AirflowMappedTaskRunFacet.from_task_instance(task_instance)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/facets.py", line 36, in from_task_instance
    from airflow.providers.openlineage.utils import get_operator_class
ImportError: cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
```
  ↳ [2023-09-05 13:05:34 UTC] U05NMJ0NBUK: I only have a few custom operators with the on_complete facet, so I think this is a built-in one; it runs before my task's custom logs, for example.
  ↳ [2023-09-05 13:06:05 UTC] U05NMJ0NBUK: And any time I messed up my custom facet, the error would be at the bottom of the logs. This is on top, probably an on_start facet?
  ↳ [2023-09-05 13:16:32 UTC] U02S6F54MAB: seems like some circular import
  ↳ [2023-09-05 13:19:47 UTC] U02S6F54MAB: I just tested it manually; it's a bug in the OL provider. Let me fix that.
  ↳ [2023-09-05 14:53:28 UTC] U05NMJ0NBUK: cool, thanks. I am glad it is just a bug; I was afraid dynamic tasks were not supported for a minute there.
  ↳ [2023-09-07 15:46:20 UTC] U05NMJ0NBUK: How do the provider updates work? They can be released in between Airflow releases, and issues for them are raised on the main Airflow repo?
  ↳ [2023-09-07 15:50:07 UTC] U02S6F54MAB: Generally speaking, anything related to OL-Airflow should be placed in the Airflow repo; important changes/bug fixes would be implemented in the OL repo as well.
  ↳ [2023-09-07 19:40:31 UTC] U05NMJ0NBUK: got it, thanks
  ↳ [2023-09-07 23:43:46 UTC] U05NMJ0NBUK: Is there a way for me to install the openlineage provider based on the commit you made to fix the circular imports? I was going to try to install from the Airflow main branch but didn't want to mess anything up.
  ↳ [2023-09-07 23:44:39 UTC] U05NMJ0NBUK: I saw it was merged to Airflow main, but it is not in 2.7.1 and there is no 1.0.3 provider version yet, so I wondered if I could manually install it for the time being.
  ↳ [2023-09-08 09:45:48 UTC] U02S6F54MAB: https://github.com/apache/airflow/blob/main/BREEZE.rst#preparing-provider-packages
Building the provider package on your own could be the best idea, probably? That depends on how you manage your Airflow instance.
  ↳ [2023-09-08 16:01:53 UTC] U02S6F54MAB: there's 1.1.0rc1 btw
  ↳ [2023-09-08 17:44:44 UTC] U05NMJ0NBUK: perfect, thanks. I got started with breeze but then stopped haha (👍 1)
  ↳ [2023-09-11 00:29:00 UTC] U05NMJ0NBUK: The dynamic task mapping error is gone. I did run into this:
```
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/extractors/base.py", line 70, in disabled_operators
    operator.strip() for operator in conf.get("openlineage", "disabled_for_operators").split(";")
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/configuration.py", line 1065, in get
    raise AirflowConfigException(f"section/key [{section}/{key}] not found in config")
```
I am redeploying now with that option added to my config. I guess it did not use the default, which should be "".
  ↳ [2023-09-11 00:49:17 UTC] U05NMJ0NBUK: Added "disabled_for_operators" to my openlineage config and it worked (using the Airflow helm chart; not sure if that means there is an error, because the value I provided should just be the default value, and I am not sure why I needed to specify it explicitly):
```
openlineage:
  disabled_for_operators: ""
  ...
```
This is so much better and makes a lot more sense. Most of my tasks are dynamic, so I was missing a lot of metadata before the fix. Thanks!
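For intuition on why including `map_index` fixes the original problem in this thread: the integration computes run IDs deterministically from the task instance's coordinates, so folding the map index into the key gives every mapped instance its own run (and thus its own dataset version). A toy illustration of the idea, not the integration's actual algorithm:
```python
from uuid import NAMESPACE_URL, uuid5

def toy_lineage_run_id(dag_id: str, task_id: str, logical_date: str,
                       try_number: int, map_index: int = -1) -> str:
    # Deterministic: identical coordinates always produce the same ID.
    key = f"{dag_id}.{task_id}.{logical_date}.{try_number}.{map_index}"
    return str(uuid5(NAMESPACE_URL, key))

# Two mapped instances of the same task now get distinct run IDs:
print(toy_lineage_run_id("files_dag", "process_file", "2023-09-05", 1, 0))
print(toy_lineage_run_id("files_dag", "process_file", "2023-09-05", 1, 1))
```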
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693877705.781699", - "parent_user_id": "U05NMJ0NBUK" - } - ] - }, - { - "client_msg_id": "f45db95f-7a2c-4f68-8cd9-e0d54d599cfb", - "type": "message", - "text": "Also, another small clarification is that when using `MergeIntoCommand`, I'm receiving the lineage events on the backend, but I cannot seem to find any logging of the payload when I enable debug mode in openlineage. I remember there was a similar issue reported by another user in the past. May I check if it might be possible to help with this? It's making debugging quite hard for these cases. Thanks!", - "user": "U04EZ2LPDV4", - "ts": "1693823945.734419", - "blocks": [ - { - "type": "rich_text", - "block_id": "7hi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Also, another small clarification is that when using " - }, - { - "type": "text", - "text": "MergeIntoCommand", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ", I'm receiving the lineage events on the backend, but I cannot seem to find any logging of the payload when I enable debug mode in openlineage. I remember there was a similar issue reported by another user in the past. May I check if it might be possible to help with this? It's making debugging quite hard for these cases. Thanks!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693823945.734419", - "reply_count": 8, - "reply_users_count": 3, - "latest_reply": "1693907271.993249", - "reply_users": [ - "U01RA9B5GG2", - "U04EZ2LPDV4", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "756996c8-a47e-4a4f-8b84-3915be1e9467", - "type": "message", - "text": "I think it only depends on log4j configuration", - "user": "U01RA9B5GG2", - "ts": "1693824852.378539", - "blocks": [ - { - "type": "rich_text", - "block_id": "eP1pE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I think it only depends on log4j configuration" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01RA9B5GG2", - "ts": "1693824855.000000" - }, - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "0eda9409-a5c3-4e10-b654-55bd5257dc70", - "type": "message", - "text": "```# Set everything to be logged to the console\nlog4j.rootCategory=INFO, console\nlog4j.appender.console=org.apache.log4j.ConsoleAppender\nlog4j.appender.console.target=System.err\nlog4j.appender.console.layout=org.apache.log4j.PatternLayout\nlog4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n\n\n# set the log level for the openlineage spark library\nlog4j.logger.io.openlineage.spark=DEBUG```\nthis is what we have in `log4j.properties` in test environment and it works", - "user": "U01RA9B5GG2", - "ts": "1693825035.249249", - "blocks": [ - { - "type": "rich_text", - "block_id": "Er3", - "elements": [ - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "# Set everything to be logged to the console\nlog4j.rootCategory=INFO, console\nlog4j.appender.console=org.apache.log4j.ConsoleAppender\nlog4j.appender.console.target=System.err\nlog4j.appender.console.layout=org.apache.log4j.PatternLayout\nlog4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n\n\n# set the log level for the openlineage spark library\nlog4j.logger.io.openlineage.spark=DEBUG" - } - ], - "border": 0 - 
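For anyone wiring a `log4j.properties` like the one above into a PySpark job to surface the listener's DEBUG payload logging, here is a minimal sketch. The package version, file path, and the `-Dlog4j.configuration` flag (log4j 1.x style; Spark 3.3+ ships log4j 2 and reads a `log4j2.properties` via `-Dlog4j2.configurationFile`) are assumptions about the setup, not a definitive recipe:

```python
from pyspark.sql import SparkSession

# Listener + console transport (both conf keys appear elsewhere in this
# channel); the driver JVM is pointed at the log4j.properties shown above.
spark = (
    SparkSession.builder
    .appName("openlineage-debug-logging")
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.1.0")
    .config("spark.extraListeners",
            "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .config("spark.driver.extraJavaOptions",
            "-Dlog4j.configuration=file:./log4j.properties")  # assumed path
    .getOrCreate()
)

# Any small action will now produce DEBUG-level OpenLineage output.
spark.range(10).selectExpr("id", "id * 2 AS doubled").show()
```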
}, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this is what we have in " - }, - { - "type": "text", - "text": "log4j.properties", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in test environment and it works" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "6682df1c-6ce5-47f2-bde2-547fe609b22d", - "type": "message", - "text": "Hmm... I can see the logs for the other commands, like createViewCommand etc. I just cannot see it for any of the delta runs", - "user": "U04EZ2LPDV4", - "ts": "1693841291.922409", - "blocks": [ - { - "type": "rich_text", - "block_id": "fPkW2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hmm... I can see the logs for the other commands, like createViewCommand etc. I just cannot see it for any of the delta runs" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "322c8ef9-6f20-4555-9a85-ae592f74d230", - "type": "message", - "text": "that's interesting. So, logging is done here: and this code is unaware of delta.\n\nThe possible problem could be filtering delta events (which we do bcz of delta being noisy)", - "user": "U02MK6YNAQ5", - "ts": "1693899183.161579", - "blocks": [ - { - "type": "rich_text", - "block_id": "iMhU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "that's interesting. So, logging is done here: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/main/java/io/openlineage/spark/agent/EventEmitter.java#L63" - }, - { - "type": "text", - "text": " and this code is unaware of delta.\n\nThe possible problem could be filtering delta events (which we do bcz of delta being noisy)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/main/java/io/openlineage/spark/agent/EventEmitter.java#L63", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\n log.debug(\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "b9bc6be7-8245-46fc-9dc6-28d544ce0448", - "type": "message", - "text": "Recently, we've closed that which prevents generating events for `\n```createOrReplaceTempView```\n", - "user": "U02MK6YNAQ5", - "ts": "1693899216.117129", - "blocks": [ - { - "type": "rich_text", - "block_id": "hA5c", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Recently, we've closed that " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/1982" - }, - { - "type": "text", - "text": " which prevents generating events for `\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "createOrReplaceTempView" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": 
"https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1689841378, - "color": "cb2431", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/1982", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#1982 [Spark] CreateViewCommand", - "text": "`CreateViewCommand` does not contain is a spark action triggered with:\n\n```\nwords.createOrReplaceTempView('words')\n```\n\nIt should be filtered in order not to generate Openlineag events, as they do have no inputs or outputs.", - "title": "#1982 [Spark] CreateViewCommand", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/1982", - "footer": "", - "fields": [ - { - "value": "", - "title": "Assignees", - "short": true - }, - { - "value": "integration/spark", - "title": "Labels", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "6dfd3f2b-c88f-4b48-aaac-99351d7dddc2", - "type": "message", - "text": "and this is the code change: ", - "user": "U02MK6YNAQ5", - "ts": "1693899312.200049", - "blocks": [ - { - "type": "rich_text", - "block_id": "Mzx+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and this is the code change: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/1987/files" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "6ec9440f-ecb7-4302-adc0-bf65bd939af4", - "type": "message", - "text": "Hmm I'm a little confused here. I thought we are only filtering out events for certain specific commands, like show table etc. because its noisy right? Some important commands like MergeInto or SaveIntoDataSource used to be logged before, but I notice now that its not being logged anymore...\nI'm using 0.23.0 openlineage version.", - "user": "U04EZ2LPDV4", - "ts": "1693905562.492299", - "blocks": [ - { - "type": "rich_text", - "block_id": "NQ9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hmm I'm a little confused here. I thought we are only filtering out events for certain specific commands, like show table etc. because its noisy right? Some important commands like MergeInto or SaveIntoDataSource used to be logged before, but I notice now that its not being logged anymore...\nI'm using 0.23.0 openlineage version." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "fe882ea2-5856-4922-b641-08988e4d42f4", - "type": "message", - "text": "yes, we do. it's just sometimes when doing a filter, we can remove too much. but SaveIntoDataSource and MergeInto should be fine, as we do check them within the tests", - "user": "U02MK6YNAQ5", - "ts": "1693907271.993249", - "blocks": [ - { - "type": "rich_text", - "block_id": "6Ihi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, we do. it's just sometimes when doing a filter, we can remove too much. 
but SaveIntoDataSource and MergeInto should be fine, as we do check them within the tests" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693823945.734419", - "parent_user_id": "U04EZ2LPDV4" - } - ] - }, - { - "client_msg_id": "a9c658ee-20e7-4918-b081-db806e488787", - "type": "message", - "text": "Hi guys, I'd like to capture the `spark.databricks.clusterUsageTags.clusterAllTags` property from databricks. However, the value of this is a list of keys, and therefore cannot be supported by custom environment facet builder.\nI was thinking that capturing this property might be useful for most databricks workloads, and whether it might make sense to auto-capture it along with other databricks variables, similar to how we capture mount points for the databricks jobs.\nDoes this sound okay? If so, then I can help to contribute this functionality", - "user": "U04EZ2LPDV4", - "ts": "1693813108.356499", - "blocks": [ - { - "type": "rich_text", - "block_id": "LHT/", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi guys, I'd like to capture the " - }, - { - "type": "text", - "text": "spark.databricks.clusterUsageTags.clusterAllTags", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " property from databricks. However, the value of this is a list of keys, and therefore cannot be supported by custom environment facet builder.\nI was thinking that capturing this property might be useful for most databricks workloads, and whether it might make sense to auto-capture it along with other databricks variables, similar to how we capture mount points for the databricks jobs.\nDoes this sound okay? If so, then I can help to contribute this functionality" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U04EZ2LPDV4", - "ts": "1693823112.000000" - }, - "thread_ts": "1693813108.356499", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1694423703.899839", - "reply_users": [ - "U01RA9B5GG2", - "U04EZ2LPDV4" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "c8fdc5de-b404-4811-82b2-9cd25b2196e1", - "type": "message", - "text": "Sounds good to me", - "user": "U01RA9B5GG2", - "ts": "1693824227.481319", - "blocks": [ - { - "type": "rich_text", - "block_id": "ullD6", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Sounds good to me" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693813108.356499", - "parent_user_id": "U04EZ2LPDV4" - }, - { - "client_msg_id": "84d28e31-97b0-42b7-9f83-d855803620e9", - "type": "message", - "text": "Added this here: ", - "user": "U04EZ2LPDV4", - "ts": "1694423703.899839", - "blocks": [ - { - "type": "rich_text", - "block_id": "40oFa", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Added this here: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2099" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1694423544, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/pull/2099", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2099 Capture clusterAllTags variable from databricks", - "text": "*Problem*\n\nAuto-collect spark.databricks.clusterUsageTags.clusterAllTags 
environment variable from databricks\n\nCloses: \n\n*Solution*\n\nPlease describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a schema change, please describe the schema modification(s) and whether it's a _backwards-incompatible_ or _backwards-compatible_ change, then select one of the following:\n\n> *Note:* All schema changes require discussion. Please for context.\n\n☐ Your change modifies the OpenLineage model\n☐ Your change modifies one or more OpenLineage \n\nIf you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports `S3` and `GCS` filesystem operations, tested with AWS EMR).\n\n*One-line summary:*\n*Checklist*\n\n☑︎ You've your work\n☐ Your pull request title follows our \n☐ Your changes are accompanied by tests (_if relevant_)\n☐ Your change contains a and is self-contained\n☐ You've updated any relevant documentation (_if relevant_)\n☐ Your comment includes a one-liner for the changelog about the specific purpose of the change (_if necessary_)\n☐ You've versioned the core OpenLineage model or facets according to (_if relevant_)\n☐ You've added a to source files (_if relevant_)\n\n* * *\n\nSPDX-License-Identifier: Apache-2.0 \nCopyright 2018-2023 contributors to the OpenLineage project", - "title": "#2099 Capture clusterAllTags variable from databricks", - "title_link": "https://github.com/OpenLineage/OpenLineage/pull/2099", - "footer": "", - "fields": [ - { - "value": "integration/spark", - "title": "Labels", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1693813108.356499", - "parent_user_id": "U04EZ2LPDV4" - } - ] - }, - { - "client_msg_id": "c7d465d5-0fda-4602-9b45-f8f499222a61", - "type": "message", - "text": "\nThe is out now! Please to get it directly in your inbox each month.", - "user": "U02LXF3HUN7", - "ts": "1693602981.025489", - "blocks": [ - { - "type": "rich_text", - "block_id": "WNVO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThe " - }, - { - "type": "link", - "url": "https://mailchi.mp/ba1d4031cbe0/openlineage-news-july-9586289?e=ef0563a7f8", - "text": "latest issue of OpenLineage News" - }, - { - "type": "text", - "text": " is out now! Please " - }, - { - "type": "link", - "url": "http://bit.ly/OL_news", - "text": "subscribe" - }, - { - "type": "text", - "text": " to get it directly in your inbox each month." 
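Back on the `clusterAllTags` thread above: the value of `spark.databricks.clusterUsageTags.clusterAllTags` is a JSON-encoded list of key/value pairs rather than a flat string, which is why the existing custom environment facet builder can't capture it directly. A hedged sketch of reading and flattening it in a Databricks session follows; the exact payload shape shown is an assumption based on common Databricks behavior, so inspect your own cluster's value first:

```python
import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# The value is typically a JSON array such as
#   [{"key": "Vendor", "value": "Databricks"}, {"key": "ClusterName", ...}]
# -- treat that shape as an assumption, not a guarantee.
raw = spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags")
cluster_tags = {item["key"]: item["value"] for item in json.loads(raw)}
print(cluster_tags)
```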
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "http://bit.ly/OL_news", - "id": 1, - "original_url": "http://bit.ly/OL_news", - "fallback": "OpenLineage Project", - "text": "OpenLineage Project Email Forms", - "title": "OpenLineage Project", - "title_link": "http://bit.ly/OL_news", - "service_name": "apache.us14.list-manage.com" - } - ], - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U02S6F54MAB", - "U01RA9B5GG2" - ], - "count": 2 - }, - { - "name": "raised_hands::skin-tone-3", - "users": [ - "U05HFGKEYVB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "e74d588f-4242-44ea-b401-addf7e46d3f2", - "type": "message", - "text": "It sounds like there have been a few announcements at Google Next:\n\n", - "user": "U01DCLP0GU9", - "ts": "1693519820.292119", - "blocks": [ - { - "type": "rich_text", - "block_id": "j3xY", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It sounds like there have been a few announcements at Google Next:\n" - }, - { - "type": "link", - "url": "https://cloud.google.com/data-catalog/docs/how-to/open-lineage" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "link", - "url": "https://cloud.google.com/dataproc/docs/guides/lineage" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "image_url": "https://cloud.google.com/_static/cloud/images/social-icon-google-cloud-1200-630.png", - "image_width": 1200, - "image_height": 630, - "image_bytes": 17405, - "from_url": "https://cloud.google.com/data-catalog/docs/how-to/open-lineage", - "service_icon": "https://www.gstatic.com/devrel-devsite/prod/vbad4fd6eb290ad214822e7a397f826be8dbcc36ca2a922ba48f41fb14286829c/cloud/images/favicons/onecloud/super_cloud.png", - "id": 1, - "original_url": "https://cloud.google.com/data-catalog/docs/how-to/open-lineage", - "fallback": "Google Cloud: Integrate with OpenLineage  |  Data Catalog Documentation  |  Google Cloud", - "title": "Integrate with OpenLineage  |  Data Catalog Documentation  |  Google Cloud", - "title_link": "https://cloud.google.com/data-catalog/docs/how-to/open-lineage", - "service_name": "Google Cloud" - }, - { - "image_url": "https://cloud.google.com/_static/cloud/images/social-icon-google-cloud-1200-630.png", - "image_width": 1200, - "image_height": 630, - "image_bytes": 17405, - "from_url": "https://cloud.google.com/dataproc/docs/guides/lineage", - "service_icon": "https://www.gstatic.com/devrel-devsite/prod/vbad4fd6eb290ad214822e7a397f826be8dbcc36ca2a922ba48f41fb14286829c/cloud/images/favicons/onecloud/super_cloud.png", - "id": 2, - "original_url": "https://cloud.google.com/dataproc/docs/guides/lineage", - "fallback": "Google Cloud: Use data lineage in Dataproc  |  Dataproc Documentation  |  Google Cloud", - "title": "Use data lineage in Dataproc  |  Dataproc Documentation  |  Google Cloud", - "title_link": "https://cloud.google.com/dataproc/docs/guides/lineage", - "service_name": "Google Cloud" - } - ], - "thread_ts": "1693519820.292119", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1693624195.600379", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "tada", - "users": [ - "U01HNKK4XAM", - "U01DCMDFHBK", - "U05Q3HT6PBR", - "U05KKM07PJP", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U0323HG8C8H", - "U053LLVTHRN", - "U02LXF3HUN7", - "U02S6F54MAB", - "U05EN2CKBS8", - "U01DPTNCGU8", - "U05QTSH1UQP" - ], - "count": 13 - }, - { - "name": "raised_hands", - 
"users": [ - "U01HNKK4XAM", - "U01DCMDFHBK", - "U01HVNU6A4C", - "U05KKM07PJP", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U03DEPL699B", - "U0323HG8C8H", - "U053LLVTHRN", - "U02LXF3HUN7" - ], - "count": 10 - }, - { - "name": "heart", - "users": [ - "U01DCMDFHBK", - "U01RA9B5GG2", - "U05NMJ0NBUK", - "U053LLVTHRN", - "U02LXF3HUN7" - ], - "count": 5 - } - ], - "replies": [ - { - "client_msg_id": "1aaaa7b5-fbb8-4705-8602-2d6080b059b5", - "type": "message", - "text": "", - "user": "U01DCLP0GU9", - "ts": "1693624195.600379", - "blocks": [ - { - "type": "rich_text", - "block_id": "xTV", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://www.youtube.com/watch?v=zvCdrNJsxBo&t=2260s" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://www.youtube.com/watch?v=zvCdrNJsxBo&t=2260s", - "service_icon": "https://a.slack-edge.com/80588/img/unfurl_icons/youtube.png", - "thumb_url": "https://i.ytimg.com/vi/zvCdrNJsxBo/hqdefault.jpg", - "thumb_width": 480, - "thumb_height": 360, - "video_html": "", - "video_html_width": 400, - "video_html_height": 225, - "id": 1, - "original_url": "https://www.youtube.com/watch?v=zvCdrNJsxBo&t=2260s", - "fallback": "YouTube Video: What’s new in data governance", - "title": "What’s new in data governance", - "title_link": "https://www.youtube.com/watch?v=zvCdrNJsxBo&t=2260s", - "author_name": "Google Cloud", - "author_link": "https://www.youtube.com/@googlecloud", - "service_name": "YouTube", - "service_url": "https://www.youtube.com/" - } - ], - "thread_ts": "1693519820.292119", - "parent_user_id": "U01DCLP0GU9" - } - ] - }, - { - "client_msg_id": "a684bc56-75b6-4439-8234-42364c3c0c01", - "type": "message", - "text": "Will the August meeting be put up at soon? (usually it’s up in a few days :slightly_smiling_face:", - "user": "U0323HG8C8H", - "ts": "1693510399.153829", - "blocks": [ - { - "type": "rich_text", - "block_id": "s/M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Will the August meeting be put up at " - }, - { - "type": "link", - "url": "https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting" - }, - { - "type": "text", - "text": " soon? (usually it’s up in a few days " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693510399.153829", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1693602812.536779", - "reply_users": [ - "U01RA9B5GG2", - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1693602812.536779", - "replies": [ - { - "client_msg_id": "9b157e0c-cf3b-40ba-9d95-1e3464256e93", - "type": "message", - "text": "<@U02LXF3HUN7>", - "user": "U01RA9B5GG2", - "ts": "1693562453.099679", - "blocks": [ - { - "type": "rich_text", - "block_id": "hxiDL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02LXF3HUN7" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693510399.153829", - "parent_user_id": "U0323HG8C8H" - }, - { - "client_msg_id": "df0a8bef-5b99-4248-a47b-e52fd2a18189", - "type": "message", - "text": "The recording is on the youtube channel . 
I’ll update the wiki ASAP", - "user": "U02LXF3HUN7", - "ts": "1693602812.536779", - "blocks": [ - { - "type": "rich_text", - "block_id": "pKLvP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "The recording is on the youtube channel " - }, - { - "type": "link", - "url": "https://youtu.be/0Q5dWHvIDLo", - "text": "here" - }, - { - "type": "text", - "text": ". I’ll update the wiki ASAP" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://youtu.be/0Q5dWHvIDLo", - "service_icon": "https://a.slack-edge.com/80588/img/unfurl_icons/youtube.png", - "thumb_url": "https://i.ytimg.com/vi/0Q5dWHvIDLo/hqdefault.jpg", - "thumb_width": 480, - "thumb_height": 360, - "video_html": "", - "video_html_width": 400, - "video_html_height": 225, - "id": 1, - "original_url": "https://youtu.be/0Q5dWHvIDLo", - "fallback": "YouTube Video: OpenLineage Community Meeting | August 10, 2023", - "title": "OpenLineage Community Meeting | August 10, 2023", - "title_link": "https://youtu.be/0Q5dWHvIDLo", - "author_name": "OpenLineage Project", - "author_link": "https://www.youtube.com/@openlineageproject6897", - "service_name": "YouTube", - "service_url": "https://www.youtube.com/" - } - ], - "thread_ts": "1693510399.153829", - "parent_user_id": "U0323HG8C8H", - "reactions": [ - { - "name": "white_check_mark", - "users": [ - "U0323HG8C8H" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "74bd2622-054d-4162-b75f-1fec491ca542", - "type": "message", - "text": "ok,it`s my first use thie lineage tool. first,I added dependences in my pom.xml like this:\n<dependency>\n <groupId>io.openlineage</groupId>\n <artifactId>openlineage-java</artifactId>\n <version>0.12.0</version>\n </dependency>\n <dependency>\n <groupId>org.apache.logging.log4j</groupId>\n <artifactId>log4j-api</artifactId>\n <version>2.7</version>\n </dependency>\n <dependency>\n <groupId>org.apache.logging.log4j</groupId>\n <artifactId>log4j-core</artifactId>\n <version>2.7</version>\n </dependency>\n <dependency>\n <groupId>org.apache.logging.log4j</groupId>\n <artifactId>log4j-slf4j-impl</artifactId>\n <version>2.7</version>\n </dependency>\n <dependency>\n <groupId>io.openlineage</groupId>\n <artifactId>openlineage-spark</artifactId>\n <version>0.30.1</version>\n </dependency>\n\nmy spark version is 3.3.1 and the version can not change\n\nsecond, in file Openlineage/intergration/spark I enter command : docker-compose up and follow the steps in this doc:\n\nthere is no erro when i use notebook to execute pyspark for openlineage and I could get json message.\nbut after I enter \"docker-compose up\" ,I want to use my Idea tool to execute scala code like above,the erro happend like above. It seems that I does not configure the environment correctly. so how can i fix the problem .", - "user": "U05NGJ8AM8X", - "ts": "1693468312.450209", - "blocks": [ - { - "type": "rich_text", - "block_id": "5TpvU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "ok,it`s my first use thie lineage tool. 
first,I added dependences in my pom.xml like this:\n\n io.openlineage\n openlineage-java\n 0.12.0\n \n \n org.apache.logging.log4j\n log4j-api\n 2.7\n \n \n org.apache.logging.log4j\n log4j-core\n 2.7\n \n \n org.apache.logging.log4j\n log4j-slf4j-impl\n 2.7\n \n \n io.openlineage\n openlineage-spark\n 0.30.1\n \n\nmy spark version is 3.3.1 and the version can not change\n\nsecond, in file Openlineage/intergration/spark I enter command : docker-compose up and follow the steps in this doc:\n" - }, - { - "type": "link", - "url": "https://openlineage.io/docs/integrations/spark/quickstart_local" - }, - { - "type": "text", - "text": "\nthere is no erro when i use notebook to execute pyspark for openlineage and I could get json message.\nbut after I enter \"docker-compose up\" ,I want to use my Idea tool to execute scala code like above,the erro happend like above. It seems that I does not configure the environment correctly. so how can i fix the problem ." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/docs/integrations/spark/quickstart_local", - "service_icon": "https://openlineage.io/img/favicon.ico", - "id": 1, - "original_url": "https://openlineage.io/docs/integrations/spark/quickstart_local", - "fallback": "Quickstart with Jupyter | OpenLineage", - "text": "Trying out the Spark integration is super easy if you already have Docker Desktop and git installed.", - "title": "Quickstart with Jupyter | OpenLineage", - "title_link": "https://openlineage.io/docs/integrations/spark/quickstart_local", - "service_name": "openlineage.io" - } - ], - "thread_ts": "1693468312.450209", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1693559728.904199", - "reply_users": [ - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "40ce1b5e-6c22-4196-8167-fc30863316ca", - "type": "message", - "text": "please use latest `io.openlineage:openlineage-spark:1.1.0` instead. `openlineage-java` is already contained in the jar, no need to add it on your own.", - "user": "U02MK6YNAQ5", - "ts": "1693559728.904199", - "blocks": [ - { - "type": "rich_text", - "block_id": "CeD7M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "please use latest " - }, - { - "type": "text", - "text": "io.openlineage:openlineage-spark:1.1.0", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " instead. " - }, - { - "type": "text", - "text": "openlineage-java", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " is already contained in the jar, no need to add it on your own." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693468312.450209", - "parent_user_id": "U05NGJ8AM8X" - } - ] - }, - { - "client_msg_id": "cc9f0572-fc1b-4324-9e0f-d65ee769768c", - "type": "message", - "text": "hello,everyone,i can run openLineage spark code in my notebook with python,but when use my idea to execute scala code like this:\nimport org.apache.spark.internal.Logging\nimport org.apache.spark.sql.SparkSession\nimport io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml\nimport org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}\nimport sun.java2d.marlin.MarlinUtils.logInfo\nobject Test {\n def main(args: Array[String]): Unit = {\n\n val spark = SparkSession\n .builder()\n .master(\"local\")\n .appName(\"test\")\n .config(\"spark.jars.packages\",\"io.openlineage:openlineage-spark:0.12.0\")\n .config(\"spark.extraListeners\",\"io.openlineage.spark.agent.OpenLineageSparkListener\")\n .config(\"spark.openlineage.transport.type\",\"console\")\n .getOrCreate()\n\n spark.sparkContext.setLogLevel(\"INFO\")\n\n //spark.sparkContext.addSparkListener(new MySparkAppListener)\n import spark.implicits._\n val input = Seq((1, \"zs\", 2020), (2, \"ls\", 2023)).toDF(\"id\", \"name\", \"year\")\n\n input.select(\"id\", \"name\").orderBy(\"id\").show()\n\n }\n\n}\n\nthere is something wrong:\nException in thread \"spark-listener-group-shared\" java.lang.NoSuchMethodError: io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(Ljava/io/InputStream;)Lio/openlineage/client/OpenLineageYaml;\n\tat io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114)\n\tat io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267)\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)\n\tat org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)\n\tat org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)\n\tat org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)\n\tat org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)\n\tat scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)\n\tat scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)\n\tat $apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)\n\tat org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)\n\ni want to know how can i set idea scala environment correctly", - "user": "U05NGJ8AM8X", - "ts": "1693463508.522729", - "blocks": [ - { - "type": "rich_text", - "block_id": "OyUng", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hello,everyone,i can run openLineage spark code in my notebook 
with python,but when use my idea to execute scala code like this:\nimport org.apache.spark.internal.Logging\nimport org.apache.spark.sql.SparkSession\nimport io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml\nimport org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}\nimport sun.java2d.marlin.MarlinUtils.logInfo\nobject Test {\n def main(args: Array[String]): Unit = {\n\n val spark = SparkSession\n .builder()\n .master(\"local\")\n .appName(\"test\")\n .config(\"spark.jars.packages\",\"io.openlineage:openlineage-spark:0.12.0\")\n .config(\"spark.extraListeners\",\"io.openlineage.spark.agent.OpenLineageSparkListener\")\n .config(\"spark.openlineage.transport.type\",\"console\")\n .getOrCreate()\n\n spark.sparkContext.setLogLevel(\"INFO\")\n\n //spark.sparkContext.addSparkListener(new MySparkAppListener)\n import spark.implicits._\n val input = Seq((1, \"zs\", 2020), (2, \"ls\", 2023)).toDF(\"id\", \"name\", \"year\")\n\n input.select(\"id\", \"name\").orderBy(\"id\").show()\n\n }\n\n}\n\nthere is something wrong:\nException in thread \"spark-listener-group-shared\" java.lang.NoSuchMethodError: io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(Ljava/io/InputStream;)Lio/openlineage/client/OpenLineageYaml;\n\tat io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114)\n\tat io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267)\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)\n\tat org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)\n\tat org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)\n\tat org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)\n\tat org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)\n\tat scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)\n\tat scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)\n\tat " - }, - { - "type": "link", - "url": "http://org.apache.spark.scheduler.AsyncEventQueue.org", - "text": "org.apache.spark.scheduler.AsyncEventQueue.org" - }, - { - "type": "text", - "text": "$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)\n\tat org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)\n\ni want to know how can i set idea scala environment correctly" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693463508.522729", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1693465121.120999", - "reply_users": [ - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "7aa5f249-2197-4c0b-aaa8-0e2053f62c03", - "type": "message", - "text": 
"`io.openlineage:openlineage-spark:0.12.0` -> could you repeat the steps with newer version?", - "user": "U02MK6YNAQ5", - "ts": "1693465121.120999", - "blocks": [ - { - "type": "rich_text", - "block_id": "nz3", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "io.openlineage:openlineage-spark:0.12.0", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " -> could you repeat the steps with newer version?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693463508.522729", - "parent_user_id": "U05NGJ8AM8X" - } - ] - }, - { - "client_msg_id": "077333c6-55c2-4dd0-b529-25220920e053", - "type": "message", - "text": "Can anyone let 3 people stuck downstairs into the 7th floor?", - "user": "U05EC8WB74N", - "ts": "1693445911.744069", - "blocks": [ - { - "type": "rich_text", - "block_id": "nkv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can anyone let 3 people stuck downstairs into the 7th floor?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693445911.744069", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1693452321.429109", - "reply_users": [ - "U01DCMDFHBK" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "+1", - "users": [ - "U01DCMDFHBK" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "4634016F-F84A-465A-AF4B-4D0E2B5743DE", - "type": "message", - "text": "Sorry about that!", - "user": "U01DCMDFHBK", - "ts": "1693452321.429109", - "blocks": [ - { - "type": "rich_text", - "block_id": "n+oGh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "S" - }, - { - "type": "text", - "text": "orry about that!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693445911.744069", - "parent_user_id": "U05EC8WB74N" - } - ] - }, - { - "client_msg_id": "8550eec8-1a37-46ed-b901-92af92360a0f", - "type": "message", - "text": "\nFriendly reminder: there’s a meetup at Astronomer’s offices in SF!", - "user": "U02LXF3HUN7", - "ts": "1693410605.894959", - "blocks": [ - { - "type": "rich_text", - "block_id": "rdN5X", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nFriendly reminder: there’s a meetup " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1692973763570629", - "text": "tonight" - }, - { - "type": "text", - "text": " at Astronomer’s offices in SF!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1692973763570629", - "ts": "1692973763.570629", - "author_id": "U02LXF3HUN7", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1692973763.570629", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "T3fR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nFriendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. 
You can sign up and find the details on the " - }, - { - "type": "link", - "url": "https://www.meetup.com/meetup-group-bnfqymxe/events/295195280/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "text": "meetup event page" - }, - { - "type": "text", - "text": "." - } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1692973763570629", - "fallback": "[August 25th, 2023 7:29 AM] michael282: \nFriendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. You can sign up and find the details on the .", - "text": "\nFriendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. You can sign up and find the details on the .", - "author_name": "Michael Robinson", - "author_link": "https://openlineage.slack.com/team/U02LXF3HUN7", - "author_icon": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_48.jpg", - "author_subname": "Michael Robinson", - "mrkdwn_in": [ - "text" - ], - "footer": "Slack Conversation" - } - ], - "thread_ts": "1693410605.894959", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1693412131.696559", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1693412131.696559", - "reactions": [ - { - "name": "white_check_mark", - "users": [ - "U0323HG8C8H" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "7837D15A-EFC0-49C1-B259-D9099711E94F", - "type": "message", - "text": "I’ll be there and looking forward to see <@U04AZ7992SU> ‘s presentation ", - "user": "U01DCLP0GU9", - "ts": "1693412131.696559", - "blocks": [ - { - "type": "rich_text", - "block_id": "F85", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I" - }, - { - "type": "text", - "text": "’" - }, - { - "type": "text", - "text": "ll be there and looking forward to see " - }, - { - "type": "user", - "user_id": "U04AZ7992SU" - }, - { - "type": "text", - "text": " ‘s presentation " - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693410605.894959", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "client_msg_id": "f42da97f-7aa0-4298-ba23-92bf84894fb9", - "type": "message", - "text": "Hi, Will really appreciate if someone can guide me or provide me any pointer - if they have been able to implement authentication/authorization for access to Marquez. Have not seen much info around it. Any pointers greatly appreciated. Thanks in advance.", - "user": "U05JY6MN8MS", - "ts": "1693397729.978309", - "blocks": [ - { - "type": "rich_text", - "block_id": "1xS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi, Will really appreciate if someone can guide me or provide me any pointer - if they have been able to implement authentication/authorization for access to Marquez. Have not seen much info around it. Any pointers greatly appreciated. Thanks in advance." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693397729.978309", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1693412598.034059", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "761A1B6F-8A7E-4F32-B083-BE3B49D990F2", - "type": "message", - "text": "I’ve seen people do this through the ingress controller in Kubernetes. Unfortunately I don’t have documentation besides k8s specific ones you would find for the ingress controller you’re using. You’d redirect any unauthenticated request to your identity provider ", - "user": "U01DCLP0GU9", - "ts": "1693412598.034059", - "blocks": [ - { - "type": "rich_text", - "block_id": "QR1c", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I’ve seen people do this through the ingress controller in Kubernetes" - }, - { - "type": "text", - "text": "." - }, - { - "type": "text", - "text": " Unfortunately I " - }, - { - "type": "text", - "text": "don’t" - }, - { - "type": "text", - "text": " have documentation besides k8s specific ones you would find for the ingress controller you’re using" - }, - { - "type": "text", - "text": "." - }, - { - "type": "text", - "text": " You’d redirect any unauthenticated request to your identity provider " - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693397729.978309", - "parent_user_id": "U05JY6MN8MS", - "reactions": [ - { - "name": "gratitude-thank-you", - "users": [ - "U05JY6MN8MS" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "b40995eb-c851-4057-b01e-55c9e96334f3", - "type": "message", - "text": "for namespaces, if my data is moving between sources (SFTP -> GCS -> Azure Blob (synapse connects to parquet datasets) then should my namespace be based on the client I am working with? my current namespace has been to refer to the bucket, but that falls apart when considering the data sources and some destinations. perhaps I should just add a field for client-name instead to have a consolidated view?", - "user": "U05NMJ0NBUK", - "ts": "1693329152.193929", - "blocks": [ - { - "type": "rich_text", - "block_id": "jb/UL", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "for namespaces, if my data is moving between sources (SFTP -> GCS -> Azure Blob (synapse connects to parquet datasets) then should my namespace be based on the client I am working with? my current namespace has been " - }, - { - "type": "link", - "url": "gcs://client-name" - }, - { - "type": "text", - "text": " to refer to the bucket, but that falls apart when considering the data sources and some destinations. perhaps I should just add a field for client-name instead to have a consolidated view?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693329152.193929", - "reply_count": 4, - "reply_users_count": 2, - "latest_reply": "1693415198.889139", - "reply_users": [ - "U01RA9B5GG2", - "U05NMJ0NBUK" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "8ca4e2bc-3db6-4e27-be54-8bd928a797d0", - "type": "message", - "text": "> then should my namespace be based on the client I am working with?\nI think each of those sources should be a different namespace?", - "user": "U01RA9B5GG2", - "ts": "1693407188.766089", - "blocks": [ - { - "type": "rich_text", - "block_id": "UTao", - "elements": [ - { - "type": "rich_text_quote", - "elements": [ - { - "type": "text", - "text": "then should my namespace be based on the client I am working with?" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "\nI think each of those sources should be a different namespace?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693329152.193929", - "parent_user_id": "U05NMJ0NBUK" - }, - { - "client_msg_id": "ddfe6454-c9c8-40c1-813a-98f67405d6ff", - "type": "message", - "text": "got it, yeah I was kind of picturing as one namespace for the client (we handle many clients but they are completely distinct entities). I was able to get it to work with multiple namespaces like you suggested and Marquez was able to plot everything correctly in the visualization", - "user": "U05NMJ0NBUK", - "ts": "1693414793.644179", - "blocks": [ - { - "type": "rich_text", - "block_id": "UwxW", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "got it, yeah I was kind of picturing as one namespace for the client (we handle many clients but they are completely distinct entities). I was able to get it to work with multiple namespaces like you suggested and Marquez was able to plot everything correctly in the visualization" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693329152.193929", - "parent_user_id": "U05NMJ0NBUK" - }, - { - "client_msg_id": "c3070060-9622-4b17-94fc-e119ada63562", - "type": "message", - "text": "I noticed some of my Dataset facets make more sense as Run facets, for example, the name of the specific file I processed and how many rows of data / size of the data for that schedule. that won't impact the Run facets Airflow provides right? I can still have the schedule information + my custom run facets?", - "user": "U05NMJ0NBUK", - "ts": "1693414878.749489", - "blocks": [ - { - "type": "rich_text", - "block_id": "H46+", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I noticed some of my Dataset facets make more sense as Run facets, for example, the name of the specific file I processed and how many rows of data / size of the data for that schedule. that won't impact the Run facets Airflow provides right? I can still have the schedule information + my custom run facets?" 
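To make the namespace-per-source and custom-run-facet points above concrete, here is a hedged sketch using the openlineage-python client's attrs-based facet pattern. The facet class, the "clientFile" key, and all namespaces and URLs are made-up illustrations, and (per the reply that follows) the facet key just needs to avoid colliding with names the Airflow integration already emits:

```python
import uuid
from datetime import datetime, timezone

import attr
from openlineage.client import OpenLineageClient
from openlineage.client.facet import BaseFacet
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

# Hypothetical run facet carrying per-schedule details (client, file, rows).
@attr.s
class ClientFileFacet(BaseFacet):
    clientName: str = attr.ib()
    fileName: str = attr.ib()
    rowCount: int = attr.ib()

client = OpenLineageClient(url="http://localhost:5000")  # e.g. a local Marquez

event = RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid.uuid4()),
            facets={"clientFile": ClientFileFacet("clientA", "2023-09-01.csv", 1234)}),
    job=Job(namespace="ingestion", name="sftp_to_gcs.clientA"),
    producer="https://example.com/openlineage-demo",  # illustrative producer URI
    # One namespace per physical source/destination, as suggested above:
    inputs=[Dataset(namespace="sftp://ingest.example.com", name="/drop/clientA")],
    outputs=[Dataset(namespace="gs://client-a-bucket", name="landing/clientA")],
)
client.emit(event)
```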
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693329152.193929", - "parent_user_id": "U05NMJ0NBUK" - }, - { - "client_msg_id": "e84ab19a-7437-4fe3-9a12-27b2d7069434", - "type": "message", - "text": "Yes, unless you name it the same as one of the Airflow facets :slightly_smiling_face:", - "user": "U01RA9B5GG2", - "ts": "1693415198.889139", - "blocks": [ - { - "type": "rich_text", - "block_id": "9yt4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Yes, unless you name it the same as one of the Airflow facets " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693329152.193929", - "parent_user_id": "U05NMJ0NBUK" - } - ] - }, - { - "client_msg_id": "1a1fe5e1-6305-4ae6-87ab-aa7e2b732990", - "type": "message", - "text": "hi folks, for now I'm producing `.jsonl` (or `.ndjson` ) files with one event per line, do you know if there's any way to validate those? would standard JSON Schema tools work?", - "user": "U05HFGKEYVB", - "ts": "1693300839.710459", - "blocks": [ - { - "type": "rich_text", - "block_id": "Od1G", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hi folks, for now I'm producing " - }, - { - "type": "text", - "text": ".jsonl", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " (or " - }, - { - "type": "text", - "text": ".ndjson", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " ) files with one event per line, do you know if there's any way to validate those? would standard JSON Schema tools work?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693300839.710459", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1693321109.057039", - "reply_users": [ - "U05HFGKEYVB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "e0bff8b7-7427-490d-9cb9-26dd87e2c1a5", - "type": "message", - "text": "reply by <@U0544QC1DS9>: yes :slightly_smiling_face::100:", - "user": "U05HFGKEYVB", - "ts": "1693321109.057039", - "blocks": [ - { - "type": "rich_text", - "block_id": "ofv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "reply by " - }, - { - "type": "user", - "user_id": "U0544QC1DS9" - }, - { - "type": "text", - "text": ": yes " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - }, - { - "type": "emoji", - "name": "100", - "unicode": "1f4af" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693300839.710459", - "parent_user_id": "U05HFGKEYVB", - "reactions": [ - { - "name": "+1", - "users": [ - "U01RA9B5GG2" - ], - "count": 1 - } - ] - } - ] - }, - { - "type": "message", - "text": "Hello, I'm currently in the process of following the instructions outlined in the provided getting started guide at . However, I've encountered a problem while attempting to complete **Step 1** of the guide. Unfortunately, I'm encountering an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. 
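Following up on the `.jsonl`/`.ndjson` question above: yes, standard JSON Schema tooling works if you validate line by line. A small sketch with the `jsonschema` package; the spec URL/version below is an assumption, so pin whichever revision of the OpenLineage spec your producer targets:

```python
import json
import urllib.request

from jsonschema import validate  # pip install jsonschema

# Assumed spec revision -- adjust to match your producer.
SPEC_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json"
schema = json.load(urllib.request.urlopen(SPEC_URL))

with open("events.ndjson") as fh:          # one RunEvent per line
    for lineno, line in enumerate(fh, start=1):
        if line.strip():
            validate(instance=json.loads(line), schema=schema)  # raises on failure
            print(f"line {lineno}: ok")
```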
I have attached screen shots.", - "files": [ - { - "id": "F05PSGC7D8E", - "created": 1693292986, - "timestamp": 1693292986, - "name": "image.png", - "title": "image.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QNRSQW1E", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 66903, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05PSGC7D8E/image.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05PSGC7D8E/download/image.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_360.png", - "thumb_360_w": 360, - "thumb_360_h": 181, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_480.png", - "thumb_480_w": 480, - "thumb_480_h": 241, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_720.png", - "thumb_720_w": 720, - "thumb_720_h": 361, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_800.png", - "thumb_800_w": 800, - "thumb_800_h": 401, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_960.png", - "thumb_960_w": 960, - "thumb_960_h": 482, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PSGC7D8E-3fd1285666/image_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 514, - "original_w": 1907, - "original_h": 957, - "thumb_tiny": "AwAYADCqLiYf8tW/OlNzN/z0cfjUIopgS/aZv+er/nR9pn/56v8AnUVFAiY3Mx6yP+dN8+X/AJ6N+dR0UAAopT1pKBhRSig0AJRRRQI//9k=", - "permalink": "https://openlineage.slack.com/files/U05QNRSQW1E/F05PSGC7D8E/image.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05PSGC7D8E-683e6f3655", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - }, - { - "id": "F05PW99E3L5", - "created": 1693293378, - "timestamp": 1693293378, - "name": "image.png", - "title": "image.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QNRSQW1E", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 50611, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05PW99E3L5/image.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05PW99E3L5/download/image.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_360.png", - "thumb_360_w": 360, - "thumb_360_h": 191, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_480.png", - "thumb_480_w": 480, - "thumb_480_h": 255, - "thumb_160": 
"https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_720.png", - "thumb_720_w": 720, - "thumb_720_h": 383, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_800.png", - "thumb_800_w": 800, - "thumb_800_h": 425, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_960.png", - "thumb_960_w": 960, - "thumb_960_h": 510, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PW99E3L5-0dd146164b/image_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 544, - "original_w": 1048, - "original_h": 557, - "thumb_tiny": "AwAZADClk0hOaXB9KTFaEBSUtFACUUUUAJlvT9KMn0qWmt1pWKuMyfSlB9qKKLCuGaKKKYj/2Q==", - "permalink": "https://openlineage.slack.com/files/U05QNRSQW1E/F05PW99E3L5/image.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05PW99E3L5-4393f52650", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QNRSQW1E", - "ts": "1693293484.701439", - "blocks": [ - { - "type": "rich_text", - "block_id": "omnO", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello, I'm currently in the process of following the instructions outlined in the provided getting started guide at " - }, - { - "type": "link", - "url": "https://openlineage.io/getting-started/" - }, - { - "type": "text", - "text": ". However, I've encountered a problem while attempting to complete **Step 1** of the guide. Unfortunately, I'm encountering an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. I have attached screen shots." - } - ] - } - ] - } - ], - "client_msg_id": "db59bd12-2597-4c8f-85c4-a32b2603c798", - "thread_ts": "1693293484.701439", - "reply_count": 3, - "reply_users_count": 3, - "latest_reply": "1693317758.935859", - "reply_users": [ - "U02S6F54MAB", - "U05QNRSQW1E", - "U01RA9B5GG2" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "2ed33632-14e6-47f6-9cc6-d4df5594ab8c", - "type": "message", - "text": "is 5000 port taken by any other application? or `./docker/up.sh` has some errors in logs?", - "user": "U02S6F54MAB", - "ts": "1693293618.259409", - "blocks": [ - { - "type": "rich_text", - "block_id": "Tmd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "is 5000 port taken by any other application? or " - }, - { - "type": "text", - "text": "./docker/up.sh", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " has some errors in logs?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693293484.701439", - "parent_user_id": "U05QNRSQW1E" - }, - { - "type": "message", - "text": "<@U02S6F54MAB> 5000 port is not taken by any other application. 
The logs show some errors but I am not sure what is the issue here.", - "files": [ - { - "id": "F05PKBM0RRV", - "created": 1693300976, - "timestamp": 1693300976, - "name": "image.png", - "title": "image.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U05QNRSQW1E", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 50149, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05PKBM0RRV/image.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05PKBM0RRV/download/image.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_360.png", - "thumb_360_w": 360, - "thumb_360_h": 56, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_480.png", - "thumb_480_w": 480, - "thumb_480_h": 75, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_720.png", - "thumb_720_w": 720, - "thumb_720_h": 113, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_800.png", - "thumb_800_w": 800, - "thumb_800_h": 125, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_960.png", - "thumb_960_w": 960, - "thumb_960_h": 150, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05PKBM0RRV-3396b1753a/image_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 160, - "original_w": 1598, - "original_h": 250, - "thumb_tiny": "AwAHADCgetJ+FKaSgBOfQUEE+lLS0AMxSGnGmmgD/9k=", - "permalink": "https://openlineage.slack.com/files/U05QNRSQW1E/F05PKBM0RRV/image.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05PKBM0RRV-27cc4fb62d", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U05QNRSQW1E", - "display_as_bot": false, - "ts": "1693300981.357099", - "blocks": [ - { - "type": "rich_text", - "block_id": "luX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02S6F54MAB" - }, - { - "type": "text", - "text": " 5000 port is not taken by any other application. The logs show some errors but I am not sure what is the issue here." - } - ] - } - ] - } - ], - "client_msg_id": "3dd8369b-2ee3-4b65-bcc4-93695ee8f310", - "thread_ts": "1693293484.701439", - "parent_user_id": "U05QNRSQW1E" - }, - { - "client_msg_id": "10c815a8-7d6c-45c6-9395-322d85226ab9", - "type": "message", - "text": "I think Marquez is running on WSL while you're trying to connect from host computer?", - "user": "U01RA9B5GG2", - "ts": "1693317758.935859", - "blocks": [ - { - "type": "rich_text", - "block_id": "L0TPP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I think Marquez is running on WSL while you're trying to connect from host computer?" 
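The thread above reduces to a reachability question: Step 1 of the getting-started guide fails with an internal server error while Marquez runs under WSL. A minimal probe, sketched in Go, separates "nothing is listening on port 5000" from "Marquez is up but unreachable from this side of the WSL boundary"; port 5000 as the Marquez API port and localhost as the host are assumptions carried over from the thread:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// Probe the assumed Marquez API endpoint (localhost:5000, per the thread).
// Run it once inside WSL and once on the Windows host: success in WSL but
// failure on the host points at WSL<->host networking, not Marquez itself.
func main() {
	conn, err := net.DialTimeout("tcp", "localhost:5000", 2*time.Second)
	if err != nil {
		fmt.Println("port 5000 unreachable:", err)
		return
	}
	conn.Close()
	fmt.Println("something is listening on port 5000")
}
```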
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693293484.701439", - "parent_user_id": "U05QNRSQW1E" - } - ] - }, - { - "client_msg_id": "1a525866-90f7-4512-8bdb-c2e01d6ce8f5", - "type": "message", - "text": "New on the OpenLineage blog: , including:\n• the critical improvements it brings to the integration\n• the high-level design\n• implementation details\n• an example operator\n• planned enhancements\n• a list of supported operators\n• more.\nThe post, by <@U01RA9B5GG2>, <@U01DCLP0GU9> and myself is live now on the OpenLineage blog.", - "user": "U02LXF3HUN7", - "ts": "1693267537.810959", - "blocks": [ - { - "type": "rich_text", - "block_id": "VOPjI", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "New on the OpenLineage blog: " - }, - { - "type": "link", - "url": "https://openlineage.io/blog/airflow-provider", - "text": "a close look at the new OpenLineage Airflow Provider" - }, - { - "type": "text", - "text": ", including:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": " the critical improvements it brings to the integration" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "the high-level design" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "implementation details" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "an example operator" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "planned enhancements" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "a list of supported operators" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "more." - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "The " - }, - { - "type": "text", - "text": "post", - "style": { - "unlink": true - } - }, - { - "type": "text", - "text": ", by " - }, - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": ", " - }, - { - "type": "user", - "user_id": "U01DCLP0GU9" - }, - { - "type": "text", - "text": " and myself is live now on the OpenLineage blog." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.io/blog/airflow-provider", - "service_icon": "https://openlineage.io/img/favicon.ico", - "ts": 1692748800, - "id": 1, - "original_url": "https://openlineage.io/blog/airflow-provider", - "fallback": "The OpenLineage Airflow Provider is Here | OpenLineage", - "text": "Built-in OpenLineage support in Airflow means big improvements in reliability, lineage output, and custom operator implementation.", - "title": "The OpenLineage Airflow Provider is Here | OpenLineage", - "title_link": "https://openlineage.io/blog/airflow-provider", - "service_name": "openlineage.io" - } - ], - "reactions": [ - { - "name": "tada", - "users": [ - "U05P973CGHZ", - "U01HNKK4XAM", - "U01RA9B5GG2", - "U0544QC1DS9", - "U01HVNU6A4C" - ], - "count": 5 - } - ] - }, - { - "client_msg_id": "3efe9a2f-372c-41e9-8a46-f1c88615977f", - "type": "message", - "text": "\nThe agenda for the on 9/18 has been updated. This promises to be an exciting, richly productive discussion. Don’t miss it if you’ll be in the area!\n1. 
Intros\n2. Evolution of spec presentation/discussion (project background/history)\n3. State of the community\n4. Spark/Column lineage update\n5. Airflow Provider update \n6. Roadmap Discussion\n7. Action items review/next steps\n", - "user": "U02LXF3HUN7", - "ts": "1693258111.112809", - "blocks": [ - { - "type": "rich_text", - "block_id": "pEYP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nThe agenda for the " - }, - { - "type": "link", - "url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "text": "Toronto Meetup at Airflow Summit" - }, - { - "type": "text", - "text": " on 9/18 has been updated. This promises to be an exciting, richly productive discussion. Don’t miss it if you’ll be in the area!\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Intros" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Evolution of spec presentation/discussion (project background/history)" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "State of the community" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark/Column lineage update" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow Provider update " - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Roadmap Discussion" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Action items review/next steps" - } - ] - } - ], - "style": "ordered", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "image_url": "https://secure.meetupstatic.com/photos/event/5/4/2/d/600_515181549.jpeg", - "image_width": 600, - "image_height": 338, - "image_bytes": 16248, - "service_icon": "https://secure.meetupstatic.com/next/images/general/m_swarm_120x120.png", - "id": 1, - "original_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "fallback": "Meetup: Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 2:00 PM | Meetup", - "text": "Data engineers and pipeline managers know that producing data lineage – end-to-end pipeline metadata instrumented at runtime or parsed at design time – is a heavy lift with", - "title": "Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 2:00 PM | Meetup", - "title_link": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_name": "Meetup" - } - ], - "reactions": [ - { - "name": "heart", - "users": [ - "U0282HEEHB8", - "U02MK6YNAQ5", - "U055N2GRT4P" - ], - "count": 3 - } - ] - }, - { - "client_msg_id": "fddd2e84-0315-454b-949c-beed8a7eed48", - "type": "message", - "text": "Hi folks. I have some pure golang jobs from which I need to emit OL events to Marquez. 
Is the right way to go about this to generate a Golang client from the Marquez OpenAPI spec and use that client from my go jobs?", - "user": "U05PVS8GRJ6", - "ts": "1693243558.640159", - "blocks": [ - { - "type": "rich_text", - "block_id": "O4/r", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi folks. I have some pure golang jobs from which I need to emit OL events to Marquez. Is the right way to go about this to generate a Golang client from the Marquez OpenAPI spec and use that client from my go jobs?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693243558.640159", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1693251005.275449", - "reply_users": [ - "U02S6F54MAB", - "U05PVS8GRJ6" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "12fbc59c-cee7-48d7-96c8-42c4daa5ab3b", - "type": "message", - "text": "I'd rather generate them from OL spec (compliant with JSON Schema)", - "user": "U02S6F54MAB", - "ts": "1693247004.031009", - "blocks": [ - { - "type": "rich_text", - "block_id": "dKe", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'd rather generate them from OL spec (compliant with JSON Schema)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693243558.640159", - "parent_user_id": "U05PVS8GRJ6" - }, - { - "client_msg_id": "e249d204-d3b7-4f44-94de-e9efb608a17f", - "type": "message", - "text": "I'll look into this. I take you to mean that I would use the OL spec which is available as a set of JSON schemas to create the data object and then HTTP POST it using vanilla Golang. Is that correct? Thank you for your help!", - "user": "U05PVS8GRJ6", - "ts": "1693249941.395039", - "blocks": [ - { - "type": "rich_text", - "block_id": "hlNf", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'll look into this. I take you to mean that I would use the OL spec which is available as a set of JSON schemas to create the data object and then HTTP POST it using vanilla Golang. Is that correct? Thank you for your help!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05PVS8GRJ6", - "ts": "1693249952.000000" - }, - "thread_ts": "1693243558.640159", - "parent_user_id": "U05PVS8GRJ6" - }, - { - "client_msg_id": "a2580fde-0703-4859-aad9-72cf2f4e4008", - "type": "message", - "text": "Correct! You’re also very welcome to contribute Golang client (currently we have Python & Java clients) if you manage to send events using golang :slightly_smiling_face:", - "user": "U02S6F54MAB", - "ts": "1693251005.275449", - "blocks": [ - { - "type": "rich_text", - "block_id": "8ICeX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Correct! 
You’re also very welcome to contribute Golang client (currently we have Python & Java clients) if you manage to send events using golang " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693243558.640159", - "parent_user_id": "U05PVS8GRJ6", - "reactions": [ - { - "name": "clap", - "users": [ - "U05PVS8GRJ6" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "55149130-15e5-4a9d-8832-af9e9ba9541c", - "type": "message", - "text": "and something else: I understand that Marquez does not yet support the 2.0 spec, hence it's incompatible with static metadata right? I tried to emit a list of `DatasetEvent` s and got `HTTPError: 422 Client Error: Unprocessable Entity for url: ` (I'm using a `FileTransport` for now)", - "user": "U05HFGKEYVB", - "ts": "1693212561.369509", - "blocks": [ - { - "type": "rich_text", - "block_id": "SZo", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and something else: I understand that Marquez does not yet support the 2.0 spec, hence it's incompatible with static metadata right? I tried to emit a list of " - }, - { - "type": "text", - "text": "DatasetEvent", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " s and got " - }, - { - "type": "text", - "text": "HTTPError: 422 Client Error: Unprocessable Entity for url: ", - "style": { - "code": true - } - }, - { - "type": "link", - "url": "http://localhost:3000/api/v1/lineage", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " (I'm using a " - }, - { - "type": "text", - "text": "FileTransport", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " for now)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05HFGKEYVB", - "ts": "1693212597.000000" - }, - "thread_ts": "1693212561.369509", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1693217141.924759", - "reply_users": [ - "U02MK6YNAQ5", - "U05HFGKEYVB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "61807027-58e2-425f-a65e-9af09cc49346", - "type": "message", - "text": "marquez is not capable of reflecting `DatasetEvents` in DB but it should respond with `Unsupported event type`", - "user": "U02MK6YNAQ5", - "ts": "1693216969.231179", - "blocks": [ - { - "type": "rich_text", - "block_id": "10Wp8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "marquez is not capable of reflecting " - }, - { - "type": "text", - "text": "DatasetEvents", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in DB but it should respond with " - }, - { - "type": "text", - "text": "Unsupported event type", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693212561.369509", - "parent_user_id": "U05HFGKEYVB" - }, - { - "client_msg_id": "9ce1c23c-1210-4d7a-85b6-66a829fb4d6f", - "type": "message", - "text": "and return 200 instead of `201` created", - "user": "U02MK6YNAQ5", - "ts": "1693216995.122589", - "blocks": [ - { - "type": "rich_text", - "block_id": "wcHt9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and return 200 instead of " - }, - { - "type": "text", - "text": "201", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " created" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": 
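Since there is no official Go client yet, the approach suggested in this thread is to build the event to match the OpenLineage JSON Schema and POST it with the standard library. A minimal sketch under those assumptions; the namespace, job name, producer URI, and the default local Marquez endpoint (http://localhost:5000/api/v1/lineage) are placeholders, not fixed values:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Field names follow the OpenLineage RunEvent JSON Schema: runId must
	// be a UUID and eventTime an ISO-8601 timestamp.
	payload := fmt.Sprintf(`{
	  "eventType": "START",
	  "eventTime": %q,
	  "run": {"runId": "d46e465b-d358-4d32-83d4-df660ff614dd"},
	  "job": {"namespace": "my-namespace", "name": "my-go-job"},
	  "inputs": [],
	  "outputs": [],
	  "producer": "https://example.com/my-go-producer",
	  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"
	}`, time.Now().UTC().Format(time.RFC3339))

	resp, err := http.Post("http://localhost:5000/api/v1/lineage",
		"application/json", bytes.NewBufferString(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // Marquez replies 201 Created on success
}
```

Generating types from the OpenLineage spec rather than the Marquez OpenAPI spec, as advised above, also keeps the emitter decoupled from any particular backend.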
"1693212561.369509", - "parent_user_id": "U05HFGKEYVB" - }, - { - "client_msg_id": "c054a5c8-3e0f-4de9-b92f-e769fb9e024a", - "type": "message", - "text": "I'll have a deeper look then, probably I'm doing something wrong. thanks <@U02MK6YNAQ5>", - "user": "U05HFGKEYVB", - "ts": "1693217141.924759", - "blocks": [ - { - "type": "rich_text", - "block_id": "eKH7I", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'll have a deeper look then, probably I'm doing something wrong. thanks " - }, - { - "type": "user", - "user_id": "U02MK6YNAQ5" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693212561.369509", - "parent_user_id": "U05HFGKEYVB" - } - ] - }, - { - "client_msg_id": "e403e4d5-e318-4c72-bba4-aa1fbebe40a8", - "type": "message", - "text": "hi folks, I'm looking into exporting static metadata, and found that `DatasetEvent` requires a `eventTime`, which in my mind doesn't make sense for static events. I'm setting it to `None` and the Python client seems to work, but wanted to ask if I'm missing something.", - "user": "U05HFGKEYVB", - "ts": "1693212473.810659", - "blocks": [ - { - "type": "rich_text", - "block_id": "EBPk", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hi folks, I'm looking into exporting static metadata, and found that " - }, - { - "type": "text", - "text": "DatasetEvent", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " requires a " - }, - { - "type": "text", - "text": "eventTime", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ", which in my mind doesn't make sense for static events. I'm setting it to " - }, - { - "type": "text", - "text": "None", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " and the Python client seems to work, but wanted to ask if I'm missing something." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693212473.810659", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1693216913.315869", - "reply_users": [ - "U02MK6YNAQ5", - "U05HFGKEYVB" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "c7d4fc24-9e83-44ee-b43a-c1befb6434e2", - "type": "message", - "text": "Although you emit `DatasetEvent`, you still emit an event and `eventTime` is a valid marker.", - "user": "U02MK6YNAQ5", - "ts": "1693216750.036829", - "blocks": [ - { - "type": "rich_text", - "block_id": "r2Ree", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Although you emit " - }, - { - "type": "text", - "text": "DatasetEvent", - "style": { - "code": true - } - }, - { - "type": "text", - "text": ", you still emit an event and " - }, - { - "type": "text", - "text": "eventTime", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " is a valid marker." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693212473.810659", - "parent_user_id": "U05HFGKEYVB" - }, - { - "client_msg_id": "d6389b59-61f0-4cb1-91fc-50a90c9d1d6c", - "type": "message", - "text": "so, should I use the current time at the moment of emitting it and that's it?", - "user": "U05HFGKEYVB", - "ts": "1693216900.409389", - "blocks": [ - { - "type": "rich_text", - "block_id": "Gr6M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "so, should I use the current time at the moment of emitting it and that's it?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693212473.810659", - "parent_user_id": "U05HFGKEYVB" - }, - { - "client_msg_id": "d22e44aa-b4b7-43af-bcd9-11183e637c33", - "type": "message", - "text": "yes, that should be it", - "user": "U02MK6YNAQ5", - "ts": "1693216913.315869", - "blocks": [ - { - "type": "rich_text", - "block_id": "I0McS", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes, that should be it" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1693212473.810659", - "parent_user_id": "U05HFGKEYVB", - "reactions": [ - { - "name": "gratitude-thank-you", - "users": [ - "U05HFGKEYVB" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "5a0c0a38-e883-47de-bb00-09784e9edc84", - "type": "message", - "text": "hi OpenLineage team, we would like to join one of your meetups (me and <@U05HK41VCH1> and @Phil Rolph) and we're wondering if you are hosting any meetups after the 18/9? We are trying to join this one, but air tickets are quite expensive", - "user": "U05J5GRKY10", - "ts": "1692975450.380969", - "blocks": [ - { - "type": "rich_text", - "block_id": "vhZR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hi OpenLineage team, we would like to join one of your meetups (me and " - }, - { - "type": "user", - "user_id": "U05HK41VCH1" - }, - { - "type": "text", - "text": " and @Phil Rolph) and we're wondering if you are hosting any meetups after the 18/9? We are trying to join this one, but air tickets are quite expensive" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692975450.380969", - "reply_count": 4, - "reply_users_count": 3, - "latest_reply": "1692978699.479329", - "reply_users": [ - "U01HNKK4XAM", - "U05J5GRKY10", - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1692978699.479329", - "replies": [ - { - "client_msg_id": "d90db7c4-ea60-4eff-a0b3-24568c51f105", - "type": "message", - "text": "there will certainly be more meetups, don’t worry about that!", - "user": "U01HNKK4XAM", - "ts": "1692977532.949389", - "blocks": [ - { - "type": "rich_text", - "block_id": "RF3H", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "there will certainly be more meetups, don’t worry about that!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692975450.380969", - "parent_user_id": "U05J5GRKY10" - }, - { - "client_msg_id": "47c313f7-9d2c-4b8f-9b55-7e20ee01debc", - "type": "message", - "text": "where are you located? perhaps we can try to organize a meetup closer to where you are.", - "user": "U01HNKK4XAM", - "ts": "1692977550.649789", - "blocks": [ - { - "type": "rich_text", - "block_id": "Um2m", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "where are you located? perhaps we can try to organize a meetup closer to where you are." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692975450.380969", - "parent_user_id": "U05J5GRKY10" - }, - { - "client_msg_id": "ef556e07-2f4c-432f-a51a-a129ea8e7274", - "type": "message", - "text": "Thanks a lot for the response, we are in London. 
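Pulling the two threads above together: a DatasetEvent is still an event, so eventTime should simply be the time of emission, and Marquez at this point acknowledges but does not persist such events, responding 200 with "Unsupported event type" rather than 201. The 422 reported above also went to port 3000, the Marquez web UI, rather than the API on 5000, which may be worth checking separately. A hedged sketch of the payload shape, reusing the placeholder endpoint and names from the previous example:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// A static-lineage DatasetEvent: no run or job section, only the
	// dataset plus the common base fields. eventTime is set to the moment
	// of emission, per the advice in the thread above.
	payload := fmt.Sprintf(`{
	  "eventTime": %q,
	  "dataset": {"namespace": "my-namespace", "name": "my-dataset"},
	  "producer": "https://example.com/my-static-producer",
	  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/DatasetEvent"
	}`, time.Now().UTC().Format(time.RFC3339))

	resp, err := http.Post("http://localhost:5000/api/v1/lineage",
		"application/json", bytes.NewBufferString(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	// Per the maintainers above, current Marquez replies 200 here instead
	// of 201, since it cannot yet reflect DatasetEvents in its database.
	fmt.Println(resp.Status)
}
```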
We'd be glad to help you organise a meetup and also meet in person!", - "user": "U05J5GRKY10", - "ts": "1692978577.530149", - "blocks": [ - { - "type": "rich_text", - "block_id": "UEDw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks a lot for the response, we are in London. We'd be glad to help you organise a meetup and also meet in person!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692975450.380969", - "parent_user_id": "U05J5GRKY10" - }, - { - "client_msg_id": "a8f02080-dd7e-4a05-a624-02656ebea128", - "type": "message", - "text": "This is awesome, thanks <@U05J5GRKY10>. I’ll start a channel and invite you", - "user": "U02LXF3HUN7", - "ts": "1692978699.479329", - "blocks": [ - { - "type": "rich_text", - "block_id": "MiX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "This is awesome, thanks " - }, - { - "type": "user", - "user_id": "U05J5GRKY10" - }, - { - "type": "text", - "text": ". I’ll start a channel and invite you" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692975450.380969", - "parent_user_id": "U05J5GRKY10" - } - ] - }, - { - "client_msg_id": "597a7abe-590e-4827-accc-276859f36a52", - "type": "message", - "text": "\nFriendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. You can sign up and find the details on the .", - "user": "U02LXF3HUN7", - "ts": "1692973763.570629", - "blocks": [ - { - "type": "rich_text", - "block_id": "T3fR", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nFriendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. You can sign up and find the details on the " - }, - { - "type": "link", - "url": "https://www.meetup.com/meetup-group-bnfqymxe/events/295195280/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "text": "meetup event page" - }, - { - "type": "text", - "text": "." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://www.meetup.com/meetup-group-bnfqymxe/events/295195280/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "image_url": "https://secure.meetupstatic.com/photos/event/b/d/0/c/600_514848396.jpeg", - "image_width": 600, - "image_height": 338, - "image_bytes": 13814, - "service_icon": "https://secure.meetupstatic.com/next/images/general/m_swarm_120x120.png", - "id": 1, - "original_url": "https://www.meetup.com/meetup-group-bnfqymxe/events/295195280/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "fallback": "Meetup: OpenLineage Meetup @ Astronomer, Wed, Aug 30, 2023, 5:30 PM | Meetup", - "text": "Data engineers and pipeline managers know that producing data lineage – end-to-end pipeline metadata instrumented at runtime or parsed at design time – is a heavy lift with", - "title": "OpenLineage Meetup @ Astronomer, Wed, Aug 30, 2023, 5:30 PM | Meetup", - "title_link": "https://www.meetup.com/meetup-group-bnfqymxe/events/295195280/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_name": "Meetup" - } - ] - }, - { - "client_msg_id": "e6f83c78-bda8-445a-ac52-5511e8e9e1e1", - "type": "message", - "text": "\nWe released OpenLineage 1.1.0, including:\nAdditions:\n• Flink: create Openlineage configuration based on Flink configuration `#2033` \n• Java: add Javadocs to the Java client `#2004` \n• Spark: append output dataset name to a job name `#2036` \n• Spark: support Spark 3.4.1 `#2057` \nFixes:\n• Flink: fix a bug when getting schema for `KafkaSink` `#2042` \n• Spark: fix ignored event `adaptive_spark_plan` in Databricks `#2061` \nPlus additional bug fixes, doc changes and more.\nThanks to all the contributors, especially new contributors @pentium3 and <@U05HBLE7YPL>!\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ", - "user": "U02LXF3HUN7", - "ts": "1692817450.338859", - "blocks": [ - { - "type": "rich_text", - "block_id": "zaQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nWe released OpenLineage 1.1.0, including:\nAdditions:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Flink: create Openlineage configuration based on Flink configuration " - }, - { - "type": "text", - "text": "#2033", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Java: add Javadocs to the Java client " - }, - { - "type": "text", - "text": "#2004", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/julienledem", - "text": "@julienledem", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: append output dataset name to a job name " - }, - { - "type": "text", - "text": "#2036", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - }, - { - "type": 
"rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: support Spark 3.4.1 " - }, - { - "type": "text", - "text": "#2057", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pawel-big-lebowski", - "text": "@pawel-big-lebowski", - "unsafe": true - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Fixes:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Flink: fix a bug when getting schema for " - }, - { - "type": "text", - "text": "KafkaSink", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "text", - "text": "#2042", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/pentium3", - "text": "@pentium3", - "unsafe": true - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark: fix ignored event " - }, - { - "type": "text", - "text": "adaptive_spark_plan", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " in Databricks " - }, - { - "type": "text", - "text": "#2061", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/algorithmy1", - "text": "@algorithmy1", - "unsafe": true - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Plus additional bug fixes, doc changes and more.\nThanks to all the contributors, especially new contributors @pentium3 and " - }, - { - "type": "user", - "user_id": "U05HBLE7YPL" - }, - { - "type": "text", - "text": "!\n" - }, - { - "type": "text", - "text": "Release:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/releases/tag/1.1.0" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Changelog: ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Commit history:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/compare/1.0.0...1.1.0" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "Maven:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://oss.sonatype.org/#nexus-search;quick~openlineage" - }, - { - "type": "text", - "text": "\n" - }, - { - "type": "text", - "text": "PyPI:", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " " - }, - { - "type": "link", - "url": "https://pypi.org/project/openlineage-python/" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "clap", - "users": [ - "U05P34LH669", - "U05HBLE7YPL", - "U058ZJBGSAJ", - "U05JBHLPY8K", - "U01HVNU6A4C", - "U01RA9B5GG2", - "U01HNKK4XAM", - "U05EN2CKBS8", - "U05G1LTK0F2" - ], - "count": 9 - }, - { - "name": "gratitude-thank-you", - "users": [ - "U05JY6MN8MS" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": 
"571ffe0f-9038-4345-8581-052ad7c5199e", - "type": "message", - "text": "Hey folks! Do we have clear step-by-step documentation on how we can leverage the `ServiceLoader` based approach for injecting specific OpenLineage customisations for tweaking the transport type with defaults / tweaking column level lineage etc?", - "user": "U05JBHLPY8K", - "ts": "1692810528.463669", - "blocks": [ - { - "type": "rich_text", - "block_id": "ESmZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hey folks! Do we have clear step-by-step documentation on how we can leverage the " - }, - { - "type": "text", - "text": "ServiceLoader", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " based approach for injecting specific OpenLineage customisations for tweaking the transport type with defaults / tweaking column level lineage etc?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692810528.463669", - "reply_count": 7, - "reply_users_count": 2, - "latest_reply": "1692960750.563679", - "reply_users": [ - "U01RA9B5GG2", - "U05JBHLPY8K" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "c069a4de-86a9-43d2-84dd-df5ef9129a82", - "type": "message", - "text": "This proposal - ", - "user": "U01RA9B5GG2", - "ts": "1692811472.232389", - "blocks": [ - { - "type": "rich_text", - "block_id": "Gqz0", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "This proposal - " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/41dbbf1799595bd9cd1567df0a7027de08619741/proposals/168/making_spark_visitors_extensible.md" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692810528.463669", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "ca29d7f7-b55b-4a88-bcd8-a80a1164af88", - "type": "message", - "text": "For custom transport, you have to provide implementation of interface and point to it in `META_INF` file", - "user": "U01RA9B5GG2", - "ts": "1692811745.594439", - "blocks": [ - { - "type": "rich_text", - "block_id": "4gw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "For custom transport, you have to provide implementation of interface " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/blob/4a1a5c3bf9767467b71ca0e1b6d820ba9e0a1a1d/client/java/src/main/java/io/openlineage/client/transports/TransportBuilder.java#L8", - "text": "https://github.com/OpenLineage/OpenLineage/blob/4a1a5c3bf9767467b71ca0e1b6d820ba9e[…]ain/java/io/openlineage/client/transports/TransportBuilder.java" - }, - { - "type": "text", - "text": " and point to it in " - }, - { - "type": "text", - "text": "META_INF", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " file" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/blob/4a1a5c3bf9767467b71ca0e1b6d820ba9e0a1a1d/client/java/src/main/java/io/openlineage/client/transports/TransportBuilder.java#L8", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\npublic interface TransportBuilder {\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1692810528.463669", - "parent_user_id": 
"U05JBHLPY8K" - }, - { - "client_msg_id": "73254d9f-6e38-47e0-884c-1ecef8c79fbb", - "type": "message", - "text": "But if I understand correctly, if you want to change behavior rather than extend, the correct way may be to either contribute it to repo - if that behavior is useful to anyone, or fork the repo", - "user": "U01RA9B5GG2", - "ts": "1692811792.937659", - "blocks": [ - { - "type": "rich_text", - "block_id": "8He", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "But if I understand correctly, if you want to change behavior rather than extend, the correct way may be to either contribute it to repo - if that behavior is useful to anyone, or fork the repo" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692810528.463669", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "117FDAFB-AA4A-489E-BAF1-FFE5AF2D0597", - "type": "message", - "text": "<@U01RA9B5GG2> - Can you elaborate more on the \"point to it in META_INF file\"? Let's say we have the custom transport type built in a standalone jar by extending transport builder - what're the exact next steps to use this custom transport in the standalone jar when doing spark-submit?", - "user": "U05JBHLPY8K", - "ts": "1692818083.000699", - "blocks": [ - { - "type": "rich_text", - "block_id": "Ani", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " - Can you elaborate more on the \"point to it in META_INF file\"? Let's say we have the custom transport type built in a standalone jar by extending transport builder - what're the exact next steps to use this custom transport in the standalone jar when doing spark-submit?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692810528.463669", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "d79e461b-28bd-4d3b-b282-f400256e558b", - "type": "message", - "text": "<@U05JBHLPY8K> your jar needs to have `META-INF/services/io.openlineage.client.transports.TransportBuilder` with fully qualified class names of your custom TransportBuilders there - like `openlineage-spark` has\n```io.openlineage.client.transports.HttpTransportBuilder\nio.openlineage.client.transports.KafkaTransportBuilder\nio.openlineage.client.transports.ConsoleTransportBuilder\nio.openlineage.client.transports.FileTransportBuilder\nio.openlineage.client.transports.KinesisTransportBuilder```", - "user": "U01RA9B5GG2", - "ts": "1692818593.597789", - "blocks": [ - { - "type": "rich_text", - "block_id": "j7yIr", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05JBHLPY8K" - }, - { - "type": "text", - "text": " your jar needs to have " - }, - { - "type": "text", - "text": "META-INF/services/io.openlineage.client.transports.TransportBuilder", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " with fully qualified class names of your custom TransportBuilders there - like `openlineage-spark` has\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "io.openlineage.client.transports.HttpTransportBuilder\nio.openlineage.client.transports.KafkaTransportBuilder\nio.openlineage.client.transports.ConsoleTransportBuilder\nio.openlineage.client.transports.FileTransportBuilder\nio.openlineage.client.transports.KinesisTransportBuilder" - } - ], - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": 
"1692810528.463669", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "101b3bd4-b008-4ea4-8731-fdf91e2472cf", - "type": "message", - "text": "<@U01RA9B5GG2> - I think this change may be required for consumers to leverage custom transports, can you check & verify this GH comment?\n", - "user": "U05JBHLPY8K", - "ts": "1692942569.835759", - "blocks": [ - { - "type": "rich_text", - "block_id": "SDPrA", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U01RA9B5GG2" - }, - { - "type": "text", - "text": " - I think this change may be required for consumers to leverage custom transports, can you check & verify this GH comment?\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/issues/2007#issuecomment-1690350630" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1692811356, - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/issues/2007#issuecomment-1690350630", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "Comment on #2007 [PROPOSAL] Ability to support custom injectable dynamic header generation class/logic for HTTP transports", - "text": " - I think the `Type` enum in the Transport.java class has default visibility, which makes it invisible for OpenLineage consumers who are trying to define a custom transport logic\n\n\n\nI think it'll be great if we can make this `Type` enum have public visibility. Also, can we add a `CUSTOM` Type here for folks who are trying to define a custom transport of their own?\n\nPlease lmk If the changes proposed seem to be fine, I can draft a PR for the same", - "title": "Comment on #2007 [PROPOSAL] Ability to support custom injectable dynamic header generation class/logic for HTTP transports", - "title_link": "https://github.com/OpenLineage/OpenLineage/issues/2007#issuecomment-1690350630", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1692810528.463669", - "parent_user_id": "U05JBHLPY8K" - }, - { - "client_msg_id": "76108108-f51e-4d0f-9f2a-4eaa1513436c", - "type": "message", - "text": "Probably, I will look at more details next week <@U05JBHLPY8K> as I'm in transit", - "user": "U01RA9B5GG2", - "ts": "1692960750.563679", - "blocks": [ - { - "type": "rich_text", - "block_id": "8a=1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Probably, I will look at more details next week " - }, - { - "type": "user", - "user_id": "U05JBHLPY8K" - }, - { - "type": "text", - "text": " as I'm in transit" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692810528.463669", - "parent_user_id": "U05JBHLPY8K", - "reactions": [ - { - "name": "+1", - "users": [ - "U05JBHLPY8K" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "f35a05f2-10a6-4394-b1be-dc89175cbd68", - "type": "message", - "text": "Approve a new release please :slightly_smiling_face:\n• Fix spark integration filtering Databricks events. 
", - "user": "U05HBLE7YPL", - "ts": "1692802510.386629", - "blocks": [ - { - "type": "rich_text", - "block_id": "PrcFP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Approve a new release please " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - }, - { - "type": "text", - "text": "\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Fix spark integration filtering Databricks events. " - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U05HBLE7YPL", - "ts": "1692802621.000000" - }, - "thread_ts": "1692802510.386629", - "reply_count": 2, - "reply_users_count": 1, - "latest_reply": "1692810798.179659", - "reply_users": [ - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1692810798.179659", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U05HBLE7YPL", - "U053LCT71BQ", - "U05MUN2MH2S", - "U05P34LH669", - "U05P34PFH7F", - "U02S6F54MAB", - "U02LXF3HUN7", - "U01HNKK4XAM", - "U01DCMDFHBK", - "U01RA9B5GG2", - "U01DCLP0GU9" - ], - "count": 11 - } - ], - "replies": [ - { - "client_msg_id": "334a3373-b7b4-4688-885c-557f86b0bf65", - "type": "message", - "text": "Thank you for requesting a release <@U05HBLE7YPL>. Three +1s from committers will authorize.", - "user": "U02LXF3HUN7", - "ts": "1692808035.154579", - "blocks": [ - { - "type": "rich_text", - "block_id": "q9Adm", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thank you for requesting a release " - }, - { - "type": "user", - "user_id": "U05HBLE7YPL" - }, - { - "type": "text", - "text": ". Three +1s from committers will authorize." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692802510.386629", - "parent_user_id": "U05HBLE7YPL", - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U05HBLE7YPL" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "45804461-4a1f-4c8e-8cdb-eca19c75336a", - "type": "message", - "text": "Thanks, all. The release is authorized and will be initiated within 2 business days.", - "user": "U02LXF3HUN7", - "ts": "1692810798.179659", - "blocks": [ - { - "type": "rich_text", - "block_id": "+5Iq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks, all. The release is authorized and will be initiated within 2 business days." 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1692802510.386629", - "parent_user_id": "U05HBLE7YPL" - } - ] - } -] \ No newline at end of file diff --git a/slack-archive/data/C01NAFMBVEY.json b/slack-archive/data/C01NAFMBVEY.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C01NAFMBVEY.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C030F1J0264.json b/slack-archive/data/C030F1J0264.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C030F1J0264.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C04E3Q18RR9.json b/slack-archive/data/C04E3Q18RR9.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C04E3Q18RR9.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C04JPTTC876.json b/slack-archive/data/C04JPTTC876.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C04JPTTC876.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C04QSV0GG23.json b/slack-archive/data/C04QSV0GG23.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C04QSV0GG23.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C04THH1V90X.json b/slack-archive/data/C04THH1V90X.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C04THH1V90X.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C051C93UZK9.json b/slack-archive/data/C051C93UZK9.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C051C93UZK9.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C055GGUFMHQ.json b/slack-archive/data/C055GGUFMHQ.json deleted file mode 100644 index 0637a08..0000000 --- a/slack-archive/data/C055GGUFMHQ.json +++ /dev/null @@ -1 +0,0 @@ -[] \ No newline at end of file diff --git a/slack-archive/data/C056YHEU680.json b/slack-archive/data/C056YHEU680.json deleted file mode 100644 index fb1e1a9..0000000 --- a/slack-archive/data/C056YHEU680.json +++ /dev/null @@ -1,199 +0,0 @@ -[ - { - "type": "message", - "subtype": "channel_join", - "ts": "1695502057.851259", - "user": "U05TQPZ4R4L", - "text": "<@U05TQPZ4R4L> has joined the channel" - }, - { - "type": "message", - "text": "Some pictures from last night", - "files": [ - { - "id": "F05QC8C72TY", - "created": 1693538277, - "timestamp": 1693538277, - "name": "IMG_2417.jpg", - "title": "IMG_2417", - "mimetype": "image/jpeg", - "filetype": "jpg", - "pretty_type": "JPEG", - "user": "U01DCLP0GU9", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 1421083, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05QC8C72TY/img_2417.jpg", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05QC8C72TY/download/img_2417.jpg", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_64.jpg", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_80.jpg", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_360.jpg", - "thumb_360_w": 360, - 
"thumb_360_h": 270, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_480.jpg", - "thumb_480_w": 480, - "thumb_480_h": 360, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_160.jpg", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_720.jpg", - "thumb_720_w": 720, - "thumb_720_h": 540, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_800.jpg", - "thumb_800_w": 800, - "thumb_800_h": 600, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_960.jpg", - "thumb_960_w": 960, - "thumb_960_h": 720, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QC8C72TY-cee9310a09/img_2417_1024.jpg", - "thumb_1024_w": 1024, - "thumb_1024_h": 768, - "original_w": 4032, - "original_h": 3024, - "thumb_tiny": "AwAkADCBGljGMZX06ipllU9QV+vT86ftG8/WpdopMERO4C8jI7GmGNZYyR1AyKc0QJ+XcoPbHBpqRbUdlbt60rDuJLgdMVEpAbJqdLbMYJxmj7MoP3qLBcqiaV2xvY59BUkbskuHJOe5NQKSrArwfWrMcJf5pabBE2w7juckEdM1DIyKxUHB25yakkPlgFcYHrTdiSMGZQcDHNIBkcn7pmV+g5GKcjNFy5PPA3UiR4yEG0HihrdniCu2SvQ0ALCi9cc1N1qOL7tSDrQBXveVQeppydFHam3nSP8A3qcvb60ATDpThTR0pRSGf//Z", - "permalink": "https://openlineage.slack.com/files/U01DCLP0GU9/F05QC8C72TY/img_2417.jpg", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05QC8C72TY-6f4277379d", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - }, - { - "id": "F05QJQD2SE7", - "created": 1693538277, - "timestamp": 1693538277, - "name": "IMG_2416.jpg", - "title": "IMG_2416", - "mimetype": "image/jpeg", - "filetype": "jpg", - "pretty_type": "JPEG", - "user": "U01DCLP0GU9", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 1345466, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F05QJQD2SE7/img_2416.jpg", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F05QJQD2SE7/download/img_2416.jpg", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_64.jpg", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_80.jpg", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_360.jpg", - "thumb_360_w": 360, - "thumb_360_h": 270, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_480.jpg", - "thumb_480_w": 480, - "thumb_480_h": 360, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_160.jpg", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_720.jpg", - "thumb_720_w": 720, - "thumb_720_h": 540, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_800.jpg", - "thumb_800_w": 800, - "thumb_800_h": 600, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_960.jpg", - "thumb_960_w": 960, - "thumb_960_h": 720, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F05QJQD2SE7-e67c68ce7a/img_2416_1024.jpg", - "thumb_1024_w": 1024, - "thumb_1024_h": 768, - "original_w": 4032, - "original_h": 3024, - "thumb_tiny": 
"AwAkADBqysBiRd49R1qQFZExncvt1WmeTtkAL4z0HrTvK3NkEg/r+dKwxSdwCsRn1HQ0ySHbGXzSSxurZLe3TrTnV2iI+chevTmlYLlcYQc/eNIG3nBy3oq/1pwiVnA+bn1NWY4grALxx2p2C5FJNu8s917j6U1EuGGQXwecg1NJArheOn4UmxlQKGIA9KLhYQZkKs7jb9eQR60ku8E7XG0++OKljjAO7AyakZQeoBpXHYz0Z0YktyOnGamhmZpgZABgdqkkiUjgYx6cVAN2/nJ9M+lO4rIuilIpBSmpGJQaKQ0AMccikwNp9qc/akH3G+lMD//Z", - "permalink": "https://openlineage.slack.com/files/U01DCLP0GU9/F05QJQD2SE7/img_2416.jpg", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F05QJQD2SE7-1062701b11", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U01DCLP0GU9", - "ts": "1693538290.446329", - "blocks": [ - { - "type": "rich_text", - "block_id": "SoXJ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Some pictures from last night" - } - ] - } - ] - } - ], - "client_msg_id": "F51EEE94-187B-4A76-9CD0-632E2BE58C1E" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1693520941.967439", - "user": "U05Q3HT6PBR", - "text": "<@U05Q3HT6PBR> has joined the channel" - }, - { - "client_msg_id": "1a2569cb-8ea1-45b6-b8a2-aea02fe25d1f", - "type": "message", - "text": "Time: 5:30-8:30 pm\nAddress: 8 California St., San Francisco, CA, seventh floor\nGetting in: someone from Astronomer will be in the lobby to direct you", - "user": "U02LXF3HUN7", - "ts": "1693422775.587509", - "blocks": [ - { - "type": "rich_text", - "block_id": "F3M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Time: 5:30-8:30 pm\nAddress: 8 California St., San Francisco, CA, seventh floor\nGetting in: someone from Astronomer will be in the lobby to direct you" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "82e1578c-2f96-4144-a3eb-aa236b28dce2", - "type": "message", - "text": "Adding the venue info in case it’s more convenient than the meetup page:", - "user": "U02LXF3HUN7", - "ts": "1693422678.406409", - "blocks": [ - { - "type": "rich_text", - "block_id": "gcfz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Adding the venue info in case it’s more convenient than the meetup page:" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - } -] \ No newline at end of file diff --git a/slack-archive/data/C05N442RQUA.json b/slack-archive/data/C05N442RQUA.json deleted file mode 100644 index e15d8a4..0000000 --- a/slack-archive/data/C05N442RQUA.json +++ /dev/null @@ -1,586 +0,0 @@ -[ - { - "client_msg_id": "6d1c0549-f20e-4c41-9c13-9023d966d51c", - "type": "message", - "text": "Hi, if you’re wondering if you’re in the right place: look for Uncle Tetsu’s Cheesecake nextdoor and for the address (600 Bay St) above the door. The building is an older one (unlike the meeting space itself, which is modern and well-appointed)", - "user": "U02LXF3HUN7", - "ts": "1695068433.208409", - "blocks": [ - { - "type": "rich_text", - "block_id": "1YbUw", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi, if you’re wondering if you’re in the right place: look for Uncle Tetsu’s Cheesecake nextdoor and for the address (600 Bay St) above the door. The building is an older one (unlike the meeting space itself, which is modern and well-appointed)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "99a7da7d-9233-482c-88c5-ba9bf6ea989f", - "type": "message", - "text": "Looking forward to seeing you on Monday! 
Here’s the time/place info again for your convenience:\n• Date: 9/18\n• Time: 5-8:00 PM ET\n• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)\n• Venue phone: \n• Meetup page with more info and signup: \nPlease send a message if you find yourself stuck in the lobby, etc.", - "user": "U02LXF3HUN7", - "ts": "1694794649.751439", - "blocks": [ - { - "type": "rich_text", - "block_id": "l3F59", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Looking forward to seeing you on Monday! Here’s the time/place info again for your convenience:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Date: 9/18" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Time: 5-8:00 PM ET" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Venue phone: " - }, - { - "type": "link", - "url": "tel:4168052286", - "text": "416-805-2286" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Meetup page with more info and signup: " - }, - { - "type": "link", - "url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Please send a message if you find yourself stuck in the lobby, etc." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "image_url": "https://secure.meetupstatic.com/photos/event/5/4/2/d/600_515181549.jpeg", - "image_width": 600, - "image_height": 338, - "image_bytes": 16248, - "service_icon": "https://secure.meetupstatic.com/next/images/general/m_swarm_120x120.png", - "id": 1, - "original_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "fallback": "Meetup: Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 5:00 PM | Meetup", - "text": "Data engineers and pipeline managers know that producing data lineage – end-to-end pipeline metadata instrumented at runtime or parsed at design time – is a heavy lift with", - "title": "Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 5:00 PM | Meetup", - "title_link": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_name": "Meetup" - } - ], - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U01RA9B5GG2" - ], - "count": 1 - } - ] - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1694787071.646859", - "user": "U05SXDWVA7K", - "text": "<@U05SXDWVA7K> has joined the channel" - }, - { - "client_msg_id": "6cfdc82a-896e-4963-9187-717456008417", - "type": "message", - "text": "\nIt’s hard to believe this is happening in just one week! Here’s the updated agenda:\n1. *Intros*\n2. 
*Evolution of spec presentation/discussion (project background/history)*\n3. *State of the community*\n4. *Integrating OpenLineage with (by special guests & )*\n5. *Spark/Column lineage update*\n6. *Airflow Provider update*\n7. *Roadmap Discussion*\n8. *Action items review/next steps*\nFind the details and RSVP .", - "user": "U02LXF3HUN7", - "ts": "1694441637.116609", - "blocks": [ - { - "type": "rich_text", - "block_id": "Go5ub", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "broadcast", - "range": "channel" - }, - { - "type": "text", - "text": "\nIt’s hard to believe this is happening in just one week! Here’s the updated agenda:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Intros", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Evolution of spec presentation/discussion (project background/history)", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "State of the community", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Integrating OpenLineage with ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://metaphor.io/", - "text": "Metaphor", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " (by special guests ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://www.linkedin.com/in/yeliu84/", - "text": "Ye", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": " & ", - "style": { - "bold": true - } - }, - { - "type": "link", - "url": "https://www.linkedin.com/in/ivanperepelitca/", - "text": "Ivan", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": ")", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Spark/Column lineage update", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Airflow Provider update", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Roadmap Discussion", - "style": { - "bold": true - } - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Action items review/next steps", - "style": { - "bold": true - } - } - ] - } - ], - "style": "ordered", - "indent": 0, - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Find the details and RSVP " - }, - { - "type": "link", - "url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "text": "here" - }, - { - "type": "text", - "text": "." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "image_url": "https://static.metaphor.io/preview.jpg", - "image_width": 719, - "image_height": 378, - "image_bytes": 122301, - "from_url": "https://metaphor.io/", - "id": 1, - "original_url": "https://metaphor.io/", - "fallback": "Metaphor - The Social Platform for Data", - "text": "Making Data Actionable, At Scale - Designed for data teams building cloud-native, self-service data platforms for their business users. 
Explore our Data Governance, Data Lineage, Data Discovery, and Data Trust capabilities today.", - "title": "Metaphor - The Social Platform for Data", - "title_link": "https://metaphor.io/", - "service_name": "metaphor.io" - }, - { - "image_url": "https://secure.meetupstatic.com/photos/event/5/4/2/d/600_515181549.jpeg", - "image_width": 600, - "image_height": 338, - "image_bytes": 16248, - "from_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_icon": "https://secure.meetupstatic.com/next/images/general/m_swarm_120x120.png", - "id": 2, - "original_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "fallback": "Meetup: Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 5:00 PM | Meetup", - "text": "Data engineers and pipeline managers know that producing data lineage – end-to-end pipeline metadata instrumented at runtime or parsed at design time – is a heavy lift with", - "title": "Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 5:00 PM | Meetup", - "title_link": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_name": "Meetup" - } - ], - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U01DCMDFHBK" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "bdb67b53-48b3-4864-93e4-b650780c786d", - "type": "message", - "text": "Most OpenLineage regular contributors will be there. It will be fun to be all in person. Everyone is encouraged to join", - "user": "U01DCLP0GU9", - "ts": "1693624251.155569", - "blocks": [ - { - "type": "rich_text", - "block_id": "=fiF4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Most OpenLineage regular contributors will be there. It will be fun to be all in person. Everyone is encouraged to join" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U01HNKK4XAM", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U02LXF3HUN7", - "U01DCMDFHBK", - "U05KKM07PJP" - ], - "count": 6 - } - ] - }, - { - "client_msg_id": "30dcc90d-e303-46ad-821f-b3ae4b823497", - "type": "message", - "text": "really looking forward to meeting all of you in Toronto!!", - "user": "U01HNKK4XAM", - "ts": "1692984822.264569", - "blocks": [ - { - "type": "rich_text", - "block_id": "e1s", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "really looking forward to meeting all of you in Toronto!!" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "c4139590-6c9a-4a20-99f3-041ab782263e", - "type": "message", - "text": "Some belated updates on this in case you’re not aware:\n• Date: 9/18\n• Time: 5-8:00 PM ET\n• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)\n• Venue phone: \n• Meetup for more info and to sign up: ", - "user": "U02LXF3HUN7", - "ts": "1692984607.290939", - "blocks": [ - { - "type": "rich_text", - "block_id": "yvbeN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Some belated updates on this in case you’re not aware:\n" - } - ] - }, - { - "type": "rich_text_list", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Date: 9/18" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Time: 5-8:00 PM ET" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Venue phone: " - }, - { - "type": "link", - "url": "tel:4168052286", - "text": "416-805-2286" - } - ] - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Meetup for more info and to sign up: " - }, - { - "type": "link", - "url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link" - } - ] - } - ], - "style": "bullet", - "indent": 0, - "border": 0 - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02LXF3HUN7", - "ts": "1694786010.000000" - }, - "attachments": [ - { - "from_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "image_url": "https://secure.meetupstatic.com/photos/event/5/4/2/d/600_515181549.jpeg", - "image_width": 600, - "image_height": 338, - "image_bytes": 16248, - "service_icon": "https://secure.meetupstatic.com/next/images/general/m_swarm_120x120.png", - "id": 1, - "original_url": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "fallback": "Meetup: Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 2:00 PM | Meetup", - "text": "Data engineers and pipeline managers know that producing data lineage – end-to-end pipeline metadata instrumented at runtime or parsed at design time – is a heavy lift with", - "title": "Toronto OpenLineage Meetup at Airflow Summit, Mon, Sep 18, 2023, 2:00 PM | Meetup", - "title_link": "https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link", - "service_name": "Meetup" - } - ], - "reactions": [ - { - "name": "tada", - "users": [ - "U01HNKK4XAM", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U01DCMDFHBK" - ], - "count": 4 - }, - { - "name": "raised_hands", - "users": [ - "U01HNKK4XAM", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U01DCMDFHBK" - ], - "count": 4 - } - ] - } -] \ No newline at end of file diff --git a/slack-archive/data/C05PD7VJ52S.json b/slack-archive/data/C05PD7VJ52S.json deleted file mode 100644 index 0968ddc..0000000 --- a/slack-archive/data/C05PD7VJ52S.json +++ /dev/null @@ -1,382 +0,0 @@ -[ - { - "type": "message", - "subtype": 
"channel_join", - "ts": "1693512257.995209", - "user": "U05QHG1NJ8J", - "text": "<@U05QHG1NJ8J> has joined the channel" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1692984861.392349", - "user": "U01RA9B5GG2", - "text": "<@U01RA9B5GG2> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "client_msg_id": "c1976b81-2c3c-47ac-a0a9-1563847ea465", - "type": "message", - "text": "Yes, hope so! Thank you for your interest in joining a meetup!", - "user": "U02LXF3HUN7", - "ts": "1692983998.092189", - "blocks": [ - { - "type": "rich_text", - "block_id": "7/R", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Yes, hope so! Thank you for your interest in joining a meetup!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "90b6a8f3-45be-4769-9bb3-8099697713b9", - "type": "message", - "text": "hopefully meet you soon in London", - "user": "U05HK41VCH1", - "ts": "1692982281.255129", - "blocks": [ - { - "type": "rich_text", - "block_id": "9eK", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hopefully meet you soon in London" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "d981a3fe-45e4-4047-bc62-0fb1ac36d8c6", - "type": "message", - "text": "Thanks Michael for starting this channel", - "user": "U05HK41VCH1", - "ts": "1692982268.668049", - "blocks": [ - { - "type": "rich_text", - "block_id": "t6DYs", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Thanks Michael for starting this channel" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "2c707994-db70-4e93-8f18-b76f192b1b66", - "type": "message", - "text": "yes absolutely will give you an answer by Monday", - "user": "U05J5GRKY10", - "ts": "1692979169.558489", - "blocks": [ - { - "type": "rich_text", - "block_id": "U97Fk", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "yes absolutely will give you an answer by Monday" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "+1", - "users": [ - "U02LXF3HUN7" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "a309f1f6-375a-4b9d-b024-a18d82bf7cb4", - "type": "message", - "text": "OK! Would you please let me know when you know, and we’ll go from there?", - "user": "U02LXF3HUN7", - "ts": "1692979126.425069", - "blocks": [ - { - "type": "rich_text", - "block_id": "yKFbq", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "OK! Would you please let me know when you know, and we’ll go from there?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "b23a7e78-ee51-4fc0-af50-a95e8352a4d8", - "type": "message", - "text": "and if that not the case i can provide personal space", - "user": "U05J5GRKY10", - "ts": "1692979080.273399", - "blocks": [ - { - "type": "rich_text", - "block_id": "/9JOs", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and if that not the case i can provide personal space" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "489d9333-80ab-4848-9d60-f0ceec0c0214", - "type": "message", - "text": "I am pretty sure you can use our 6point6 offices or at least part of it", - "user": "U05J5GRKY10", - "ts": "1692979067.390839", - "blocks": [ - { - "type": "rich_text", - "block_id": "RoGJB", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I am pretty sure you can use our 6point6 offices or at least part of it" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "b6a2f006-3fe7-4990-a727-671422b36d3d", - "type": "message", - "text": "I will have to confirm but 99% yes", - "user": "U05J5GRKY10", - "ts": "1692978958.692039", - "blocks": [ - { - "type": "rich_text", - "block_id": "wlacd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I will have to confirm but 99% yes" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "c1f7cbd7-f90e-4afc-a7d8-2f0b00323b55", - "type": "message", - "text": "Great! Do you happen to have space we could use?", - "user": "U02LXF3HUN7", - "ts": "1692978871.220909", - "blocks": [ - { - "type": "rich_text", - "block_id": "kTLh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Great! Do you happen to have space we could use?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "e2a13328-0735-43f7-a2e8-22a9ed79787f", - "type": "message", - "text": "thats perfect !", - "user": "U05J5GRKY10", - "ts": "1692978852.583609", - "blocks": [ - { - "type": "rich_text", - "block_id": "Cm8bi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "thats perfect !" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "2bf7b6c8-44fc-4ed4-b19f-f115bcbed49c", - "type": "message", - "text": "Hi George, nice to meet you. Thanks for asking about future meetups. Would November be too soon, or what’s a good timeframe for you all?", - "user": "U02LXF3HUN7", - "ts": "1692978834.508549", - "blocks": [ - { - "type": "rich_text", - "block_id": "hQj1", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi George, nice to meet you. Thanks for asking about future meetups. Would November be too soon, or what’s a good timeframe for you all?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "f3a13e65-b8e8-4d7d-acaa-b0b5d466029a", - "type": "message", - "text": "thanks so much !", - "user": "U05J5GRKY10", - "ts": "1692978816.416879", - "blocks": [ - { - "type": "rich_text", - "block_id": "q=t=", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "thanks so much !" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "9e029cb5-0c45-452f-af4d-03958ed1b279", - "type": "message", - "text": "Hi Michael", - "user": "U05J5GRKY10", - "ts": "1692978769.711289", - "blocks": [ - { - "type": "rich_text", - "block_id": "yR4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hi Michael" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1692978760.556819", - "user": "U05HK41VCH1", - "text": "<@U05HK41VCH1> has joined the channel", - "inviter": "U02LXF3HUN7" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1692978735.461499", - "user": "U01HNKK4XAM", - "text": "<@U01HNKK4XAM> has joined the channel", - "inviter": "U02LXF3HUN7" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1692978735.367849", - "user": "U05J5GRKY10", - "text": "<@U05J5GRKY10> has joined the channel", - "inviter": "U02LXF3HUN7" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1692978725.076179", - "user": "U02LXF3HUN7", - "text": "<@U02LXF3HUN7> has joined the channel" - } -] \ No newline at end of file diff --git a/slack-archive/data/C05U3UC85LM.json b/slack-archive/data/C05U3UC85LM.json deleted file mode 100644 index f8d81b0..0000000 --- a/slack-archive/data/C05U3UC85LM.json +++ /dev/null @@ -1,336 +0,0 @@ -[ - { - "type": "message", - "subtype": "channel_join", - "ts": "1697734944.171989", - "user": "U0620HU51HA", - "text": "<@U0620HU51HA> has joined the channel" - }, - { - "client_msg_id": "fe3c2887-f751-445f-a711-034c74a1a00f", - "type": "message", - "text": "<@U05U9K21LSG> would be great if we could get your eyes on this PR: ", - "user": "U01HNKK4XAM", - "ts": "1697201275.793679", - "blocks": [ - { - "type": "rich_text", - "block_id": "+lkFZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05U9K21LSG" - }, - { - "type": "text", - "text": " would be great if we could get your eyes on this PR: " - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2134" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1695798428, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/OpenLineage/OpenLineage/pull/2134", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2134 Support more recent versions of GX", - "text": "Limit upper GX version to <0.16.0.\n\nReopening accidentally closed by me.\n\nKudos to .\n\n*Checklist*\n\n☑︎ You've your work\n☑︎ Your pull request title follows our \n☑︎ Your changes are accompanied by tests (_if relevant_)\n☑︎ Your change contains a and is self-contained\n☐ You've updated any relevant documentation (_if relevant_)\n☐ Your comment includes a one-liner for the changelog about the specific purpose of the change (_if necessary_)\n☐ You've versioned the core OpenLineage model or facets according to (_if relevant_)\n☐ You've added a to source files (_if relevant_)\n\n* * *\n\nSPDX-License-Identifier: Apache-2.0 \nCopyright 2018-2023 contributors to the OpenLineage project", - "title": "#2134 Support more recent versions of GX", - "title_link": "https://github.com/OpenLineage/OpenLineage/pull/2134", - "footer": "", - "fields": [ - { - "value": "integration/great-expectations, common", - "title": "Labels", - "short": true - }, - { - 
"value": "6", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1697201275.793679", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1697223637.244249", - "reply_users": [ - "U05U9K21LSG" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "ce4bbaf7-c460-42c8-8cdc-787411c365cc", - "type": "message", - "text": "I'm a bit slammed today but can look on Tuesday.", - "user": "U05U9K21LSG", - "ts": "1697223637.244249", - "blocks": [ - { - "type": "rich_text", - "block_id": "fW0Bl", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'm a bit slammed today but can look on Tuesday." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1697201275.793679", - "parent_user_id": "U01HNKK4XAM", - "reactions": [ - { - "name": "white_check_mark", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "676cb6ce-3f9e-405c-b2d7-21d716026b35", - "type": "message", - "text": "Just seeing this, we had a company holiday yesterday. Yes, fluent data sources are our new way of connecting to data and the older \"block-style\" is deprecated and will be removed when we cut 0.18.0. I'm not sure of the timing of that but likely in the next couple months.", - "user": "U05U9K21LSG", - "ts": "1696979522.397239", - "blocks": [ - { - "type": "rich_text", - "block_id": "z1dsG", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Just seeing this, we had a company holiday yesterday. Yes, fluent data sources are our new way of connecting to data and the older \"block-style\" is deprecated and will be removed when we cut 0.18.0. I'm not sure of the timing of that but likely in the next couple months." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "+1", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "ce3473ef-55b7-4544-b100-f84b90dedcc0", - "type": "message", - "text": "<@U05U9929K3N> <@U05U9K21LSG> ^^", - "user": "U01HNKK4XAM", - "ts": "1696860879.713919", - "blocks": [ - { - "type": "rich_text", - "block_id": "VrCEQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05U9929K3N" - }, - { - "type": "text", - "text": " " - }, - { - "type": "user", - "user_id": "U05U9K21LSG" - }, - { - "type": "text", - "text": " ^^" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "a458875f-f440-48fa-8583-cdd43dd35eae", - "type": "message", - "text": "Hello guys! I’ve been looking recently into changes in GX.\n\nis this the major change you’d like to introduce in OL<-> GX?", - "user": "U02S6F54MAB", - "ts": "1696852012.119669", - "blocks": [ - { - "type": "rich_text", - "block_id": "77ma2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Hello guys! I’ve been looking recently into changes in GX.\n" - }, - { - "type": "link", - "url": "https://greatexpectations.io/blog/the-fluent-way-to-connect-to-data-sources-in-gx/" - }, - { - "type": "text", - "text": "\nis this the major change you’d like to introduce in OL<-> GX?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "image_url": "https://images.ctfassets.net/ycwst8v1r2x5/5Ec3CoEgrMo6EtNF64XL4Q/9ba1176e49c83e6dfaeafe089c3f326c/fluent_cover_card.png", - "image_width": 1560, - "image_height": 1040, - "image_bytes": 1828778, - "from_url": "https://greatexpectations.io/blog/the-fluent-way-to-connect-to-data-sources-in-gx/", - "service_icon": "https://greatexpectations.io/icons/icon-48x48.png?v=62b42a6c11e70607aa006eb9abc523ed", - "id": 1, - "original_url": "https://greatexpectations.io/blog/the-fluent-way-to-connect-to-data-sources-in-gx/", - "fallback": "The fluent way to connect to data sources in GX", - "text": "Creating a Datasource just got (much) easier", - "title": "The fluent way to connect to data sources in GX", - "title_link": "https://greatexpectations.io/blog/the-fluent-way-to-connect-to-data-sources-in-gx/", - "service_name": "greatexpectations.io" - } - ] - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1695916771.601439", - "user": "U05U9K21LSG", - "text": "<@U05U9K21LSG> has joined the channel" - }, - { - "client_msg_id": "854a6cc9-f46f-4955-a845-f33242ac4025", - "type": "message", - "text": "<@U05U9929K3N> it was great meeting earlier, looking forward to collaborating on this!", - "user": "U01HNKK4XAM", - "ts": "1695840258.594299", - "blocks": [ - { - "type": "rich_text", - "block_id": "vMZ5h", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U05U9929K3N" - }, - { - "type": "text", - "text": " it was great meeting earlier, looking forward to collaborating on this!" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "heavy_plus_sign", - "users": [ - "U05U9929K3N", - "U02S6F54MAB" - ], - "count": 2 - } - ] - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1695836477.653589", - "user": "U01RA9B5GG2", - "text": "<@U01RA9B5GG2> has joined the channel", - "inviter": "U02LXF3HUN7" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1695836303.953919", - "user": "U02S6F54MAB", - "text": "<@U02S6F54MAB> has joined the channel", - "inviter": "U02LXF3HUN7" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1695836303.827639", - "user": "U01HNKK4XAM", - "text": "<@U01HNKK4XAM> has joined the channel", - "inviter": "U02LXF3HUN7" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1695836303.727049", - "user": "U05U9929K3N", - "text": "<@U05U9929K3N> has joined the channel", - "inviter": "U02LXF3HUN7" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1695836283.617309", - "user": "U02LXF3HUN7", - "text": "<@U02LXF3HUN7> has joined the channel", - "reactions": [ - { - "name": "tada", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - } -] \ No newline at end of file diff --git a/slack-archive/data/C065PQ4TL8K.json b/slack-archive/data/C065PQ4TL8K.json deleted file mode 100644 index 9920c44..0000000 --- a/slack-archive/data/C065PQ4TL8K.json +++ /dev/null @@ -1,2466 +0,0 @@ -[ - { - "client_msg_id": "094cab1a-de83-45a2-96ff-8efdaf41e09c", - "type": "message", - "text": "Maybe move today's meeting earlier, since no one from west coast is joining? 
<@U01HNKK4XAM>", - "user": "U01RA9B5GG2", - "ts": "1700562211.366219", - "blocks": [ - { - "type": "rich_text", - "block_id": "4la1W", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Maybe move today's meeting earlier, since no one from west coast is joining? " - }, - { - "type": "user", - "user_id": "U01HNKK4XAM" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "c67600bf-5ef9-411c-a0d4-bd4dabef0b3c", - "type": "message", - "text": "I’m off on vacation. See you in a week", - "user": "U01DCLP0GU9", - "ts": "1700272614.735719", - "blocks": [ - { - "type": "rich_text", - "block_id": "fzif9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I’m off on vacation. See you in a week" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "reactions": [ - { - "name": "heart", - "users": [ - "U02S6F54MAB", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U01HNKK4XAM", - "U053LLVTHRN" - ], - "count": 5 - } - ] - }, - { - "client_msg_id": "090c8b69-40d1-44f5-aaf6-89200841114e", - "type": "message", - "text": "just searching for OpenLineage in the Datahub code base. They have an “interesting” approach? ", - "user": "U01DCLP0GU9", - "ts": "1700246539.228259", - "blocks": [ - { - "type": "rich_text", - "block_id": "c5Ci8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "just searching for OpenLineage in the Datahub code base. They have an “interesting” approach? " - }, - { - "type": "link", - "url": "https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd67f020/metadata-ingestion-modules/airflow-plugin/src/datahub_airflow_plugin/_extractors.py#L14", - "text": "https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd[…]odules/airflow-plugin/src/datahub_airflow_plugin/_extractors.py" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd67f020/metadata-ingestion-modules/airflow-plugin/src/datahub_airflow_plugin/_extractors.py#L14", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "", - "text": "```\nfrom openlineage.airflow.extractors import BaseExtractor\n```", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1700246539.228259", - "reply_count": 7, - "reply_users_count": 3, - "latest_reply": "1700467087.549929", - "reply_users": [ - "U01DCLP0GU9", - "U02S6F54MAB", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "26a6b41b-422e-42ba-9f3e-322422a370a7", - "type": "message", - "text": "It looks like the datahub airflow plugin uses OL. but turns it off\n\n```disable_openlineage_plugin\ttrue\tDisable the OpenLineage plugin to avoid duplicative processing.```\nThey reuse the extractors but then “patch” the behavior.", - "user": "U01DCLP0GU9", - "ts": "1700246841.718379", - "blocks": [ - { - "type": "rich_text", - "block_id": "DREhv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It looks like the datahub airflow plugin uses OL. 
but turns it off\n" - }, - { - "type": "link", - "url": "https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd67f020/docs/lineage/airflow.md" - }, - { - "type": "text", - "text": "\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "disable_openlineage_plugin\ttrue\tDisable the OpenLineage plugin to avoid duplicative processing." - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "They reuse the extractors but then “patch” the behavior." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "image_url": "https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/plugin_connection_setup.png", - "image_width": 842, - "image_height": 643, - "image_bytes": 50334, - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "color": "24292f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd67f020/docs/lineage/airflow.md", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "842x643px image", - "text": "*Airflow Integration*\n\n:::note\n\nIf you're looking to schedule DataHub ingestion using Airflow, see the guide on <../../metadata-ingestion/schedule_docs/airflow.md|scheduling ingestion with Airflow>.\n\n:::\n\nThe DataHub Airflow plugin supports:\n\n• Automatic column-level lineage extraction from various operators e.g. `SqlOperator`s (including `MySqlOperator`, `PostgresOperator`, `SnowflakeOperator`, and more), `S3FileTransformOperator`, and a few others.\n• Airflow DAG and tasks, including properties, ownership, and tags.\n• Task run information, including task successes and failures.\n• Manual lineage annotations using `inlets` and `outlets` on Airflow operators.\n\nThere's two actively supported implementations of the plugin, with different Airflow version support.\n\nIf you're using Airflow older than 2.1, it's possible to use the v1 plugin with older versions of `acryl-datahub-airflow-plugin`. See the for more details.\n\n*DataHub Plugin v2*\n*Installation*\n\nThe v2 plugin requires Airflow 2.3+ and Python 3.8+. If you don't meet these requirements, use the v1 plugin instead.\n\n```\npip install 'acryl-datahub-airflow-plugin[plugin-v2]'\n```\n\n*Configuration*\n\nSet up a DataHub connection in Airflow, either via command line or the Airflow UI.\n\n*Command Line*\n\n```\nairflow connections add --conn-type 'datahub-rest' 'datahub_rest_default' --conn-host '' --conn-password ''\n```\n\n*Airflow UI*\n\nOn the Airflow UI, go to Admin -> Connections and click the \"+\" symbol to create a new connection. Select \"DataHub REST Server\" from the dropdown for \"Connection Type\" and enter the appropriate values.\n\n\n\n*Optional Configurations*\n\nNo additional configuration is required to use the plugin. However, there are some optional configuration parameters that can be set in the `airflow.cfg` file.\n\n```\n[datahub]\n# Optional - additional config here.\nenabled = True # default\n```\n\n*Automatic lineage extraction*\n\nTo automatically extract lineage information, the v2 plugin builds on top of Airflow's built-in .\n\nThe SQL-related extractors have been updated to use DataHub's SQL parser, which is more robust than the built-in one and uses DataHub's metadata information to generate column-level lineage. 
We discussed the DataHub SQL parser, including why schema-aware parsing works better and how it performs on benchmarks, during the .\n\n*DataHub Plugin v1*\n*Installation*\n\nThe v1 plugin requires Airflow 2.1+ and Python 3.8+. If you're on older versions, it's still possible to use an older version of the plugin. See the for more details.\n\nIf you're using Airflow 2.3+, we recommend using the v2 plugin instead. If you need to use the v1 plugin with Airflow 2.3+, you must also set the environment variable `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN=true`.\n\n```\npip install 'acryl-datahub-airflow-plugin[plugin-v1]'\n\n# The DataHub rest connection type is included by default.\n# To use the DataHub Kafka connection type, install the plugin with the kafka extras.\npip install 'acryl-datahub-airflow-plugin[plugin-v1,datahub-kafka]'\n```\n\n*Configuration*\n*Disable lazy plugin loading*\n\n```\n[core]\nlazy_load_plugins = False\n```\n\nOn MWAA you should add this config to your .\n\n*Setup a DataHub connection*\n\nYou must configure an Airflow connection for Datahub. We support both a Datahub REST and a Kafka-based connections, but you only need one.\n\n```\n# For REST-based:\nairflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host '' --conn-password ''\n# For Kafka-based (standard Kafka sink config can be passed via extras):\nairflow connections add --conn-type 'datahub_kafka' 'datahub_kafka_default' --conn-host 'broker:9092' --conn-extra '{}'\n```\n\n*Configure the plugin*\n\nIf your config doesn't align with the default values, you can configure the plugin in your `airflow.cfg` file.\n\n```\n[datahub]\nenabled = true\nconn_id = datahub_rest_default # or datahub_kafka_default\n# etc.\n```\n\n*Validate that the plugin is working*\n\n1. Go and check in Airflow at Admin -> Plugins menu if you can see the DataHub plugin\n2. Run an Airflow DAG. In the task logs, you should see Datahub related log messages like:\n\n```\nEmitting DataHub ...\n```\n\n*Manual Lineage Annotation*\n*Using `inlets` and `outlets`*\n\nYou can manually annotate lineage by setting `inlets` and `outlets` on your Airflow operators. This is useful if you're using an operator that doesn't support automatic lineage extraction, or if you want to override the automatic lineage extraction.\n\nWe have a few code samples that demonstrate how to use `inlets` and `outlets`:\n\n• <../../metadata-ingestion-modules/airflow-plugin/src/datahub_airflow_plugin/example_dags/lineage_backend_demo.py|`lineage_backend_demo.py`>\n• <../../metadata-ingestion-modules/airflow-plugin/src/datahub_airflow_plugin/example_dags/lineage_backend_taskflow_demo.py|`lineage_backend_taskflow_demo.py`> - uses the \n\nFor more information, take a look at the .\n\n*Custom Operators*\n\nIf you have created a that inherits from the BaseOperator class, \nwhen overriding the `execute` function, set inlets and outlets via `context['ti'].task.inlets` and `context['ti'].task.outlets`. 
\nThe DataHub Airflow plugin will then pick up those inlets and outlets after the task runs.\n\n```\nclass DbtOperator(BaseOperator):\n ...\n\n def execute(self, context):\n # do something\n inlets, outlets = self._get_lineage()\n # inlets/outlets are lists of either datahub_airflow_plugin.entities.Dataset or datahub_airflow_plugin.entities.Urn\n context['ti'].task.inlets = self.inlets\n context['ti'].task.outlets = self.outlets\n\n def _get_lineage(self):\n # Do some processing to get inlets/outlets\n\n return inlets, outlets\n```\n\nIf you override the `pre_execute` and `post_execute` function, ensure they include the `@prepare_lineage` and `@apply_lineage` decorators respectively. Reference the for more details.\n\n*Emit Lineage Directly*\n\nIf you can't use the plugin or annotate inlets/outlets, you can also emit lineage using the `DatahubEmitterOperator`.\n\nReference <../../metadata-ingestion-modules/airflow-plugin/src/datahub_airflow_plugin/example_dags/lineage_emission_dag.py|`lineage_emission_dag.py`> for a full example.\n\nIn order to use this example, you must first configure the Datahub hook. Like in ingestion, we support a Datahub REST hook and a Kafka-based hook. See the plugin configuration for examples.\n\n*Debugging*\n*Missing lineage*\n\nIf you're not seeing lineage in DataHub, check the following:\n\n• Validate that the plugin is loaded in Airflow. Go to Admin -> Plugins and check that the DataHub plugin is listed.\n• With the v2 plugin, it should also print a l…", - "title": "", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1700246539.228259", - "parent_user_id": "U01DCLP0GU9" - }, - { - "client_msg_id": "59a1e940-c1fc-490c-b9c5-147d5e834b71", - "type": "message", - "text": "Of course this approach will need changing again with AF 2.7", - "user": "U01DCLP0GU9", - "ts": "1700246932.436609", - "blocks": [ - { - "type": "rich_text", - "block_id": "/POFd", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Of course this approach will need changing again with AF 2.7" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700246539.228259", - "parent_user_id": "U01DCLP0GU9" - }, - { - "client_msg_id": "8da9603b-c057-4ef0-8124-86412209192e", - "type": "message", - "text": "It’s their choice :shrug:", - "user": "U01DCLP0GU9", - "ts": "1700246942.225929", - "blocks": [ - { - "type": "rich_text", - "block_id": "XBzX4", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It’s their choice " - }, - { - "type": "emoji", - "name": "shrug", - "unicode": "1f937" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700246539.228259", - "parent_user_id": "U01DCLP0GU9" - }, - { - "client_msg_id": "1447c2c4-00fc-4539-95c3-53e11be40529", - "type": "message", - "text": "It looks like we can possibly learn from their approach in SQL parsing: ", - "user": "U01DCLP0GU9", - "ts": "1700247083.921989", - "blocks": [ - { - "type": "rich_text", - "block_id": "GZG6c", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "It looks like we can possibly learn from their approach in SQL parsing: " - }, - { - "type": "link", - "url": "https://datahubproject.io/docs/lineage/airflow/#automatic-lineage-extraction" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://datahubproject.io/docs/lineage/airflow/#automatic-lineage-extraction", - "service_icon": 
"https://datahubproject.io/img/favicon.ico", - "id": 1, - "original_url": "https://datahubproject.io/docs/lineage/airflow/#automatic-lineage-extraction", - "fallback": "Airflow Integration | DataHub", - "text": "If you're looking to schedule DataHub ingestion using Airflow, see the guide on scheduling ingestion with Airflow.", - "title": "Airflow Integration | DataHub", - "title_link": "https://datahubproject.io/docs/lineage/airflow/#automatic-lineage-extraction", - "service_name": "datahubproject.io" - } - ], - "thread_ts": "1700246539.228259", - "parent_user_id": "U01DCLP0GU9" - }, - { - "client_msg_id": "17522ae0-7886-4487-a7a9-d8263e1700cf", - "type": "message", - "text": "what's that approach? I only know they have been claiming best SQL parsing capabilities", - "user": "U02S6F54MAB", - "ts": "1700257371.478449", - "blocks": [ - { - "type": "rich_text", - "block_id": "aj4Ne", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "what's that approach? I only know they have been claiming best SQL parsing capabilities" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700246539.228259", - "parent_user_id": "U01DCLP0GU9" - }, - { - "client_msg_id": "51a9f111-c022-4bbb-8948-d6a775d0ad9c", - "type": "message", - "text": "I haven’t looked in the details but I’m assuming it is in this repo. (my comment is entirely based on the claim here)", - "user": "U01DCLP0GU9", - "ts": "1700272488.934199", - "blocks": [ - { - "type": "rich_text", - "block_id": "rARhz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I haven’t looked in the details but I’m assuming it is in this repo. (my comment is entirely based on the claim here)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCLP0GU9", - "ts": "1700272494.000000" - }, - "thread_ts": "1700246539.228259", - "parent_user_id": "U01DCLP0GU9" - }, - { - "client_msg_id": "90186ee8-27f8-4b0f-859a-0760106930b3", - "type": "message", - "text": "`` -> The interesting difference is that in order to find table schemas, they use their data catalog to evaluate column-level lineage instead of doing this on the client side.\n\nMy understanding by example is: If you do\n```create table x as select * from y```\nyou need to resolve `*` to know column level lineage. Our approach is to do that on the client side, probably with an extra call to database. Their approach is to do that based on the data catalog information.", - "user": "U02MK6YNAQ5", - "ts": "1700467087.549929", - "blocks": [ - { - "type": "rich_text", - "block_id": "j2CGA", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://www.acryldata.io/blog/extracting-column-level-lineage-from-sql", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " -> The interesting difference is that in order to find table schemas, they use their data catalog to evaluate column-level lineage instead of doing this on the client side.\n\nMy understanding by example is: If you do\n" - } - ] - }, - { - "type": "rich_text_preformatted", - "elements": [ - { - "type": "text", - "text": "create table x as select * from y" - } - ], - "border": 0 - }, - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "you need to resolve " - }, - { - "type": "text", - "text": "*", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " to know column level lineage. 
Our approach is to do that on the client side, probably with an extra call to database. Their approach is to do that based on the data catalog information." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700246539.228259", - "parent_user_id": "U01DCLP0GU9" - } - ] - }, - { - "client_msg_id": "290d0b95-e6bc-4529-b446-aa462954f331", - "type": "message", - "text": "CFP for Berlin Buzzwords went up: \nStill over 3 months to submit :slightly_smiling_face:", - "user": "U01RA9B5GG2", - "ts": "1700155042.082759", - "blocks": [ - { - "type": "rich_text", - "block_id": "IY1qh", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "CFP for Berlin Buzzwords went up: " - }, - { - "type": "link", - "url": "https://2024.berlinbuzzwords.de/call-for-papers/" - }, - { - "type": "text", - "text": "\nStill over 3 months to submit " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700155042.082759", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1700165993.306159", - "reply_users": [ - "U02LXF3HUN7", - "U02S6F54MAB" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1700165993.306159", - "replies": [ - { - "client_msg_id": "59c2bed5-ed72-4bc1-aed1-c86f76e6baf6", - "type": "message", - "text": "thanks, updated the talks board", - "user": "U02LXF3HUN7", - "ts": "1700156576.787649", - "blocks": [ - { - "type": "rich_text", - "block_id": "uZOkT", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "thanks, updated the talks board" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700155042.082759", - "parent_user_id": "U01RA9B5GG2" - }, - { - "client_msg_id": "98d5c5e2-8039-4a6e-9833-9de669c9ab47", - "type": "message", - "text": "", - "user": "U02LXF3HUN7", - "ts": "1700156590.752459", - "blocks": [ - { - "type": "rich_text", - "block_id": "WwA6M", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/orgs/OpenLineage/projects/4/views/1" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700155042.082759", - "parent_user_id": "U01RA9B5GG2" - }, - { - "client_msg_id": "b5cd14ed-3246-4722-9385-24594e0b2d77", - "type": "message", - "text": "I'm in, will think what to talk about and appreciate any advice :slightly_smiling_face:", - "user": "U02S6F54MAB", - "ts": "1700165993.306159", - "blocks": [ - { - "type": "rich_text", - "block_id": "vK7e9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I'm in, will think what to talk about and appreciate any advice " - }, - { - "type": "emoji", - "name": "slightly_smiling_face", - "unicode": "1f642" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700155042.082759", - "parent_user_id": "U01RA9B5GG2" - } - ] - }, - { - "client_msg_id": "565052e6-10db-43e3-bf54-f56143b67702", - "type": "message", - "text": "worlds are colliding: 6point6 has been acquired by Accenture", - "user": "U02LXF3HUN7", - "ts": "1700145084.414099", - "blocks": [ - { - "type": "rich_text", - "block_id": "Phnmg", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "worlds are colliding: 6point6 has been acquired by Accenture" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700145084.414099", - "reply_count": 5, - "reply_users_count": 3, 
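The archived thread above contrasts two ways of resolving `select *` when computing column-level lineage for `create table x as select * from y`: expanding `*` on the client side with an extra query against the live database, versus expanding it from schema metadata a data catalog already holds. A minimal sketch of that trade-off follows; the helper names (`schema_from_database`, `schema_from_catalog`, `column_lineage_for_ctas`) are hypothetical and are not OpenLineage or DataHub APIs.

```python
# Illustrative sketch only: both strategies produce the same lineage mapping
# once the schema is known; they differ in where the schema comes from.
from typing import Callable, Dict, List

SchemaLookup = Callable[[str], List[str]]  # table name -> ordered column names


def column_lineage_for_ctas(source: str, target: str,
                            get_schema: SchemaLookup) -> Dict[str, List[str]]:
    """Map each target column to the source column it derives from, for the
    degenerate `create table <target> as select * from <source>` case."""
    columns = get_schema(source)  # the `*` expansion happens here
    return {f"{target}.{col}": [f"{source}.{col}"] for col in columns}


def schema_from_database(table: str) -> List[str]:
    # Client-side resolution: an extra query against the live database,
    # e.g. SELECT column_name FROM information_schema.columns WHERE ...
    raise NotImplementedError("query the live database's information schema")


def schema_from_catalog(table: str) -> List[str]:
    # Catalog-side resolution: read the schema from metadata the catalog
    # has already ingested (e.g. an HTTP call to its metadata service).
    raise NotImplementedError("look the schema up in the data catalog")


if __name__ == "__main__":
    # With a stubbed schema, either lookup strategy yields the same result.
    stub: SchemaLookup = lambda table: ["id", "name", "created_at"]
    print(column_lineage_for_ctas("y", "x", stub))
    # {'x.id': ['y.id'], 'x.name': ['y.name'], 'x.created_at': ['y.created_at']}
```

The practical difference is operational rather than logical: the client-side query sees the schema exactly as of run time but costs a database round-trip, while the catalog lookup avoids that call yet can be stale if catalog ingestion lags behind schema changes.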
- "latest_reply": "1700151197.023279", - "reply_users": [ - "U02LXF3HUN7", - "U01RA9B5GG2", - "U01HNKK4XAM" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1700151197.023279", - "replies": [ - { - "client_msg_id": "4e2e079f-ed27-46fc-8b0c-a160525f3d6a", - "type": "message", - "text": "", - "user": "U02LXF3HUN7", - "ts": "1700145119.237459", - "blocks": [ - { - "type": "rich_text", - "block_id": "p2OqU", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://newsroom.accenture.com/news/2023/accenture-to-expand-government-transformation-capabilities-in-the-uk-with-acquisition-of-6point6" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://newsroom.accenture.com/news/2023/accenture-to-expand-government-transformation-capabilities-in-the-uk-with-acquisition-of-6point6", - "image_url": "https://newsroom.accenture.com/news/2023/media_1a9921811b72c2ade3a6d22550b92f42d782dcd12.jpeg?width=1200&format=pjpg&optimize=medium", - "image_width": 1200, - "image_height": 628, - "image_bytes": 60944, - "service_icon": "https://newsroom.accenture.com/favicon.ico", - "id": 1, - "original_url": "https://newsroom.accenture.com/news/2023/accenture-to-expand-government-transformation-capabilities-in-the-uk-with-acquisition-of-6point6", - "fallback": "Accenture to Expand Government Transformation Capabilities in the U.K. with Acquisition of 6point6", - "text": "Accenture has signed an agreement to acquire 6point6, a U.K. technology consultancy, specializing in cloud, data, and cybersecurity.", - "title": "Accenture to Expand Government Transformation Capabilities in the U.K. with Acquisition of 6point6", - "title_link": "https://newsroom.accenture.com/news/2023/accenture-to-expand-government-transformation-capabilities-in-the-uk-with-acquisition-of-6point6", - "service_name": "newsroom.accenture.com" - } - ], - "thread_ts": "1700145084.414099", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "5c7159ef-848b-44ef-b725-c7ffdb68e7a6", - "type": "message", - "text": "We should sell OL to governments", - "user": "U01RA9B5GG2", - "ts": "1700147007.152929", - "blocks": [ - { - "type": "rich_text", - "block_id": "SxApD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "We should sell OL to governments" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700145084.414099", - "parent_user_id": "U02LXF3HUN7", - "reactions": [ - { - "name": "upside_down_face", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "3b80d5c4-c44c-42b5-a8be-e29cb54d6524", - "type": "message", - "text": "we may have to rebrand to ClosedLineage", - "user": "U01HNKK4XAM", - "ts": "1700148036.394269", - "blocks": [ - { - "type": "rich_text", - "block_id": "Jxqqu", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we may have to rebrand to ClosedLineage" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700145084.414099", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "ccf1d225-5ba1-4385-bde0-9e8170f77e0d", - "type": "message", - "text": "not in this way; just emit any event second time to secret NSA endpoint", - "user": "U01RA9B5GG2", - "ts": "1700148217.698289", - "blocks": [ - { - "type": "rich_text", - "block_id": "lCJt9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "not in this way; just emit 
any event second time to secret NSA endpoint" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700145084.414099", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "d64e9f16-21eb-4907-a8b2-cce44cdcb728", - "type": "message", - "text": "we would need to improve our stock photo game", - "user": "U02LXF3HUN7", - "ts": "1700151197.023279", - "blocks": [ - { - "type": "rich_text", - "block_id": "HxE/r", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we would need to improve our stock photo game" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700145084.414099", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "client_msg_id": "79a4d1c5-4281-413d-910f-3da8d91cb07f", - "type": "message", - "text": "Any opinions about a free task management alternative to the free version of Notion (10-person limit)? Looking at Trello for keeping track of talks.", - "user": "U02LXF3HUN7", - "ts": "1700088623.669029", - "blocks": [ - { - "type": "rich_text", - "block_id": "9mtSc", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Any opinions about a free task management alternative to the free version of Notion (10-person limit)? Looking at Trello for keeping track of talks." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700088623.669029", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1700148214.523069", - "reply_users": [ - "U01HNKK4XAM", - "U02LXF3HUN7" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1700148214.523069", - "replies": [ - { - "client_msg_id": "9815972B-96D0-41B8-B384-BA12F3973205", - "type": "message", - "text": "What about GitHub projects?", - "user": "U01HNKK4XAM", - "ts": "1700094737.005179", - "blocks": [ - { - "type": "rich_text", - "block_id": "QcAzj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "What about GitHub projects?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700088623.669029", - "parent_user_id": "U02LXF3HUN7", - "reactions": [ - { - "name": "+1", - "users": [ - "U02LXF3HUN7" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "a49931fb-de30-4a1d-868f-c73cfe634843", - "type": "message", - "text": "Projects is the way to go, thanks", - "user": "U02LXF3HUN7", - "ts": "1700144866.061859", - "blocks": [ - { - "type": "rich_text", - "block_id": "w9wSZ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Projects is the way to go, thanks" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700088623.669029", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "a1ce4508-002c-4ff3-91e8-27b9b2c898d0", - "type": "message", - "text": "Set up a Projects board. New projects are private by default. We could make it public. The one thing that’s missing that we could use is a built-in date field for alerting about upcoming deadlines…", - "user": "U02LXF3HUN7", - "ts": "1700148214.523069", - "blocks": [ - { - "type": "rich_text", - "block_id": "eEnTu", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Set up a Projects board. New projects are private by default. We could make it public. 
The one thing that’s missing that we could use is a built-in date field for alerting about upcoming deadlines…" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700088623.669029", - "parent_user_id": "U02LXF3HUN7", - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U01HNKK4XAM", - "U01RA9B5GG2" - ], - "count": 2 - } - ] - } - ] - }, - { - "client_msg_id": "56357d69-1804-4184-8e84-c307a173b4d7", - "type": "message", - "text": "have we discussed adding column level lineage support to Airflow? ", - "user": "U01DCMDFHBK", - "ts": "1700087546.032789", - "blocks": [ - { - "type": "rich_text", - "block_id": "/WCoQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "have we discussed adding column level lineage support to Airflow? " - }, - { - "type": "link", - "url": "https://marquezproject.slack.com/archives/C01E8MQGJP7/p1700087438599279?thread_ts=1700084629.245949&cid=C01E8MQGJP7" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700087546.032789", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1700087719.745889", - "reply_users": [ - "U02S6F54MAB", - "U01DCMDFHBK" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "b6c39a4c-39b1-4580-be60-d8a545e0e4c0", - "type": "message", - "text": "we have it in SQL operators", - "user": "U02S6F54MAB", - "ts": "1700087599.744449", - "blocks": [ - { - "type": "rich_text", - "block_id": "UED2i", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "we have it in SQL operators" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700087546.032789", - "parent_user_id": "U01DCMDFHBK" - }, - { - "client_msg_id": "e94cf9ac-c1f8-4174-afc4-c9e721d069ae", - "type": "message", - "text": "OOh any docs / code? or if you’d like to respond in the MQZ slack :pray:", - "user": "U01DCMDFHBK", - "ts": "1700087665.101029", - "blocks": [ - { - "type": "rich_text", - "block_id": "xEi1G", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "OOh any docs / code? or if you’d like to respond in the MQZ slack " - }, - { - "type": "emoji", - "name": "pray", - "unicode": "1f64f" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700087546.032789", - "parent_user_id": "U01DCMDFHBK" - }, - { - "client_msg_id": "d8406883-16b7-4722-a5a5-c4865a9d2cea", - "type": "message", - "text": "I’ll reply there", - "user": "U02S6F54MAB", - "ts": "1700087719.745889", - "blocks": [ - { - "type": "rich_text", - "block_id": "lm1RD", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I’ll reply there" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700087546.032789", - "parent_user_id": "U01DCMDFHBK", - "reactions": [ - { - "name": "heart", - "users": [ - "U01DCMDFHBK", - "U01HNKK4XAM" - ], - "count": 2 - } - ] - } - ] - }, - { - "client_msg_id": "1c768cec-a0dc-49c7-b912-53c6f5b2674b", - "type": "message", - "text": "Apparently an admin can view a Slack archive at any time at this URL: . 
Only public channels are available, though.", - "user": "U02LXF3HUN7", - "ts": "1700078359.877599", - "blocks": [ - { - "type": "rich_text", - "block_id": "kIAVN", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Apparently an admin can view a Slack archive at any time at this URL: " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/services/export", - "text": "https://openlineage.slack.com/services/export" - }, - { - "type": "text", - "text": ". Only public channels are available, though." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700078359.877599", - "reply_count": 1, - "reply_users_count": 1, - "latest_reply": "1700085189.085139", - "reply_users": [ - "U01DCLP0GU9" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1700085189.085139", - "replies": [ - { - "client_msg_id": "345bfb5e-683e-461a-88ee-49e319b9bab1", - "type": "message", - "text": "you are now admin", - "user": "U01DCLP0GU9", - "ts": "1700085189.085139", - "blocks": [ - { - "type": "rich_text", - "block_id": "UEiAC", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "you are now admin" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700078359.877599", - "parent_user_id": "U02LXF3HUN7", - "reactions": [ - { - "name": "+1", - "users": [ - "U02LXF3HUN7" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "cd28f84f-6bdd-4095-9149-2979383a7e9c", - "type": "message", - "text": "Anyone have thoughts about how to address the question about “pain points” here? . (Listing pros is easy — it’s the cons we don’t have boilerplate for)", - "user": "U02LXF3HUN7", - "ts": "1700078230.775579", - "blocks": [ - { - "type": "rich_text", - "block_id": "MocUx", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Anyone have thoughts about how to address the question about “pain points” here? " - }, - { - "type": "link", - "url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1700064564825909", - "text": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1700064564825909" - }, - { - "type": "text", - "text": ". 
(Listing pros is easy — it’s the cons we don’t have boilerplate for)" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "from_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1700064564825909", - "ts": "1700064564.825909", - "author_id": "U066HKFCHUG", - "channel_id": "C01CK9T7HKR", - "channel_team": "T01CWUYP5AR", - "is_msg_unfurl": true, - "message_blocks": [ - { - "team": "T01CWUYP5AR", - "channel": "C01CK9T7HKR", - "ts": "1700064564.825909", - "message": { - "blocks": [ - { - "type": "rich_text", - "block_id": "V52kz", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can anyone tell me why OL is better than other competitors if you can provide an analysis that would be great" - } - ] - } - ] - } - ] - } - } - ], - "id": 1, - "original_url": "https://openlineage.slack.com/archives/C01CK9T7HKR/p1700064564825909", - "fallback": "[November 15th, 2023 8:09 AM] naresh.naresh36: Can anyone tell me why OL is better than other competitors if you can provide an analysis that would be great", - "text": "Can anyone tell me why OL is better than other competitors if you can provide an analysis that would be great", - "author_name": "Naresh reddy", - "author_link": "https://openlineage.slack.com/team/U066HKFCHUG", - "author_icon": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_48.jpg", - "author_subname": "Naresh reddy", - "mrkdwn_in": [ - "text" - ], - "footer": "Slack Conversation" - } - ], - "thread_ts": "1700078230.775579", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1700082291.373659", - "reply_users": [ - "U02LXF3HUN7", - "U01RA9B5GG2" - ], - "is_locked": false, - "subscribed": true, - "last_read": "1700082291.373659", - "replies": [ - { - "client_msg_id": "31c986c9-ecb0-4f3d-b0ef-11bc56dc0ea5", - "type": "message", - "text": "Maybe something like “OL has many desirable integrations, including a best-in-class Spark integration, but it’s like any other open standard in that it requires contributions in order to approach total coverage. Thankfully, we have many active contributors, and integrations are being added or improved upon all the time.”", - "user": "U02LXF3HUN7", - "ts": "1700078288.125819", - "blocks": [ - { - "type": "rich_text", - "block_id": "0bx39", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Maybe something like “OL has many desirable integrations, including a best-in-class Spark integration, but it’s like any other open standard in that it requires contributions in order to approach total coverage. 
Thankfully, we have many active contributors, and integrations are being added or improved upon all the time.”" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700078230.775579", - "parent_user_id": "U02LXF3HUN7" - }, - { - "client_msg_id": "16f88fa1-980e-4cac-bba8-fd2592cc3cf2", - "type": "message", - "text": "Maybe rephrase pain points to \"something we're not actively focusing on\"", - "user": "U01RA9B5GG2", - "ts": "1700082291.373659", - "blocks": [ - { - "type": "rich_text", - "block_id": "EqOgW", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Maybe rephrase pain points to \"something we're not actively focusing on\"" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700078230.775579", - "parent_user_id": "U02LXF3HUN7" - } - ] - }, - { - "type": "message", - "text": "is it time to *support hudi*?", - "files": [ - { - "id": "F065PUT9SRL", - "created": 1700068641, - "timestamp": 1700068641, - "name": "image.png", - "title": "image.png", - "mimetype": "image/png", - "filetype": "png", - "pretty_type": "PNG", - "user": "U02S6F54MAB", - "user_team": "T01CWUYP5AR", - "editable": false, - "size": 644916, - "mode": "hosted", - "is_external": false, - "external_type": "", - "is_public": true, - "public_url_shared": false, - "display_as_bot": false, - "username": "", - "url_private": "https://files.slack.com/files-pri/T01CWUYP5AR-F065PUT9SRL/image.png", - "url_private_download": "https://files.slack.com/files-pri/T01CWUYP5AR-F065PUT9SRL/download/image.png", - "media_display_type": "unknown", - "thumb_64": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_64.png", - "thumb_80": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_80.png", - "thumb_360": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_360.png", - "thumb_360_w": 360, - "thumb_360_h": 300, - "thumb_480": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_480.png", - "thumb_480_w": 480, - "thumb_480_h": 400, - "thumb_160": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_160.png", - "thumb_720": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_720.png", - "thumb_720_w": 720, - "thumb_720_h": 600, - "thumb_800": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_800.png", - "thumb_800_w": 800, - "thumb_800_h": 666, - "thumb_960": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_960.png", - "thumb_960_w": 960, - "thumb_960_h": 800, - "thumb_1024": "https://files.slack.com/files-tmb/T01CWUYP5AR-F065PUT9SRL-cf00de4cfd/image_1024.png", - "thumb_1024_w": 1024, - "thumb_1024_h": 853, - "original_w": 1174, - "original_h": 978, - "thumb_tiny": "AwAnADCmZy4QEKNi4FNZhjn9KbEQC2e444p+Pb9KAI8j1P5Uny+p/KpDnsufwpvz+h/KgBvy+p/Kg47E/lTvn9D+VNKt1IP5UAXNKOLlieyH+laYbOcSk/h0rM0r/j6P+4f6VrjdQAb19aaZkBwTj8Kd81KM0xCB1IyDxTLkg2suOfkP8qJSQRzio5CTbT5OfkP8qLBfWxm6Y225PqVOK1xJxzjP41jaf/x9D6GtWkMlMnPGMUCQd8VFRQA5XLKC6qG/Om3DqLaXoMqegoqG6/49pP8AdoA//9k=", - "permalink": "https://openlineage.slack.com/files/U02S6F54MAB/F065PUT9SRL/image.png", - "permalink_public": "https://slack-files.com/T01CWUYP5AR-F065PUT9SRL-56a680662b", - "is_starred": false, - "has_rich_preview": false, - "file_access": "visible" - } - ], - "upload": false, - "user": "U02S6F54MAB", - "display_as_bot": false, - "ts": "1700068651.517579", - "blocks": [ - { - "type": "rich_text", - "block_id": "bDAOH", 
- "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "is it time to " - }, - { - "type": "text", - "text": "support hudi", - "style": { - "bold": true - } - }, - { - "type": "text", - "text": "?" - } - ] - } - ] - } - ], - "client_msg_id": "d5da62dd-e797-486b-9aae-a5acdbe4a1c8", - "reactions": [ - { - "name": "joy", - "users": [ - "U01HNKK4XAM" - ], - "count": 1 - } - ] - }, - { - "client_msg_id": "fdc57b50-c9cf-4b13-be38-681a0a2cbfeb", - "type": "message", - "text": "Got the doc + poc for hook-level coverage: ", - "user": "U01RA9B5GG2", - "ts": "1700066684.350639", - "blocks": [ - { - "type": "rich_text", - "block_id": "gkEfy", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Got the doc + poc for hook-level coverage: " - }, - { - "type": "link", - "url": "https://docs.google.com/document/d/1q0shiUxopASO8glgMqjDn89xigJnGrQuBMbcRdolUdk/edit?usp=sharing" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700066684.350639", - "reply_count": 2, - "reply_users_count": 2, - "latest_reply": "1700069197.574369", - "reply_users": [ - "U02S6F54MAB", - "U01RA9B5GG2" - ], - "is_locked": false, - "subscribed": false, - "reactions": [ - { - "name": "eyes", - "users": [ - "U02S6F54MAB" - ], - "count": 1 - } - ], - "replies": [ - { - "client_msg_id": "d1c376d9-efa7-4a7e-9d43-4b7864c43c99", - "type": "message", - "text": "did you check if `LineageCollector` is instantiated once per process?", - "user": "U02S6F54MAB", - "ts": "1700069067.090359", - "blocks": [ - { - "type": "rich_text", - "block_id": "AbONi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "did you check if " - }, - { - "type": "text", - "text": "LineageCollector", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " is instantiated once per process?" 
- } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U02S6F54MAB", - "ts": "1700069074.000000" - }, - "thread_ts": "1700066684.350639", - "parent_user_id": "U01RA9B5GG2" - }, - { - "client_msg_id": "b0881b41-f5e0-43de-83ce-0c6b28480e59", - "type": "message", - "text": "Using it only via `get_hook_lineage_collector`", - "user": "U01RA9B5GG2", - "ts": "1700069197.574369", - "blocks": [ - { - "type": "rich_text", - "block_id": "2tsBk", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Using it only via " - }, - { - "type": "text", - "text": "get_hook_lineage_collector", - "style": { - "code": true - } - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700066684.350639", - "parent_user_id": "U01RA9B5GG2" - } - ] - }, - { - "type": "message", - "subtype": "thread_broadcast", - "text": "hey look, more fun\n", - "user": "U02S6F54MAB", - "ts": "1700040937.040239", - "thread_ts": "1700004648.584649", - "root": { - "client_msg_id": "e27ec408-535d-4a9b-a41b-c848a8cb286a", - "type": "message", - "text": "\nfun PR incoming", - "user": "U02S6F54MAB", - "ts": "1700004648.584649", - "blocks": [ - { - "type": "rich_text", - "block_id": "xxBfv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2260" - }, - { - "type": "text", - "text": "\nfun PR incoming" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700004648.584649", - "reply_count": 5, - "reply_users_count": 4, - "latest_reply": "1700066753.852469", - "reply_users": [ - "U02S6F54MAB", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U01HNKK4XAM" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "yGPSQ", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "hey look, more fun\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2263" - } - ] - } - ] - } - ], - "client_msg_id": "9fbbabe3-ab1f-4f16-8e55-28f574dd7eb5" - }, - { - "client_msg_id": "12cbe1a1-4a86-4be0-9725-2e7a0b053ea0", - "type": "message", - "text": "also, what about this PR? ", - "user": "U01DCMDFHBK", - "ts": "1700037370.235629", - "blocks": [ - { - "type": "rich_text", - "block_id": "8bSTa", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "also, what about this PR? " - }, - { - "type": "link", - "url": "https://github.com/MarquezProject/marquez/pull/2654" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1697557750, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/MarquezProject/marquez/pull/2654", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2654 Runless events - refactor job_versions_io_mapping", - "text": "*Problem*\n\nThis is currently a draft PR which is far from being merged. It is missing few tests related to schema changes which are marked with `todo` within the code. I've created a PR to have a better discussion on adding `job_id` to `job_versions_io_mapping`. This PR should be a follow-up of .\n\nThe assumption was that it should be helpful in optimising get-lineage query. 
I would like first to clarify how are we going to make benefit of this extra column.\n\n*Solution*\n\nPlease describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a database schema migration, please describe the schema modification(s) and whether it's a _backwards-incompatible_ or _backwards-compatible_ change.\n\n> *Note:* All database schema changes require discussion. Please for context.\n\nOne-line summary:\n\n*Checklist*\n\n☐ You've your work\n☐ Your changes are accompanied by tests (_if relevant_)\n☐ Your change contains a and is self-contained\n☐ You've updated any relevant documentation (_if relevant_)\n☐ You've included a one-line summary of your change for the (_Depending on the change, this may not be necessary_).\n☐ You've versioned your `.sql` database schema migration according to (_if relevant_)\n☐ You've included a in any source code files (_if relevant_)", - "title": "#2654 Runless events - refactor job_versions_io_mapping", - "title_link": "https://github.com/MarquezProject/marquez/pull/2654", - "footer": "", - "fields": [ - { - "value": "docs, api", - "title": "Labels", - "short": true - }, - { - "value": "4", - "title": "Comments", - "short": true - } - ], - "mrkdwn_in": [ - "text" - ] - } - ], - "thread_ts": "1700037370.235629", - "reply_count": 4, - "reply_users_count": 1, - "latest_reply": "1700037512.239479", - "reply_users": [ - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "693813d2-d928-4d99-bd74-95c0261714d4", - "type": "message", - "text": "this is the next to go", - "user": "U02MK6YNAQ5", - "ts": "1700037393.838709", - "blocks": [ - { - "type": "rich_text", - "block_id": "8GVNn", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "this is the next to go" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037370.235629", - "parent_user_id": "U01DCMDFHBK" - }, - { - "client_msg_id": "cc02b08a-7cec-487a-a5d1-2cbad675b35e", - "type": "message", - "text": "and i consider it ready", - "user": "U02MK6YNAQ5", - "ts": "1700037398.104819", - "blocks": [ - { - "type": "rich_text", - "block_id": "lxJ9f", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "and i consider it ready" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037370.235629", - "parent_user_id": "U01DCMDFHBK" - }, - { - "client_msg_id": "ce36fb29-57bb-475a-b7da-4f979ab4c96f", - "type": "message", - "text": "Then we have a draft one with streaming support -> which has an integration test of lineage endpoint working for streaming jobs", - "user": "U02MK6YNAQ5", - "ts": "1700037451.758639", - "blocks": [ - { - "type": "rich_text", - "block_id": "sO/AE", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Then we have a draft one with streaming support " - }, - { - "type": "link", - "url": "https://github.com/MarquezProject/marquez/pull/2682/files" - }, - { - "type": "text", - "text": " -> which has an integration test of lineage endpoint working for streaming jobs" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037370.235629", - "parent_user_id": "U01DCMDFHBK" - }, - { - "client_msg_id": "192e703d-06e9-4072-a252-9fd7d8cbdace", - "type": "message", - "text": "I still need to work on #2682 but you can review #2654. 
once you get some sleep, of course :wink:", - "user": "U02MK6YNAQ5", - "ts": "1700037512.239479", - "blocks": [ - { - "type": "rich_text", - "block_id": "jWaV5", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I still need to work on #2682 but you can review #2654. once you get some sleep, of course " - }, - { - "type": "emoji", - "name": "wink", - "unicode": "1f609" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037370.235629", - "parent_user_id": "U01DCMDFHBK", - "reactions": [ - { - "name": "heart", - "users": [ - "U01DCMDFHBK" - ], - "count": 1 - } - ] - } - ] - }, - { - "client_msg_id": "00f5d485-d95e-4616-9c02-57c0fceaf671", - "type": "message", - "text": "`_Minor_: We can consider defining a _run_state column and eventually dropping the event_type. That is, we can consider columns prefixed with _ to be \"remappings\" of OL properties to Marquez.` -> didn't get this one. Is it for now or some future plans?", - "user": "U02MK6YNAQ5", - "ts": "1700037282.474539", - "blocks": [ - { - "type": "rich_text", - "block_id": "iPdJ9", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Minor", - "style": { - "italic": true, - "code": true - } - }, - { - "type": "text", - "text": ": We can consider defining a _run_state column and eventually dropping the event_type. That is, we can consider columns prefixed with _ to be \"remappings\" of OL properties to Marquez.", - "style": { - "code": true - } - }, - { - "type": "text", - "text": " -> didn't get this one. Is it for now or some future plans?" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037282.474539", - "reply_count": 3, - "reply_users_count": 2, - "latest_reply": "1700037383.673919", - "reply_users": [ - "U01DCMDFHBK", - "U02MK6YNAQ5" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "client_msg_id": "8a385af8-864b-44d0-b0d7-92b750f37d59", - "type": "message", - "text": "future :wink:", - "user": "U01DCMDFHBK", - "ts": "1700037362.060349", - "blocks": [ - { - "type": "rich_text", - "block_id": "te6E7", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "future " - }, - { - "type": "emoji", - "name": "wink", - "unicode": "1f609" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037282.474539", - "parent_user_id": "U02MK6YNAQ5" - }, - { - "client_msg_id": "53f83f97-23da-45cf-81ae-2282be94ce49", - "type": "message", - "text": "ok", - "user": "U02MK6YNAQ5", - "ts": "1700037370.551639", - "blocks": [ - { - "type": "rich_text", - "block_id": "luWBi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "ok" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037282.474539", - "parent_user_id": "U02MK6YNAQ5" - }, - { - "client_msg_id": "602b1385-218e-4076-be7b-f2b03a428b17", - "type": "message", - "text": "I will then replace enum with string", - "user": "U02MK6YNAQ5", - "ts": "1700037383.673919", - "blocks": [ - { - "type": "rich_text", - "block_id": "xUGoP", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "I will then replace enum with string" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700037282.474539", - "parent_user_id": "U02MK6YNAQ5" - } - ] - }, - { - "client_msg_id": "5d39e88b-3350-4baa-9474-38a624950aa7", - "type": "message", - "text": "<@U02MK6YNAQ5> approved PR with 
minor comments, I think the is one comment we’ll need to address before merging; otherwise solid work dude :ok_hand:", - "user": "U01DCMDFHBK", - "ts": "1700037147.106879", - "blocks": [ - { - "type": "rich_text", - "block_id": "ktmE8", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "user", - "user_id": "U02MK6YNAQ5" - }, - { - "type": "text", - "text": " approved PR " - }, - { - "type": "link", - "url": "https://github.com/MarquezProject/marquez/pull/2661", - "text": "#2661", - "unsafe": true - }, - { - "type": "text", - "text": " with minor comments, I think the " - }, - { - "type": "link", - "url": "https://github.com/MarquezProject/marquez/pull/2661/files#r1393820409", - "text": "enum defined in the db layer" - }, - { - "type": "text", - "text": " is one comment we’ll need to address before merging; otherwise solid work dude " - }, - { - "type": "emoji", - "name": "ok_hand", - "unicode": "1f44c" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "edited": { - "user": "U01DCMDFHBK", - "ts": "1700037165.000000" - }, - "attachments": [ - { - "id": 1, - "footer_icon": "https://slack.github.com/static/img/favicon-neutral.png", - "ts": 1698327731, - "color": "36a64f", - "bot_id": "B01VA0FB340", - "app_unfurl_url": "https://github.com/MarquezProject/marquez/pull/2661", - "is_app_unfurl": true, - "app_id": "A01BP7R4KNY", - "fallback": "#2661 Runless events - consume job event", - "title": "#2661 Runless events - consume job event", - "title_link": "https://github.com/MarquezProject/marquez/pull/2661", - "footer": "", - "mrkdwn_in": [ - "text" - ] - } - ], - "reactions": [ - { - "name": "raised_hands", - "users": [ - "U02MK6YNAQ5", - "U01HNKK4XAM" - ], - "count": 2 - } - ] - }, - { - "client_msg_id": "e27ec408-535d-4a9b-a41b-c848a8cb286a", - "type": "message", - "text": "\nfun PR incoming", - "user": "U02S6F54MAB", - "ts": "1700004648.584649", - "blocks": [ - { - "type": "rich_text", - "block_id": "xxBfv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2260" - }, - { - "type": "text", - "text": "\nfun PR incoming" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700004648.584649", - "reply_count": 5, - "reply_users_count": 4, - "latest_reply": "1700066753.852469", - "reply_users": [ - "U02S6F54MAB", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U01HNKK4XAM" - ], - "is_locked": false, - "subscribed": false, - "replies": [ - { - "type": "message", - "subtype": "thread_broadcast", - "text": "hey look, more fun\n", - "user": "U02S6F54MAB", - "ts": "1700040937.040239", - "thread_ts": "1700004648.584649", - "root": { - "client_msg_id": "e27ec408-535d-4a9b-a41b-c848a8cb286a", - "type": "message", - "text": "\nfun PR incoming", - "user": "U02S6F54MAB", - "ts": "1700004648.584649", - "blocks": [ - { - "type": "rich_text", - "block_id": "xxBfv", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2260" - }, - { - "type": "text", - "text": "\nfun PR incoming" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700004648.584649", - "reply_count": 5, - "reply_users_count": 4, - "latest_reply": "1700066753.852469", - "reply_users": [ - "U02S6F54MAB", - "U01RA9B5GG2", - "U02MK6YNAQ5", - "U01HNKK4XAM" - ], - "is_locked": false, - "subscribed": false - }, - "blocks": [ - { - "type": "rich_text", - "block_id": "yGPSQ", - "elements": [ - { - "type": 
"rich_text_section", - "elements": [ - { - "type": "text", - "text": "hey look, more fun\n" - }, - { - "type": "link", - "url": "https://github.com/OpenLineage/OpenLineage/pull/2263" - } - ] - } - ] - } - ], - "client_msg_id": "9fbbabe3-ab1f-4f16-8e55-28f574dd7eb5" - }, - { - "client_msg_id": "789d839b-2827-4542-b060-d72e2c135e3d", - "type": "message", - "text": "nice to have fun with you Jakub", - "user": "U01RA9B5GG2", - "ts": "1700042638.003809", - "blocks": [ - { - "type": "rich_text", - "block_id": "L8iP2", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "nice to have fun with you Jakub" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700004648.584649", - "parent_user_id": "U02S6F54MAB", - "reactions": [ - { - "name": "slightly_smiling_face", - "users": [ - "U02S6F54MAB", - "U01HNKK4XAM", - "U01DCMDFHBK" - ], - "count": 3 - } - ] - }, - { - "client_msg_id": "c388a688-15b4-4720-98c2-da976acec0a5", - "type": "message", - "text": "Can't wait to see it on the 1st January.", - "user": "U02MK6YNAQ5", - "ts": "1700044954.083129", - "blocks": [ - { - "type": "rich_text", - "block_id": "zN7Xj", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Can't wait to see it on the 1st January." - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700004648.584649", - "parent_user_id": "U02S6F54MAB" - }, - { - "client_msg_id": "1340B438-D041-4EAF-ADB3-604A3F1D302F", - "type": "message", - "text": "Ain’t no party like a dev ex improvement party", - "user": "U01HNKK4XAM", - "ts": "1700049363.338799", - "blocks": [ - { - "type": "rich_text", - "block_id": "6Yj06", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Ain’t" - }, - { - "type": "text", - "text": " no party like a dev ex improvement party" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700004648.584649", - "parent_user_id": "U02S6F54MAB" - }, - { - "client_msg_id": "35ba5050-45f2-49f6-9126-aaa37a682593", - "type": "message", - "text": "Gentoo installation party is in similar category of fun", - "user": "U01RA9B5GG2", - "ts": "1700066753.852469", - "blocks": [ - { - "type": "rich_text", - "block_id": "vIT3H", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "text", - "text": "Gentoo installation party is in similar category of fun" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR", - "thread_ts": "1700004648.584649", - "parent_user_id": "U02S6F54MAB" - } - ] - }, - { - "client_msg_id": "60159e06-de81-4b43-9edf-f404c32ddd38", - "type": "message", - "text": ":wave:", - "user": "U01RA9B5GG2", - "ts": "1699987988.642529", - "blocks": [ - { - "type": "rich_text", - "block_id": "YMxTi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "c329e984-0446-4e18-8fcf-74d020e30699", - "type": "message", - "text": ":ocean:", - "user": "U053LLVTHRN", - "ts": "1699982987.125549", - "blocks": [ - { - "type": "rich_text", - "block_id": "2BlqX", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "ocean", - "unicode": "1f30a" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "2aa5fdfa-5543-407e-a314-7623ef0c0763", - "type": "message", - "text": ":wave:", - "user": "U01DCMDFHBK", - "ts": "1699982333.485469", - 
"blocks": [ - { - "type": "rich_text", - "block_id": "YMxTi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "818CA217-10D2-492F-B387-499128576860", - "type": "message", - "text": ":wave: ", - "user": "U01DCLP0GU9", - "ts": "1699982322.990799", - "blocks": [ - { - "type": "rich_text", - "block_id": "msKVM", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - }, - { - "type": "text", - "text": " " - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "client_msg_id": "4225176b-1766-4f0f-9d34-e2c808164b02", - "type": "message", - "text": ":wave:", - "user": "U02LXF3HUN7", - "ts": "1699982179.651129", - "blocks": [ - { - "type": "rich_text", - "block_id": "YMxTi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699982042.414079", - "user": "U053LLVTHRN", - "text": "<@U053LLVTHRN> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "client_msg_id": "0bea58f3-10b1-4972-ad6a-0a002ae61854", - "type": "message", - "text": ":wave:", - "user": "U02S6F54MAB", - "ts": "1699982037.267109", - "blocks": [ - { - "type": "rich_text", - "block_id": "YMxTi", - "elements": [ - { - "type": "rich_text_section", - "elements": [ - { - "type": "emoji", - "name": "wave", - "unicode": "1f44b" - } - ] - } - ] - } - ], - "team": "T01CWUYP5AR" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699982026.646699", - "user": "U05KKM07PJP", - "text": "<@U05KKM07PJP> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699982026.554329", - "user": "U01DCMDFHBK", - "text": "<@U01DCMDFHBK> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699982026.462039", - "user": "U02LXF3HUN7", - "text": "<@U02LXF3HUN7> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699982026.350149", - "user": "U02S6F54MAB", - "text": "<@U02S6F54MAB> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699982026.266479", - "user": "U02MK6YNAQ5", - "text": "<@U02MK6YNAQ5> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699982026.191829", - "user": "U01DCLP0GU9", - "text": "<@U01DCLP0GU9> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699981990.850589", - "user": "U01RA9B5GG2", - "text": "<@U01RA9B5GG2> has joined the channel", - "inviter": "U01HNKK4XAM" - }, - { - "type": "message", - "subtype": "channel_join", - "ts": "1699981986.459199", - "user": "U01HNKK4XAM", - "text": "<@U01HNKK4XAM> has joined the channel" - } -] \ No newline at end of file diff --git a/slack-archive/data/channels.json b/slack-archive/data/channels.json deleted file mode 100644 index 4248056..0000000 --- a/slack-archive/data/channels.json +++ /dev/null @@ -1,570 +0,0 @@ -[ - { - "id": "C01CK9T7HKR", - "name": "general", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, 
- "is_private": false, - "created": 1603242062, - "is_archived": false, - "is_general": true, - "unlinked": 0, - "name_normalized": "general", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1690953251304, - "parent_conversation": null, - "creator": "U01DCLP0GU9", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "OpenLineage spec discussion.\nSee also: \n2023 Ecosystem Survey: ", - "creator": "U02LXF3HUN7", - "last_set": 1684872900 - }, - "purpose": { - "value": "This is the one channel that will always include everyone. It’s a great spot for announcements and team-wide conversations.", - "creator": "U01DCLP0GU9", - "last_set": 1603242062 - }, - "properties": { - "canvas": { - "file_id": "F05KFAKSTHV", - "is_empty": true, - "quip_thread_id": "BZB9AAVq0C6" - } - }, - "previous_names": [], - "num_members": 1189 - }, - { - "id": "C01NAFMBVEY", - "name": "mark-grover", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1613391683, - "is_archived": true, - "is_general": false, - "unlinked": 1613391683, - "name_normalized": "mark-grover", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1613391683292, - "frozen_reason": "connection_severed", - "parent_conversation": null, - "creator": "U01HNSHB2H4", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": false, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 0 - }, - { - "id": "C030F1J0264", - "name": "dagster-integration", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1642818372, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "dagster-integration", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1642818373220, - "parent_conversation": null, - "creator": "U025D1JDTRB", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "OpenLineage-Dagster integration discussion", - "creator": "U025D1JDTRB", - "last_set": 1642818373 - }, - "previous_names": [], - "num_members": 47 - }, - { - "id": "C04E3Q18RR9", - "name": "open-lineage-plus-bacalhau", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1670404449, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "open-lineage-plus-bacalhau", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1670404449626, - "parent_conversation": null, - "creator": "U044VPCNMDX", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": false, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - 
"value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 10 - }, - { - "id": "C04JPTTC876", - "name": "spec-compliance", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1673531750, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "spec-compliance", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1673550344241, - "parent_conversation": null, - "creator": "U01DCLP0GU9", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "Key points/decisions -\n", - "creator": "U0323HG8C8H", - "last_set": 1673550344 - }, - "purpose": { - "value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 25 - }, - { - "id": "C04QSV0GG23", - "name": "providence-meetup", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1677267471, - "is_archived": true, - "is_general": false, - "unlinked": 0, - "name_normalized": "providence-meetup", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1680640801888, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": false, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "More info: \nSign up: ", - "creator": "U02LXF3HUN7", - "last_set": 1677269229 - }, - "previous_names": [], - "num_members": 0 - }, - { - "id": "C04THH1V90X", - "name": "data-council-meetup", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1678822970, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "data-council-meetup", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1678822970441, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": false, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "A channel for syncing up about travel plans, etc.", - "creator": "U02LXF3HUN7", - "last_set": 1678822970 - }, - "previous_names": [], - "num_members": 4 - }, - { - "id": "C051C93UZK9", - "name": "nyc-meetup", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1680631804, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "nyc-meetup", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1680631804998, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "For coordinating on travel, etc.", - "creator": "U02LXF3HUN7", - "last_set": 
1680631804 - }, - "previous_names": [], - "num_members": 7 - }, - { - "id": "C055GGUFMHQ", - "name": "boston-meetup", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1682700563, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "boston-meetup", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1682700563376, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 7 - }, - { - "id": "C056YHEU680", - "name": "sf-meetup", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1683210950, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "sf-meetup", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1683211020224, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "Please join the meetup group: ", - "creator": "U02LXF3HUN7", - "last_set": 1683211020 - }, - "previous_names": [], - "num_members": 11 - }, - { - "id": "C05N442RQUA", - "name": "toronto-meetup", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1692210102, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "toronto-meetup", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1692210102912, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 9 - }, - { - "id": "C05PD7VJ52S", - "name": "london-meetup", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1692978724, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "london-meetup", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1692978724912, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 6 - }, - { - "id": "C05U3UC85LM", - "name": "gx-integration", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": 
false, - "created": 1695836283, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "gx-integration", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1695836283474, - "parent_conversation": null, - "creator": "U02LXF3HUN7", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 7 - }, - { - "id": "C065PQ4TL8K", - "name": "dev-discuss", - "is_channel": true, - "is_group": false, - "is_im": false, - "is_mpim": false, - "is_private": false, - "created": 1699981986, - "is_archived": false, - "is_general": false, - "unlinked": 0, - "name_normalized": "dev-discuss", - "is_shared": false, - "is_org_shared": false, - "is_pending_ext_shared": false, - "pending_shared": [], - "context_team_id": "T01CWUYP5AR", - "updated": 1699981986353, - "parent_conversation": null, - "creator": "U01HNKK4XAM", - "is_ext_shared": false, - "shared_team_ids": [ - "T01CWUYP5AR" - ], - "pending_connected_team_ids": [], - "is_member": true, - "topic": { - "value": "", - "creator": "", - "last_set": 0 - }, - "purpose": { - "value": "", - "creator": "", - "last_set": 0 - }, - "previous_names": [], - "num_members": 9 - } -] \ No newline at end of file diff --git a/slack-archive/data/emojis.json b/slack-archive/data/emojis.json deleted file mode 100644 index 8934469..0000000 --- a/slack-archive/data/emojis.json +++ /dev/null @@ -1,30 +0,0 @@ -{ - "bowtie": "https://emoji.slack-edge.com/T01CWUYP5AR/bowtie/f3ec6f2bb0.png", - "squirrel": "https://emoji.slack-edge.com/T01CWUYP5AR/squirrel/465f40c0e0.png", - "glitch_crab": "https://emoji.slack-edge.com/T01CWUYP5AR/glitch_crab/db049f1f9c.png", - "piggy": "https://emoji.slack-edge.com/T01CWUYP5AR/piggy/b7762ee8cd.png", - "cubimal_chick": "https://emoji.slack-edge.com/T01CWUYP5AR/cubimal_chick/85961c43d7.png", - "dusty_stick": "https://emoji.slack-edge.com/T01CWUYP5AR/dusty_stick/6177a62312.png", - "slack": "https://emoji.slack-edge.com/T01CWUYP5AR/slack/7d462d2443.png", - "pride": "https://emoji.slack-edge.com/T01CWUYP5AR/pride/56b1bd3388.png", - "thumbsup_all": "https://emoji.slack-edge.com/T01CWUYP5AR/thumbsup_all/50096a1020.png", - "slack_call": "https://emoji.slack-edge.com/T01CWUYP5AR/slack_call/b81fffd6dd.png", - "shipit": "alias:squirrel", - "white_square": "alias:white_large_square", - "black_square": "alias:black_large_square", - "simple_smile": "https://a.slack-edge.com/80588/img/emoji_2017_12_06/apple/simple_smile.png", - "raito": "https://emoji.slack-edge.com/T01CWUYP5AR/raito/99218314dd778726.png", - "gratitude-arigatou-gozaimasu": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-arigatou-gozaimasu/d6322e811f6c8071.png", - "gratitude-asl": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-asl/d6322e811f6c8071.gif", - "gratitude-danke": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-danke/d6322e811f6c8071.png", - "gratitude-gamsahamnida": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-gamsahamnida/d6322e811f6c8071.png", - "gratitude-gracias": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-gracias/d6322e811f6c8071.png", - "gratitude-grazie": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-grazie/d6322e811f6c8071.png", - "gratitude-merci": 
"https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-merci/d6322e811f6c8071.png", - "gratitude-obrigada": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-obrigada/d6322e811f6c8071.png", - "gratitude-obrigado": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-obrigado/d6322e811f6c8071.png", - "gratitude-spasibo": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-spasibo/d6322e811f6c8071.png", - "gratitude-thank-you": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-thank-you/d6322e811f6c8071.png", - "gratitude-xiexie": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-xiexie/d6322e811f6c8071.png", - "gratitude-xiexieni": "https://emoji.slack-edge.com/T01CWUYP5AR/gratitude-xiexieni/d6322e811f6c8071.png" -} \ No newline at end of file diff --git a/slack-archive/data/search.js b/slack-archive/data/search.js deleted file mode 100644 index 8fe7696..0000000 --- a/slack-archive/data/search.js +++ /dev/null @@ -1 +0,0 @@ -window.search_data = {"channels":{"C01CK9T7HKR":"general","C01NAFMBVEY":"mark-grover","C030F1J0264":"dagster-integration","C04E3Q18RR9":"open-lineage-plus-bacalhau","C04JPTTC876":"spec-compliance","C04QSV0GG23":"providence-meetup","C04THH1V90X":"data-council-meetup","C051C93UZK9":"nyc-meetup","C055GGUFMHQ":"boston-meetup","C056YHEU680":"sf-meetup","C05N442RQUA":"toronto-meetup","C05PD7VJ52S":"london-meetup","C05U3UC85LM":"gx-integration","C065PQ4TL8K":"dev-discuss"},"users":{"U066S97A90C":"rwojcik","U066CNW85D3":"karthik.nandagiri","U066HKFCHUG":"naresh.naresh36","U05T8BJD4DU":"jasonyip","U02LXF3HUN7":"michael282","U05TU0U224A":"rodrigo.maia","U05NMJ0NBUK":"lance.dacey2","U05J9LZ355L":"yannick.libert.partne","U05JBHLPY8K":"athityakumar","U0635GK8Y14":"david.goss","U062Q95A1FG":"n.priya88","U063YP6UJJ0":"fangmik","U04AZ7992SU":"john490","U062WLFMRTP":"hloomba","U05CAULTYG2":"kkandaswamy","U06315TMT61":"splicer9904","U0625RZ7KR9":"kpraveen420","U05KCF3EEUR":"savansharan_navalgi","U03D8K119LJ":"matthewparas2020","U0616K9TSTZ":"ankit.goods10","U04EZ2LPDV4":"anirudh.shrinivason","U05HK41VCH1":"madhav.kakumani","U05QL7LN2GH":"jeevan","U021QJMRP47":"drew215","U01HVNU6A4C":"mars","U01DCLP0GU9":"julien","U05FLJE4GDU":"damien.hawes","U05TZE47F2S":"slack1950","U05A1D80QKF":"suraj.gupta","U05SMTVPPL3":"sangeeta","U05HFGKEYVB":"juan_luis_cano","U05SQGH8DV4":"sarathch","U05K8F1T887":"terese","U055N2GRT4P":"tatiana.alchueyr","U05QNRSQW1E":"sarwatfatimam","U0595Q78HUG":"gaborjbernat","U05HBLE7YPL":"abdallah","U0323HG8C8H":"sheeri.cabral","U05NGJ8AM8X":"yunhe52203334","U05EC8WB74N":"mbarrien","U05JY6MN8MS":"githubopenlineageissu","U05PVS8GRJ6":"josdotso","U05J5GRKY10":"george.polychronopoul","U05TQPZ4R4L":"aaruna6","U05Q3HT6PBR":"kevin","U05SXDWVA7K":"kgkwiz","U01HNKK4XAM":"harel.shein","U05QHG1NJ8J":"mike474","U01RA9B5GG2":"maciej.obuchowski","U0620HU51HA":"sicotte.jason","U05U9K21LSG":"bill","U02S6F54MAB":"jakub.dardzinski","U05U9929K3N":"don","U01DCMDFHBK":"willy","U02MK6YNAQ5":"pawel.leszczynski","U053LLVTHRN":"ross769","U05KKM07PJP":"peter.hicks"},"messages":{"C01CK9T7HKR":[{"m":"Hi Everyone, first of all - big shout to all contributors - You do amazing job here.\nI want to use OpenLineage in our project - to do so I want to setup some POC and experiment with possibilities library provides - I start working on sample from the conference talk: but when I go into spark transformation after staring context with openlineage I have issues with _SessionHiveMetaStoreClient on section 3_- does anyone has other plain sample to play with, to not setup everything from 
scratch?","u":"U066S97A90C","t":"1700568128.192669"},{"m":"Hi So we can use openlineage to identify column level lineage with Airflow , Spark? will it also allow to connect to Power BI and derive the downstream column lineage ?","u":"U066CNW85D3","t":"1700456258.614309"},{"m":"what are the pros and cons of OL. we often talk about positives to market it but what are the pain points using OL,how it's addressing user issues?","u":"U066HKFCHUG","t":"1700064658.956769"},{"m":"Can anyone tell me why OL is better than other competitors if you can provide an analysis that would be great","u":"U066HKFCHUG","t":"1700064564.825909"},{"m":"Hi\nCan anyone point me to the deck on how Airflow can be integrated using Openlineage?","u":"U066HKFCHUG","t":"1700050644.509419"},{"m":"<@U02MK6YNAQ5> I diff CreateReplaceDatasetBuilder.java and CreateReplaceOutputDatasetBuilder.java and they are the same except for the class name, so I am not sure what is causing the change. I also realize you don't have a test case for ADLS","u":"U05T8BJD4DU","t":"1699863517.394909"},{"m":"<@U02MK6YNAQ5> I went back to 1.4.1, output does show adls location. But environment facet is gone in 1.4.1. It shows up in 1.5.0 but namespace is back to dbfs....","u":"U05T8BJD4DU","t":"1699862442.876219"},{"m":"Databricks needs to be re-written in a way that supports Databricks it seems like","u":"U05T8BJD4DU","t":"1699691444.226989"},{"m":"<@U02MK6YNAQ5> this is why if create a table with adls location it won't show input and output:\n\n\n\nBecause the catalog object is not there.","u":"U05T8BJD4DU","t":"1699691373.531469"},{"m":"<@U02MK6YNAQ5> regarding to , OL is parsing out the table location in Hive metastore, it is the location of the table in the catalog and not the physical location of the data. It is both right and wrong because it is a table, just it is an external table.\n\n","u":"U05T8BJD4DU","t":"1699647945.224489"},{"m":"\nFriendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: ","u":"U02LXF3HUN7","t":"1699465494.687309"},{"m":"Has anyone here tried OpenLineage with Spark on Amazon EMR?","u":"U05TU0U224A","t":"1699465132.534889"},{"m":"if I have a dataset on adls gen2 which synapse connects to as an external delta table, is that the use case of a symlink dataset? the delta table is connected to by PBI and by Synapse, but the underlying data is exactly the same","u":"U05NMJ0NBUK","t":"1699372165.804069"},{"m":"Hi all, we (I work with <@U05VDHJJ9T7> and <@U05HBLE7YPL>) have a quick question regarding the spark integration:\nif a spark app contains several jobs, they will be named \"my_spark_app_name.job1\" and \"my_spark_app_name.job2\"\neg:\nspark_job.collect_limit\nspark_job.map_partitions_parallel_collection\n\nIf I understood correctly, the spark integration maps one Spark job to a single OpenLineage Job, and the application itself should be assigned a Run id at startup and each job that executes will report the application's Run id as its parent job run (taken from: ).\n\nIn our case, the app Run Id is never created, and the jobs runs don't contain any parent facets. We tested it with a recent integration version in 1.4.1 and also an older one (0.26.0).\nDid we miss something in the OL spark integration config?","u":"U05J9LZ355L","t":"1699355029.839029"},{"m":"Hey team! 
:wave:\n\nWe're trying to use openlineage-flink, and would like provide the `openlineage.transport.type=http` and configure other transport configs, but we're not able to find sufficient docs (tried ) on where/how these configs can be provided.\n\nFor example, in spark, the changes mostly were delegated to the spark-submit command like\n```spark-submit --conf \"spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener\" \\\n --packages \"io.openlineage:openlineage-spark:<spark-openlineage-version>\" \\\n --conf \"spark.openlineage.transport.url=http://{openlineage.client.host}/api/v1/namespaces/spark_integration/\" \\\n --class com.mycompany.MySparkApp my_application.jar```\nAnd the `OpenLineageSparkListener` has a method to retrieve the provided spark confs as an object in the ArgumentParser. Similarly, looking for some pointers on how the openlineage.transport configs can be provided to `OpenLineageFlinkJobListener` & how the flink listener parses/uses these configs\n\nTIA! :smile:","u":"U05JBHLPY8K","t":"1699266123.453379"},{"m":":wave: I raised a PR off the back of some Marquez conversations a while back to try and clarify how names of Snowflake objects should be expressed in OL events. I used as a guide, but also I appreciate there are other OL producers that involve Snowflake too (Airflow? dbt?). Any feedback on this would be appreciated!","u":"U0635GK8Y14","t":"1699261422.618719"},{"m":"Hi Team , we are trying to customize the events by writing custom lineage listener extending OpenLineageSparkListener, but would need some direction how to capture the events","u":"U062Q95A1FG","t":"1699096090.087359"},{"m":"\nThis month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:\n• announcements\n• recent releases\n• recent additions to the Flink integration by <@U05QA2D1XNV> \n• recent additions to the Spark integration by <@U02MK6YNAQ5> \n• updates on proposals by <@U01DCLP0GU9> \n• discussion topics\n• open discussion\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.","u":"U02LXF3HUN7","t":"1699027207.361229"},{"m":"actually, it shows up in one of the RUNNING now... behavior is consistent between 11.3 and 13.3, thanks for fixing this issue","u":"U05T8BJD4DU","t":"1698999491.798599"},{"m":"<@U02MK6YNAQ5> I tested 1.5.0, it works great now, but the environment facets is gone in START... which I very much want it.. any thoughts?","u":"U05T8BJD4DU","t":"1698950958.157459"},{"m":"\nWe released OpenLineage 1.5.0, including:\n• by <@U05QA2D1XNV> \n• by <@U02MK6YNAQ5> \n• `rdd``toDF` by <@U02MK6YNAQ5> \n• by <@U02S6F54MAB> \n• by <@U02S6F54MAB> \n• bug fixes, tests, infra fixes, doc changes, and more.\nThanks to all the contributors, including new contributor <@U05VDHJJ9T7>!\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ","u":"U02LXF3HUN7","t":"1698940800.306129"},{"m":"I am looking to send OpenLineage events to an AWS API Gateway endpoint from an AWS MWAA instance. The problem is that all requests to AWS services need to be signed with SigV4, and using API Gateway with IAM authentication would require requests to API Gateway be signed with SigV4. Would the best way to do so be to just modify the python client HTTP transport to include a new config option for signing emitted OpenLineage events with SigV4? 
Are there any alternatives?","u":"U063YP6UJJ0","t":"1698885038.172079"},{"m":"Hi team :wave: , we’re finding that for our Spark jobs we are almost always getting some junk characters in our dataset names. We’ve pushed the regex filter to its limits and would like to extend the logic of deriving the dataset name in openlineage-spark (currently on `1.4.1`). I seem to recall hearing we could do this by implementing our own `LogicalPlanVisitor` or something along those lines? Is that still the recommended approach and if so would this be possible to implement in Scala vs. Java (scala noob here :simple_smile:)","u":"U04AZ7992SU","t":"1698882039.335099"},{"m":"\nThe October 2023 issue of is available now! to get in directly in your inbox each month.","u":"U02LXF3HUN7","t":"1698859749.531699"},{"m":"\nI’m opening a vote to release OpenLineage 1.5.0, including:\n• support for Cassandra Connectors lineage in the Flink integration\n• support for Databricks Runtime 13.3 in the Spark integration\n• support for `rdd` and `toDF` operations from the Spark Scala API in Spark\n• lowered requirements for attrs and requests packages in the Airflow integration\n• lazy rendering of yaml configs in the dbt integration\n• bug fixes, tests, infra fixes, doc changes, and more.\nThree +1s from committers will authorize an immediate release.","u":"U02LXF3HUN7","t":"1698852883.658009"},{"m":"one question if someone is around - when im keeping both `openlineage-airflow` and `apache-airflow-providers-openlineage` in my requirement file, i see the following error -\n``` from openlineage.airflow.extractors import Extractors\nModuleNotFoundError: No module named 'openlineage.airflow'```\nany thoughts?","u":"U062WLFMRTP","t":"1698778838.540239"},{"m":":wave: Hi team, cross-posting from the Marquez Channel in case anyone here has a better idea of the spec\n\n> For most of our lineage extractors in airflow, we are using the rust sql parser from openlineage-sql to extract table lineage via sql statements. When errors occur we are adding an `extractionError` run facet similar to what is being done . I’m finding in the case that multiple statements were extracted but one failed to parse while many others were successful, the lineage for these runs doesn’t appear as expected in Marquez. Is there any logic around the `extractionError` run facet that could be causing this? It seems reasonable to assume that we might take this to mean the entire run event is invalid if we have any extraction errors. \n> \n> I would still expect to see the other lineage we sent for the run but am instead just seeing the `extractionError` in the marquez UI, in the database, runs with an `extractionError` facet don’t seem to make it to the `job_versions_io_mapping` table","u":"U04AZ7992SU","t":"1698706303.956579"},{"m":"I realize in Spark 3.4+, some job ids don't have a start event. What part of the code is responsible for triggering the START and COMPLETE event","u":"U05T8BJD4DU","t":"1698563188.319939"},{"m":"Hello, has anyone run into similar error as posted in this github open issues[] while setting up marquez on an EC2 Instance, would appreciate any help to get past the errors","u":"U05CAULTYG2","t":"1698440472.145489"},{"m":"referencing to conversation - what it takes to move to openlineage provider package from openlineage-airflow. Im updating Airflow to 2.7.2 but moving off of openlineage-airflow to provider package Im trying to estimate the amount of work it takes, any thoughts? 
reading change_logs I dont think its too much of a change but please share your thoughts and if somewhere its drafted please do share that as well","u":"U062WLFMRTP","t":"1698429543.349989"},{"m":"Hi People, actually I want to intercept the OpenLineage spark events right after the job ends and before they are emitted, so that I can add some extra information to the events or remove some information that I don't want.\nIs there any way of doing this? Can someone please help me","u":"U06315TMT61","t":"1698408752.647169"},{"m":"*Spark Integration Logs*\nHey There\nAre these events skipped because it's not supported or it's configured somewhere?\n`23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart`\n`23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd`","u":"U05TU0U224A","t":"1698400165.662489"},{"m":"Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated","u":"U062WLFMRTP","t":"1698340358.557159"},{"m":"Hello Team","u":"U062WLFMRTP","t":"1698340277.847709"},{"m":"Hi I want to customise the events which comes from Openlineage spark . Can some one give some information","u":"U062Q95A1FG","t":"1698315220.142929"},{"m":"Hi,\n\nWe are using openlineage spark connector. We have used spark 3.2 and scala 2.12 so far. We have triggered a new job with Spark 3.4 and scala 2.13 and faced below exception.\n\n\n```java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.map(scala.Function1)'\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildInputDatasets$6(OpenLineageRunEventBuilder.java:341)\n\tat java.base/java.util.Optional.map(Optional.java:265)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildInputDatasets(OpenLineageRunEventBuilder.java:339)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:295)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)\n\tat io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)\n\tat io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:72)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)```","u":"U0625RZ7KR9","t":"1697840317.080859"},{"m":"<@U05JY6MN8MS>\nI am trying to contribute to Integration tests which is listed here as \nthe mentions that i can trigger CI for integration tests from forked branch.\n.\nbut i am unable to do so, is there a way to trigger CI from forked brach or do i have to get permission from someone to run the CI?\n\ni am getting this error when i run this command `sudo git-push-fork-to-upstream-branch upstream savannavalgi:hacktober`\n> ```Username for '': savannavalgi\n> Password for '': \n> remote: Permission to OpenLineage/OpenLineage.git denied to savannavalgi.\n> fatal: unable to access '': The requested URL returned error: 403```\ni have tried to configure ssh key\nalso tried to trigger CI from another brach,\nand tried all of this after fetching the latest upstream\n\ncc: <@U05JBHLPY8K> <@U01RA9B5GG2> <@U05HD9G5T17>","u":"U05KCF3EEUR","t":"1697805105.047909"},{"m":"Hey 
all - we've been noticing that some events go unreported by openlineage (spark) when the `AsyncEventQueue` fills up and starts dropping events. Wondering if anyone has experienced this before, and knows why it is happening? We've expanded the event queue capacity and thrown more hardware at the problem but no dice\n\nAlso as a note, the query plans from this job are pretty big - could the listener just be choking up? Happy to open a github issue as well if we suspect that it could be the listener itself having issues","u":"U03D8K119LJ","t":"1697742042.953399"},{"m":"Hello All, I am completely new for Openlineage, I have to setup the lab to conduct POC on various aspects like Lineage, metadata management , etc. As per openlineage site, i tried downloading Ubuntu, docker and binary files for Marquez. But I am lost somewhere and unable to configure whole setup. Can someone please assist in steps to start from scratch so that i can delve into the Openlineage capabilities. Many thanks","u":"U0616K9TSTZ","t":"1697597823.663129"},{"m":"Hello All :wave:!\nWe are currently trying to work the the *spark integration for OpenLineage in our Databricks instance*. The general setup is done and working with a few hicups here and there.\nBut one thing we are still struggling is how to link all spark jobs events with a Databricks job or a notebook run.\nWe´ve recently noticed that some of the events produced by OL have the \"environment-properties\" attribute with information (for our context) regarding notebook path (if it is a notebook run), or the the job run ID (if its a databricks job run). But the thing is that _these attributes are not always present._\nI ran some samples yesterday for a job with 4 notebook tasks. From all 20 json payload sent by the OL listener, only 3 presented the \"environment-properties\" attribute. Its not only happening with Databricks jobs. When i run single notebooks and each cell has its onw set of spark jobs, not all json events presented that property either.\n\nSo my question is *what is the criteria to have this attributes present or not in the event json file*? Or maybe this in an issue? <@U05T8BJD4DU> did you find out anything about this?\n\n:gear: Spark 3.4 / OL-Spark 1.4.1","u":"U05TU0U224A","t":"1697527077.180169"},{"m":"Hi team, I am running the following pyspark code in a cell:\n```print(\"SELECTING 100 RECORDS FROM METADATA TABLE\")\ndf = spark.sql(\"\"\"select * from
limit 100\"\"\")\n\nprint(\"WRITING (1) 100 RECORDS FROM METADATA TABLE\")\ndf.write.mode(\"overwrite\").format('delta').save(\"\")\ndf.createOrReplaceTempView(\"temp_metadata\")\n\nprint(\"WRITING (2) 100 RECORDS FROM METADATA TABLE\")\ndf.write.mode(\"overwrite\").format(\"delta\").save(\"\")\n\nprint(\"READING (1) 100 RECORDS FROM METADATA TABLE\")\ndf_read = spark.read.format('delta').load(\"\")\ndf_read.createOrReplaceTempView(\"metadata_1\")\n\nprint(\"DOING THE MERGE INTO SQL STEP!\")\ndf_new = spark.sql(\"\"\"\n MERGE INTO metadata_1\n USING
\n ON metadata_1.id = temp_metadata.id\n WHEN MATCHED THEN UPDATE SET \n metadata_1.id = temp_metadata.id,\n metadata_1.aspect = temp_metadata.aspect\n WHEN NOT MATCHED THEN INSERT (id, aspect) \n VALUES (temp_metadata.id, temp_metadata.aspect)\n\"\"\")```\nI am running with debug log levels. I actually don't see any of the events being logged for `SaveIntoDataSourceCommand` or the `MergeIntoCommand`, but OL is in fact emitting events to the backend. It seems like the events are just not being logged... I actually observe this for all delta table related spark sql queries...","u":"U04EZ2LPDV4","t":"1697179720.032079"},{"m":"This might be a dumb question, I guess I need to setup local Spark in order for the Spark tests to run successfully?","u":"U05T8BJD4DU","t":"1697137714.503349"},{"m":"\nFriendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: ","u":"U02LXF3HUN7","t":"1697043601.182719"},{"m":"Hi @there, I am trying to make API call to get column-lineage information could you please let me know the url construct to retrieve the same? As per the API documentation I am passing the following url to GET column-lineage: but getting error code:400. Thanks","u":"U05HK41VCH1","t":"1697040264.029839"},{"m":" When i am running this sql as part of a databricks notebook, i am recieving an OL event where i see only an output dataset and there is no input dataset or a symlink facet inside the dataset to map it to the underlying azure storage object. Can anyone kindly help on this\n```spark.sql(f\"CREATE TABLE IF NOT EXISTS covid_research.uscoviddata USING delta LOCATION ''\")\n{\n \"eventTime\": \"2023-10-11T10:47:36.296Z\",\n \"producer\": \"\",\n \"schemaURL\": \"\",\n \"eventType\": \"COMPLETE\",\n \"run\": {\n \"runId\": \"d0f40be9-b921-4c84-ac9f-f14a86c29ff7\",\n \"facets\": {\n \"spark.logicalPlan\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"plan\": [\n {\n \"class\": \"org.apache.spark.sql.catalyst.plans.logical.CreateTable\",\n \"num-children\": 1,\n \"name\": 0,\n \"tableSchema\": [],\n \"partitioning\": [],\n \"tableSpec\": null,\n \"ignoreIfExists\": true\n },\n {\n \"class\": \"org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier\",\n \"num-children\": 0,\n \"catalog\": null,\n \"identifier\": null\n }\n ]\n },\n \"spark_version\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"spark-version\": \"3.3.0\",\n \"openlineage-spark-version\": \"1.2.2\"\n },\n \"processing_engine\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"version\": \"3.3.0\",\n \"name\": \"spark\",\n \"openlineageAdapterVersion\": \"1.2.2\"\n }\n }\n },\n \"job\": {\n \"namespace\": \"default\",\n \"name\": \"adb-3942203504488904.4.azuredatabricks.net.create_table.covid_research_db_uscoviddata\",\n \"facets\": {}\n },\n \"inputs\": [],\n \"outputs\": [\n {\n \"namespace\": \"dbfs\",\n \"name\": \"/user/hive/warehouse/covid_research.db/uscoviddata\",\n \"facets\": {\n \"dataSource\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"name\": \"dbfs\",\n \"uri\": \"dbfs\"\n },\n \"schema\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"fields\": []\n },\n \"storage\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"storageLayer\": \"unity\",\n \"fileFormat\": \"parquet\"\n },\n \"symlinks\": {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"identifiers\": [\n {\n \"namespace\": \"/user/hive/warehouse/covid_research.db\",\n \"name\": \"covid_research.uscoviddata\",\n \"type\": \"TABLE\"\n }\n ]\n },\n \"lifecycleStateChange\": {\n \"_producer\": \"\",\n 
\"_schemaURL\": \"\",\n \"lifecycleStateChange\": \"CREATE\"\n }\n },\n \"outputFacets\": {}\n }\n ]\n}```","u":"U05QL7LN2GH","t":"1697021758.073929"},{"m":" i am trying out the databricks spark integration and in one of the events i am getting a openlineage event where the output dataset is having a facet called `symlinks` , the statement that generated this event is this sql\n```CREATE TABLE IF NOT EXISTS covid_research.covid_data \nUSING CSV\nLOCATION '' \nOPTIONS (header \"true\", inferSchema \"true\");```\nCan someone kindly let me know what this `symlinks` facet is. i tried seeing the spec but did not get it completely","u":"U05QL7LN2GH","t":"1696995819.546399"},{"m":"example:\n\n```{\"environment-properties\":{\"spark.databricks.clusterUsageTags.clusterName\":\"'s Cluster\",\"spark.databricks.job.runId\":\"\",\"spark.databricks.job.type\":\"\",\"spark.databricks.clusterUsageTags.azureSubscriptionId\":\"a4f54399-8db8-4849-adcc-a42aed1fb97f\",\"spark.databricks.notebook.path\":\"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep\",\"spark.databricks.clusterUsageTags.clusterOwnerOrgId\":\"4679476628690204\",\"MountPoints\":[{\"MountPoint\":\"/databricks-datasets\",\"Source\":\"databricks-datasets\"},{\"MountPoint\":\"/Volumes\",\"Source\":\"UnityCatalogVolumes\"},{\"MountPoint\":\"/databricks/mlflow-tracking\",\"Source\":\"databricks/mlflow-tracking\"},{\"MountPoint\":\"/databricks-results\",\"Source\":\"databricks-results\"},{\"MountPoint\":\"/databricks/mlflow-registry\",\"Source\":\"databricks/mlflow-registry\"},{\"MountPoint\":\"/Volume\",\"Source\":\"DbfsReserved\"},{\"MountPoint\":\"/volumes\",\"Source\":\"DbfsReserved\"},{\"MountPoint\":\"/\",\"Source\":\"DatabricksRoot\"},{\"MountPoint\":\"/volume\",\"Source\":\"DbfsReserved\"}],\"User\":\"\",\"UserId\":\"4768657035718622\",\"OrgId\":\"4679476628690204\"}}```","u":"U05T8BJD4DU","t":"1696985639.868119"},{"m":"Any idea why \"environment-properties\" is gone in Spark 3.4+ in StartEvent?","u":"U05T8BJD4DU","t":"1696914311.793789"},{"m":"Hello. I am getting started with OL and Marquez with dbt. I am using dbt-ol. The namespace of the dataset showing up in Marquez is not the namespace I provide using OPENLINEAGE_NAMESPACE. It happens to be the same as the source in Marquez which is the snowflake account uri. It's obviously picking up the other env variable OPENLINEAGE_URL so i am pretty sure its not the environment. Is this expected?","u":"U021QJMRP47","t":"1696884935.692409"},{"m":"\n*We released OpenLineage 1.4.1!*\n*Additions:*\n• *Client:* *allow setting client’s endpoint via environment variable* <@U01HVNU6A4C> \n• *Flink: expand Iceberg source types* <@U05QA2D1XNV> \n• *Spark: add debug facet* <@U02MK6YNAQ5> \n• *Spark: enable Nessie REST catalog* \nThanks to all the contributors, especially new contributors <@U05QA2D1XNV> and !\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ","u":"U02LXF3HUN7","t":"1696879514.895109"},{"m":" I am trying out the openlineage integration of spark on databricks. There is no event getting emitted from Openlineage, I see logs saying OpenLineage Event Skipped. I am attaching the Notebook that i am trying to run and the cluster logs. 
Kindly can someone help me on this","u":"U05QL7LN2GH","t":"1696823976.297949"},{"m":"<@U02LXF3HUN7> can we cut a new release to include this change?\n• ","u":"U01HVNU6A4C","t":"1696591141.778179"},{"m":"The Marquez meetup in San Francisco is happening right now!\n","u":"U01DCLP0GU9","t":"1696552840.350759"},{"m":"I have created a ticket to make this easier to find. Once I get more feedback I’ll turn it into a md file in the repo: \n","u":"U01DCLP0GU9","t":"1696541652.452819"},{"m":"**\nThis month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:\n• announcements\n• recent releases\n• Airflow Summit recap\n• tutorial: migrating to the Airflow Provider\n• discussion topic: observability for OpenLineage/Marquez\n• open discussion\n• more (TBA)\nMore info and the meeting link can be found on the . All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.","u":"U02LXF3HUN7","t":"1696531454.431629"},{"m":"I have cleaned up the registry proposal.\n\nIn particular:\n• I clarified that option 2 is preferred at this point.\n• I moved discussion notes to the bottom. they will go away at some point\n• Once it is stable, I’ll create a with the preferred option.\n• we need a good proposal for the core facets prefix. My suggestion is to move core facets to `core` in the registry. The drawback is prefix would be inconsistent.\n","u":"U01DCLP0GU9","t":"1696379615.265919"},{"m":"Hey everyone - does anyone have a good mechanism for alerting on issues with open lineage? For example, maybe alerting when an event times out - perhaps to prometheus or some other kind of generic endpoint? Not sure the best approach here (if the meta inf extension would be able to achieve it)","u":"U03D8K119LJ","t":"1696350897.139129"},{"m":"\n*We released OpenLineage 1.3.1!*\n*Added:*\n• Airflow: add some basic stats to the Airflow integration `#1845` \n• Airflow: add columns as schema facet for `airflow.lineage.Table` (if defined) `#2138` \n• DBT: add SQLSERVER to supported dbt profile types `#2136` \n• Spark: support for latest 3.5 `#2118` \n*Fixed:*\n• Airflow: fix find-links path in tox `#2139` \n• Airflow: add more graceful logging when no OpenLineage provider installed `#2141` \n• Spark: fix bug in PathUtils’ `prepareDatasetIdentifierFromDefaultTablePath` (CatalogTable) to correctly preserve scheme from `CatalogTable`’s location `#2142` \nThanks to all the contributors, including new contributor <@U05TZE47F2S>!\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ","u":"U02LXF3HUN7","t":"1696344963.496819"},{"m":"Hi folks - I'm wondering if its just me, but does `io.openlineage:openlineage-sql-java:1.2.2` ship with the `arm64.dylib` binary? When i try and run code that uses the Java package on an Apple M1, the binary isn't found, The workaround is to checkout 1.2.2 and then build and publish it locally.","u":"U05FLJE4GDU","t":"1696319076.770719"},{"m":"\nThe September issue of is here! 
This issue covers the big news about OpenLineage coming out of Airflow Summit, progress on the Airflow Provider, highlights from our meetup in Toronto, and much more.\nTo get the newsletter directly in your inbox each month, sign up .","u":"U02LXF3HUN7","t":"1696264108.497989"},{"m":"\nHello all, I’d like to open a vote to release OpenLineage 1.3.0, including:\n• support for Spark 3.5 in the Spark integration\n• scheme preservation bug fix in the Spark integration\n• find-links path in tox bug in the Airflow integration fix\n• more graceful logging when no OL provider is installed in the Airflow integration\n• columns as schema facet for airflow.lineage.Table addition\n• SQLSERVER to supported dbt profile types addition\nThree +1s from committers will authorize. Thanks in advance.","u":"U02LXF3HUN7","t":"1696262312.791719"},{"m":"Are you located in the Brussels area or within commutable distance? Interested in attending a meetup between October 16-20? If so, please DM <@U0323HG8C8H> or myself. TIA","u":"U02LXF3HUN7","t":"1695932184.205159"},{"m":"Hello community\nFirst time poster - bear with me :)\n\nI am looking to make a minor PR on the airflow integration (fixing github #2130), and the code change is easy enough, but I fail to install the python environment. I have tried the simple ones\n`OpenLineage/integration/airflow > pip install -e .`\n or\n`OpenLineage/integration/airflow > pip install -r dev-requirements.txt`\nbut they both fail on\n`ERROR: No matching distribution found for openlineage-sql==1.3.0`\n\n(which I think is an unreleased version in the git project)\n\nHow would I go about to install the requirements?\n\n//Erik\n\nPS. Sorry for posting this in general if there is a specific integration or contribution channel - I didnt find a better channel","u":"U05TZE47F2S","t":"1695883240.832669"},{"m":"Hi folks, am I correct in my observations that the Spark integration does not generate inputs and outputs for Kafka-to-Kafka pipelines?\n\n*EDIT:* Removed the crazy wall of text. Relevant GitHub issue is .","u":"U05FLJE4GDU","t":"1695831785.042079"},{"m":"*Meetup recap: Toronto Meetup @ Airflow Summit, September 18, 2023*\nIt was great to see so many members of our community at this event! I counted 32 total attendees, with all but a handful being first-timers.\n*Topics included:*\n• Presentation on the history, architecture and roadmap of the project by <@U01DCLP0GU9> and <@U01HNKK4XAM> \n• Discussion of OpenLineage support in by <@U01DCMDFHBK> \n• Presentation by *Ye Liu* and *Ivan Perepelitca* from , the social platform for data, about their integration\n• Presentation by <@U02MK6YNAQ5> about the Spark integration\n• Presentation by <@U01RA9B5GG2> about the Apache Airflow Provider\nThanks to all the presenters and attendees with a shout out to <@U01HNKK4XAM> for the help with organizing and day-of logistics, <@U02S6F54MAB> for the help with set up/clean up, and <@U0323HG8C8H> for the crucial assist with the signup sheet.\nThis was our first meetup in Toronto, and we learned some valuable lessons about planning events in new cities — the first and foremost being to ask for a pic of the building! 
:slightly_smiling_face: But it seemed like folks were undeterred, and the space itself lived up to expectations.\nFor a recording and clips from the meetup, head over to our .\n*Upcoming events:*\n• October 5th in San Francisco: Marquez Meetup @ Astronomer (sign up )\n• November: Warsaw meetup (details, date TBA)\n• January: London meetup (details, date TBA)\nAre you interested in hosting or co-hosting an OpenLineage or Marquez meetup? DM me!","u":"U02LXF3HUN7","t":"1695827956.140429"},{"m":" In Airflow Integration we send across a lineage Event for Dag start and complete, but that is not the case with spark integration…we don’t receive any event for the application start and complete in spark…is this expected behaviour or am i missing something?","u":"U05QL7LN2GH","t":"1695703890.171789"},{"m":"I'm using the Spark OpenLineage integration. In the `outputStatistics` output dataset facet we receive `rowCount` and `size`.\nThe Job performs a SQL insert into a MySQL table and I'm receiving the `size` as 0.\n```{\n \"outputStatistics\":\n {\n \"_producer\": \"\",\n \"_schemaURL\": \"\",\n \"rowCount\": 1,\n \"size\": 0\n }\n}```\nI'm not sure what the size means here. Does this mean number of bytes inserted/updated?\nAlso, do we have any documentation for Spark specific Job and Run facets?","u":"U05A1D80QKF","t":"1695663385.834539"},{"m":" I'm presently addressing a particular scenario that pertains to Openlineage authentication, specifically involving the use of an access key and secret.\n\nI've implemented a custom token provider called AccessKeySecretKeyTokenProvider, which extends the TokenProvider class. This token provider communicates with another service, obtaining a token and an expiration time based on the provided access key, secret, and client ID.\n\nMy goal is to retain this token in a cache prior to its expiration, thereby eliminating the need for network calls to the third-party service. Is it possible without relying on an external caching system.","u":"U05SMTVPPL3","t":"1695633110.066819"},{"m":"I am attaching the log4j, there is no openlineagecontext","u":"U05T8BJD4DU","t":"1695352570.560639"},{"m":"I installed 1.2.2 on Databricks, followed the below init script: \n\nmy cluster config looks like this:\n\nspark.openlineage.version v1\nspark.openlineage.namespace adb-5445974573286168.8#default\nspark.openlineage.endpoint v1/lineage\nspark.openlineage.url.param.code 8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==\nspark.openlineage.url \n\nBut it is not calling the API, it works fine with 0.18 version","u":"U05T8BJD4DU","t":"1695347501.889769"},{"m":"I am using this accelerator that leverages OpenLineage on Databricks to publish lineage info to Purview, but it's using a rather old version of OpenLineage aka 0.18, anybody has tried it on a newer version of OpenLineage? 
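Regarding the 0.18-era cluster configuration a few messages up that stops working on 1.2.2: the old flat keys (`spark.openlineage.url`, `spark.openlineage.version`, ...) were superseded by the `spark.openlineage.transport.*` family in newer releases. A minimal PySpark sketch of the newer style; the package version and URLs are placeholders, and anything like the Azure Function `code` query parameter needs the current spark-integration docs for its exact transport option:
```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ol-http-example")
    # Placeholder version -- align with whatever release you are testing.
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.5.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://marquez:5000")   # base URL only
    .config("spark.openlineage.transport.endpoint", "/api/v1/lineage")  # default path
    .config("spark.openlineage.namespace", "my_namespace")
    .getOrCreate()
)
```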
I am facing some issues with the inputs and outputs for the same object is having different json\n","u":"U05T8BJD4DU","t":"1695335777.852519"},{"m":"Hi, we're using Custom Operators in airflow(2.5) and are planning to expose lineage via default extractors: \n*Question*: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?\nSince OpenLineage has now moved inside airflow and I think there is no concept of extractors in the latest version.","u":"U05A1D80QKF","t":"1695276670.439269"},{"m":"\nWe released OpenLineage 1.2.2!\nAdded\n• Spark: publish the `ProcessingEngineRunFacet` as part of the normal operation of the `OpenLineageSparkEventListener` `#2089` \n• Spark: capture and emit `spark.databricks.clusterUsageTags.clusterAllTags` variable from databricks environment `#2099` \nFixed\n• Common: support parsing dbt_project.yml without target-path `#2106` \n• Proxy: fix Proxy chart `#2091` \n• Python: fix serde filtering `#2044` \n• Python: use non-deprecated `apiKey` if loading it from env variables `@2029` \n• Spark: Improve RDDs on S3 integration. `#2039` \n• Flink: prevent sending `running` events after job completes `#2075` \n• Spark & Flink: Unify dataset naming from URI objects `#2083` \n• Spark: Databricks improvements `#2076` \nRemoved\n• SQL: remove sqlparser dependency from iface-java and iface-py `#2090` \nThanks to all the contributors, including new contributors <@U055N2GRT4P>, , and !\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ","u":"U02LXF3HUN7","t":"1695244138.650089"},{"m":"congrats folks :partying_face: ","u":"U05HFGKEYVB","t":"1695217014.549799"},{"m":"Hi I need help in extracting OpenLineage for PostgresOperator in json format.\nany suggestions or comments would be greatly appreciated","u":"U05SQGH8DV4","t":"1695106067.665469"},{"m":"Hi! I'm in need of help with wrapping my head around OpenLineage. My team have the goal of collecting metadata from the Airflow operators GreatExpectationsOperator, PythonOperator, MsSqlOperator and BashOperator (for dbt). Where can I see the sourcecode for what is collected for each operator, and is there support for these in the new provider *apache-airflow-providers-openlineage*? I am super confused and feel lost in the docs. :exploding_head: We are using MSSQL/ODBC to connect to our db, and this data does not seem to appear as datasets in Marquez, do I need to configure this? If so, HOW and WHERE? :smiling_face_with_tear:\n\nHappy for any help, big or small! :pray:","u":"U05K8F1T887","t":"1695039754.591479"},{"m":"It doesn't seem like there's a way to override the OL endpoint from the default (`/api/v1/lineage`) in Airflow? I tried setting the `OPENLINEAGE_ENDPOINT` environment to no avail. Based on , it seems that only `OPENLINEAGE_URL` was used to construct `HttpConfig` ?","u":"U01HVNU6A4C","t":"1694956061.909169"},{"m":" is there a way by which we could add custom headers to openlineage client in airflow, i see that provision is there for spark integration via these properties --> abcdef","u":"U05QL7LN2GH","t":"1694907627.974239"},{"m":" we have dataproc operator getting called from a dag which submits a spark job, we wanted to maintain that continuity of parent job in the spark job so according to the documentation we can acheive that by using a macro called lineage_run_id that requires task and task_instance as the parameters. 
The problem we are facing is that our client’s have 1000's of dags, so asking them to change this everywhere it is used is not feasible, so we thought of using the task_policy feature in the airflow…but the problem is that task_policy gives you access to only the task/operator, but we don’t have the access to the task instance..that is required as a parameter to the lineage_run_id function. Can anyone kindly help us on how should we go about this one\n```t1 = DataProcPySparkOperator(\n task_id=job_name,\n #required pyspark configuration,\n job_name=job_name,\n dataproc_pyspark_properties={\n 'spark.driver.extraJavaOptions':\n f\"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}\"\n dag=dag)```","u":"U05QL7LN2GH","t":"1694849427.228709"},{"m":"\nFriendly reminder: the next OpenLineage meetup, our first in Toronto, is happening this coming Monday at 5 PM ET ","u":"U02LXF3HUN7","t":"1694793807.376729"},{"m":"Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.\nFeedback or alternate proposals welcome\n\nOnce this is sufficiently fleshed out, I’ll create an actual proposal on github","u":"U01DCLP0GU9","t":"1694737381.437569"},{"m":"Hey everyone,\nAny chance we could have a *openlineage-integration-common* 1.1.1 release with the following changes..?\n• \n• ","u":"U055N2GRT4P","t":"1694700221.242579"},{"m":"Context:\n\nWe use Spark with YARN, running on Hadoop 2.x (I can't remember the exact minor version) with Hive support.\n\nProblem:\n\nI'm noticed that `CreateDataSourceAsSelectCommand` objects are _always_ transformed to an `OutputDataset` with a _namespace_ value set to `file` - which is curious, because the inputs always have a (correct) namespace of `hdfs://<name-node>` - is this a known issue? A flaw with Apache Spark? 
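Returning to the `lineage_run_id` question above: because the macro sits inside a Jinja template, `task_instance` is resolved at render time, so a cluster policy only has to inject the template string and never needs the TaskInstance itself. A sketch for `airflow_local_settings.py`, assuming the properties field is template-rendered by the operator version in use (worth verifying); the agent jar path is a placeholder and the `api_key` query parameter from the original snippet is omitted:
```
import os

from airflow.models.baseoperator import BaseOperator

# Jinja resolves {{ task }} and {{ task_instance }} when the task renders,
# which is why the policy itself never needs a TaskInstance object.
OL_JAVAAGENT = (
    "-javaagent:/path/to/openlineage-javaagent.jar="  # placeholder jar path
    + os.environ.get("OPENLINEAGE_URL", "")
    + "/api/v1/namespaces/"
    + os.getenv("OPENLINEAGE_NAMESPACE", "default")
    + "/jobs/{{ task.task_id }}/runs/"
    + "{{ macros.OpenLineagePlugin.lineage_run_id(task, task_instance) }}"
)


def task_policy(task: BaseOperator) -> None:
    """Cluster policy applied to every task, so the 1000s of DAGs stay untouched."""
    if task.task_type == "DataProcPySparkOperator":
        props = getattr(task, "dataproc_pyspark_properties", None) or {}
        props["spark.driver.extraJavaOptions"] = OL_JAVAAGENT
        task.dataproc_pyspark_properties = props
```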
A bug in the resolution logic?\n\nFor reference:\n\n```public class CreateDataSourceTableCommandVisitor\n extends QueryPlanVisitor<CreateDataSourceTableCommand, OpenLineage.OutputDataset> {\n\n public CreateDataSourceTableCommandVisitor(OpenLineageContext context) {\n super(context);\n }\n\n @Override\n public List<OpenLineage.OutputDataset> apply(LogicalPlan x) {\n CreateDataSourceTableCommand command = (CreateDataSourceTableCommand) x;\n CatalogTable catalogTable = command.table();\n\n return Collections.singletonList(\n outputDataset()\n .getDataset(\n PathUtils.fromCatalogTable(catalogTable),\n catalogTable.schema(),\n OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE));\n }\n}```\nRunning this: `cat events.log | jq '{eventTime: .eventTime, eventType: .eventType, runId: .run.runId, jobNamespace: .job.namespace, jobName: .job.name, outputs: .outputs[] | {namespace: .namespace, name: .name}, inputs: .inputs[] | {namespace: .namespace, name: .name}}'`\n\nThis is an output:\n```{\n \"eventTime\": \"2023-09-13T16:01:27.059Z\",\n \"eventType\": \"START\",\n \"runId\": \"bbbb5763-3615-46c0-95ca-1fc398c91d5d\",\n \"jobNamespace\": \"spark.cluster-1\",\n \"jobName\": \"ol_hadoop_test.execute_create_data_source_table_as_select_command.dhawes_db_ol_test_hadoop_tgt\",\n \"outputs\": {\n \"namespace\": \"file\",\n \"name\": \"/user/hive/warehouse/dhawes.db/ol_test_hadoop_tgt\"\n },\n \"inputs\": {\n \"namespace\": \"\",\n \"name\": \"/user/hive/warehouse/dhawes.db/ol_test_hadoop_src\"\n }\n}```","u":"U05FLJE4GDU","t":"1694686815.337029"},{"m":"\nThis month’s TSC meeting, open to all, is tomorrow: ","u":"U02LXF3HUN7","t":"1694629232.934029"},{"m":"I am exploring Spark - OpenLineage integration (using the latest PySpark and OL versions). I tested a simple pipeline which:\n• Reads JSON data into PySpark DataFrame\n• Apply data transformations\n• Write transformed data to MySQL database\nObserved that we receive 4 events (2 `START` and 2 `COMPLETE`) for the same job name. The events are almost identical with a small diff in the facets. All the events share the same `runId`, and we don't get any `parentRunId`.\nTeam, can you please confirm if this behaviour is expected? Seems to be different from the Airflow integration where we relate jobs to Parent Jobs.","u":"U05A1D80QKF","t":"1694583867.900909"},{"m":" has anyone succeded in getting a custom extractor to work in GCP Cloud Composer or AWS MWAA, seems like there is no way","u":"U05QL7LN2GH","t":"1694553961.188719"},{"m":"I am trying to run Google Cloud Composer where i have added the openlineage-airflow pypi packagae as a dependency and have added the env OPENLINEAGE_EXTRACTORS to point to my custom extractor. I have added a folder by name dependencies and inside that i have placed my extractor file, and the path given to OPENLINEAGE_EXTRACTORS is dependencies.<file_name>.<extractor_class_name>…still it fails with the exception saying No module named ‘dependencies’. Can anyone kindly help me out on correcting my mistake","u":"U05QL7LN2GH","t":"1694545905.974339"},{"m":"This particular code in docker-compose exits with code 1 because it is unable to find wait-for-it.sh, file in the container. I have checked the mounting path from the local machine, It is correct and the path on the container for Marquez is also correct i.e. /usr/src/app but it is unable to mount the wait-for-it.sh. Does anyone know why is this? 
This code exists in the open lineage repository as well \n```# Marquez as an OpenLineage Client\n api:\n image: marquezproject/marquez\n container_name: marquez-api\n ports:\n - \"5000:5000\"\n - \"5001:5001\"\n volumes:\n - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh\n links:\n - \"db:postgres\"\n depends_on:\n - db\n entrypoint: [ \"./wait-for-it.sh\", \"db:5432\", \"--\", \"./entrypoint.sh\" ]```","u":"U05QNRSQW1E","t":"1694520846.519609"},{"m":"Opened a PR for this here: ","u":"U04AZ7992SU","t":"1694468262.274069"},{"m":"I’m seeing some odd behavior with my http transport when upgrading airflow/openlineage-airflow from 2.3.2 -> 2.6.3 and 0.24.0 -> 0.28.0. Previously I had a config like this that let me provide my own auth tokens. However, after upgrading I’m getting a 401 from the endpoint and further debugging seems to reveal that we’re not using the token provided in my TokenProvider. Does anyone know if something changed between these versions that could be causing this? (more details in :thread: )\n```transport:\n type: http\n url: \n auth:\n type: some.fully.qualified.classpath```","u":"U04AZ7992SU","t":"1694466446.599329"},{"m":"\nThe first Toronto OpenLineage Meetup, featuring a presentation by recent adopter , is just one week away. On the agenda:\n1. *Evolution of spec presentation/discussion (project background/history)*\n2. *State of the community*\n3. *Integrating OpenLineage with (by special guests & )*\n4. *Spark/Column lineage update*\n5. *Airflow Provider update*\n6. *Roadmap Discussion*\n*Find more details and RSVP *.","u":"U02LXF3HUN7","t":"1694441261.486759"},{"m":"\nThis month’s TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:\n• announcements\n• recent releases\n• demo: Spark integration tests in Databricks runtime\n• open discussion\n• more (TBA)\nMore info and the meeting link can be found on the . All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.","u":"U02LXF3HUN7","t":"1694113940.400549"},{"m":"Has there been any conversation on the extensibility of facets/concepts? E.g.:\n• how does one extends the list of run states to add a paused/resumed state?\n• how does one extend to add a created at field?","u":"U0595Q78HUG","t":"1694036652.124299"},{"m":"Hello Everyone,\n\nI've been diving into the Marquez codebase and found a performance bottleneck in `JobDao.java` for the query related to `namespaceName=MyNameSpace` `limit=10` and 12s with `limit=25`. I managed to optimize it using CTEs, and the execution times dropped dramatically to 300ms (for `limit=100`) and under 100ms (for `limit=25` ) on the same cluster.\nIssue link : \n\nI believe there's even more room for optimization, especially if we adjust the `job_facets_view` to include the `namespace_name` column.\n\nWould the team be open to a PR where I share the optimized query and discuss potential further refinements? I believe these changes could significantly enhance the Marquez web UI experience.\n\nPR link : \n\nLooking forward to your feedback.","u":"U05HBLE7YPL","t":"1694032987.624809"},{"m":"it looks like my dynamic task mapping in Airflow has the same run ID in marquez, so even if I am processing 100 files, there is only one version of the data. 
is there a way to have a separate version of each dynamic task so I can track the filename etc?","u":"U05NMJ0NBUK","t":"1693877705.781699"},{"m":"Also, another small clarification is that when using `MergeIntoCommand`, I'm receiving the lineage events on the backend, but I cannot seem to find any logging of the payload when I enable debug mode in openlineage. I remember there was a similar issue reported by another user in the past. May I check if it might be possible to help with this? It's making debugging quite hard for these cases. Thanks!","u":"U04EZ2LPDV4","t":"1693823945.734419"},{"m":"Hi guys, I'd like to capture the `spark.databricks.clusterUsageTags.clusterAllTags` property from databricks. However, the value of this is a list of keys, and therefore cannot be supported by custom environment facet builder.\nI was thinking that capturing this property might be useful for most databricks workloads, and whether it might make sense to auto-capture it along with other databricks variables, similar to how we capture mount points for the databricks jobs.\nDoes this sound okay? If so, then I can help to contribute this functionality","u":"U04EZ2LPDV4","t":"1693813108.356499"},{"m":"\nThe is out now! Please to get it directly in your inbox each month.","u":"U02LXF3HUN7","t":"1693602981.025489"},{"m":"It sounds like there have been a few announcements at Google Next:\n\n","u":"U01DCLP0GU9","t":"1693519820.292119"},{"m":"Will the August meeting be put up at soon? (usually it’s up in a few days :slightly_smiling_face:","u":"U0323HG8C8H","t":"1693510399.153829"},{"m":"ok,it`s my first use thie lineage tool. first,I added dependences in my pom.xml like this:\n<dependency>\n <groupId>io.openlineage</groupId>\n <artifactId>openlineage-java</artifactId>\n <version>0.12.0</version>\n </dependency>\n <dependency>\n <groupId>org.apache.logging.log4j</groupId>\n <artifactId>log4j-api</artifactId>\n <version>2.7</version>\n </dependency>\n <dependency>\n <groupId>org.apache.logging.log4j</groupId>\n <artifactId>log4j-core</artifactId>\n <version>2.7</version>\n </dependency>\n <dependency>\n <groupId>org.apache.logging.log4j</groupId>\n <artifactId>log4j-slf4j-impl</artifactId>\n <version>2.7</version>\n </dependency>\n <dependency>\n <groupId>io.openlineage</groupId>\n <artifactId>openlineage-spark</artifactId>\n <version>0.30.1</version>\n </dependency>\n\nmy spark version is 3.3.1 and the version can not change\n\nsecond, in file Openlineage/intergration/spark I enter command : docker-compose up and follow the steps in this doc:\n\nthere is no erro when i use notebook to execute pyspark for openlineage and I could get json message.\nbut after I enter \"docker-compose up\" ,I want to use my Idea tool to execute scala code like above,the erro happend like above. It seems that I does not configure the environment correctly. 
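On the `NoSuchMethodError: ...loadOpenLineageYaml` raised from the IDE run described in this thread: the pom pins `openlineage-java` 0.12.0 next to `openlineage-spark` 0.30.1, so the listener most likely loads the older client class first — a classpath version clash rather than an IDE environment problem. The usual fix is to drop the explicit old `openlineage-java` pin and let the Spark artifact bring its matching client in transitively. A PySpark equivalent of the aligned setup with the same console transport, version illustrative:
```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master("local[*]").appName("ol-console-test")
    # One coordinate only: the matching openlineage-java arrives transitively.
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.30.1")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    .config("spark.openlineage.transport.type", "console")
    .getOrCreate()
)

# The same toy job as the Scala snippet: events print to the driver console.
df = spark.createDataFrame([(1, "zs", 2020), (2, "ls", 2023)], ["id", "name", "year"])
df.select("id", "name").orderBy("id").show()
```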
so how can i fix the problem .","u":"U05NGJ8AM8X","t":"1693468312.450209"},{"m":"hello,everyone,i can run openLineage spark code in my notebook with python,but when use my idea to execute scala code like this:\nimport org.apache.spark.internal.Logging\nimport org.apache.spark.sql.SparkSession\nimport io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml\nimport org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}\nimport sun.java2d.marlin.MarlinUtils.logInfo\nobject Test {\n def main(args: Array[String]): Unit = {\n\n val spark = SparkSession\n .builder()\n .master(\"local\")\n .appName(\"test\")\n .config(\"spark.jars.packages\",\"io.openlineage:openlineage-spark:0.12.0\")\n .config(\"spark.extraListeners\",\"io.openlineage.spark.agent.OpenLineageSparkListener\")\n .config(\"spark.openlineage.transport.type\",\"console\")\n .getOrCreate()\n\n spark.sparkContext.setLogLevel(\"INFO\")\n\n //spark.sparkContext.addSparkListener(new MySparkAppListener)\n import spark.implicits._\n val input = Seq((1, \"zs\", 2020), (2, \"ls\", 2023)).toDF(\"id\", \"name\", \"year\")\n\n input.select(\"id\", \"name\").orderBy(\"id\").show()\n\n }\n\n}\n\nthere is something wrong:\nException in thread \"spark-listener-group-shared\" java.lang.NoSuchMethodError: io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(Ljava/io/InputStream;)Lio/openlineage/client/OpenLineageYaml;\n\tat io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114)\n\tat io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277)\n\tat io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267)\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)\n\tat org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)\n\tat org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)\n\tat org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)\n\tat org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)\n\tat scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)\n\tat scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)\n\tat $apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)\n\tat org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)\n\ni want to know how can i set idea scala environment correctly","u":"U05NGJ8AM8X","t":"1693463508.522729"},{"m":"Can anyone let 3 people stuck downstairs into the 7th floor?","u":"U05EC8WB74N","t":"1693445911.744069"},{"m":"\nFriendly reminder: there’s a meetup at Astronomer’s offices in SF!","u":"U02LXF3HUN7","t":"1693410605.894959"},{"m":"Hi, Will really appreciate if someone can guide me or provide me any pointer - if they have been able to implement authentication/authorization for 
access to Marquez. Have not seen much info around it. Any pointers greatly appreciated. Thanks in advance.","u":"U05JY6MN8MS","t":"1693397729.978309"},{"m":"for namespaces, if my data is moving between sources (SFTP -> GCS -> Azure Blob (synapse connects to parquet datasets) then should my namespace be based on the client I am working with? my current namespace has been to refer to the bucket, but that falls apart when considering the data sources and some destinations. perhaps I should just add a field for client-name instead to have a consolidated view?","u":"U05NMJ0NBUK","t":"1693329152.193929"},{"m":"hi folks, for now I'm producing `.jsonl` (or `.ndjson` ) files with one event per line, do you know if there's any way to validate those? would standard JSON Schema tools work?","u":"U05HFGKEYVB","t":"1693300839.710459"},{"m":"Hello, I'm currently in the process of following the instructions outlined in the provided getting started guide at . However, I've encountered a problem while attempting to complete **Step 1** of the guide. Unfortunately, I'm encountering an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. I have attached screen shots.","u":"U05QNRSQW1E","t":"1693293484.701439"},{"m":"New on the OpenLineage blog: , including:\n• the critical improvements it brings to the integration\n• the high-level design\n• implementation details\n• an example operator\n• planned enhancements\n• a list of supported operators\n• more.\nThe post, by <@U01RA9B5GG2>, <@U01DCLP0GU9> and myself is live now on the OpenLineage blog.","u":"U02LXF3HUN7","t":"1693267537.810959"},{"m":"\nThe agenda for the on 9/18 has been updated. This promises to be an exciting, richly productive discussion. Don’t miss it if you’ll be in the area!\n1. Intros\n2. Evolution of spec presentation/discussion (project background/history)\n3. State of the community\n4. Spark/Column lineage update\n5. Airflow Provider update \n6. Roadmap Discussion\n7. Action items review/next steps\n","u":"U02LXF3HUN7","t":"1693258111.112809"},{"m":"Hi folks. I have some pure golang jobs from which I need to emit OL events to Marquez. Is the right way to go about this to generate a Golang client from the Marquez OpenAPI spec and use that client from my go jobs?","u":"U05PVS8GRJ6","t":"1693243558.640159"},{"m":"and something else: I understand that Marquez does not yet support the 2.0 spec, hence it's incompatible with static metadata right? I tried to emit a list of `DatasetEvent` s and got `HTTPError: 422 Client Error: Unprocessable Entity for url: ` (I'm using a `FileTransport` for now)","u":"U05HFGKEYVB","t":"1693212561.369509"},{"m":"hi folks, I'm looking into exporting static metadata, and found that `DatasetEvent` requires a `eventTime`, which in my mind doesn't make sense for static events. I'm setting it to `None` and the Python client seems to work, but wanted to ask if I'm missing something.","u":"U05HFGKEYVB","t":"1693212473.810659"},{"m":"hi Openlineage team , we would like to join one of your meetups(me and <@U05HK41VCH1> nad @Phil Rolph and we're wondering if you are hosting any meetups after the 18/9 ? We are trying to join this but air - tickets are quite expensive","u":"U05J5GRKY10","t":"1692975450.380969"},{"m":"\nFriendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. 
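On validating the `.jsonl` event files asked about above: standard JSON Schema tooling does work, since each line is one event that can be checked against the published OpenLineage spec. A small sketch using the `jsonschema` package; the spec version in the URL is a placeholder (pin the version you actually emit), and it assumes the root spec document validates a single event:
```
import json

import requests
from jsonschema import Draft202012Validator

# Placeholder spec version -- use the one matching your producer.
SPEC_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json"
validator = Draft202012Validator(requests.get(SPEC_URL, timeout=10).json())

with open("events.jsonl") as fh:
    for lineno, line in enumerate(fh, start=1):
        for error in validator.iter_errors(json.loads(line)):
            print(f"line {lineno}: {error.message}")
```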
You can sign up and find the details on the .","u":"U02LXF3HUN7","t":"1692973763.570629"},{"m":"\nWe released OpenLineage 1.1.0, including:\nAdditions:\n• Flink: create Openlineage configuration based on Flink configuration `#2033` \n• Java: add Javadocs to the Java client `#2004` \n• Spark: append output dataset name to a job name `#2036` \n• Spark: support Spark 3.4.1 `#2057` \nFixes:\n• Flink: fix a bug when getting schema for `KafkaSink` `#2042` \n• Spark: fix ignored event `adaptive_spark_plan` in Databricks `#2061` \nPlus additional bug fixes, doc changes and more.\nThanks to all the contributors, especially new contributors @pentium3 and <@U05HBLE7YPL>!\n*Release:* \n*Changelog:* \n*Commit history:* \n*Maven:* \n*PyPI:* ","u":"U02LXF3HUN7","t":"1692817450.338859"},{"m":"Hey folks! Do we have clear step-by-step documentation on how we can leverage the `ServiceLoader` based approach for injecting specific OpenLineage customisations for tweaking the transport type with defaults / tweaking column level lineage etc?","u":"U05JBHLPY8K","t":"1692810528.463669"},{"m":"Approve a new release please :slightly_smiling_face:\n• Fix spark integration filtering Databricks events. ","u":"U05HBLE7YPL","t":"1692802510.386629"}],"C01NAFMBVEY":[],"C030F1J0264":[],"C04E3Q18RR9":[],"C04JPTTC876":[],"C04QSV0GG23":[],"C04THH1V90X":[],"C051C93UZK9":[],"C055GGUFMHQ":[],"C056YHEU680":[{"m":"<@U05TQPZ4R4L> has joined the channel","u":"U05TQPZ4R4L","t":"1695502057.851259"},{"m":"Some pictures from last night","u":"U01DCLP0GU9","t":"1693538290.446329"},{"m":"<@U05Q3HT6PBR> has joined the channel","u":"U05Q3HT6PBR","t":"1693520941.967439"},{"m":"Time: 5:30-8:30 pm\nAddress: 8 California St., San Francisco, CA, seventh floor\nGetting in: someone from Astronomer will be in the lobby to direct you","u":"U02LXF3HUN7","t":"1693422775.587509"},{"m":"Adding the venue info in case it’s more convenient than the meetup page:","u":"U02LXF3HUN7","t":"1693422678.406409"}],"C05N442RQUA":[{"m":"Hi, if you’re wondering if you’re in the right place: look for Uncle Tetsu’s Cheesecake nextdoor and for the address (600 Bay St) above the door. The building is an older one (unlike the meeting space itself, which is modern and well-appointed)","u":"U02LXF3HUN7","t":"1695068433.208409"},{"m":"Looking forward to seeing you on Monday! Here’s the time/place info again for your convenience:\n• Date: 9/18\n• Time: 5-8:00 PM ET\n• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)\n• Venue phone: \n• Meetup page with more info and signup: \nPlease send a message if you find yourself stuck in the lobby, etc.","u":"U02LXF3HUN7","t":"1694794649.751439"},{"m":"<@U05SXDWVA7K> has joined the channel","u":"U05SXDWVA7K","t":"1694787071.646859"},{"m":"\nIt’s hard to believe this is happening in just one week! Here’s the updated agenda:\n1. *Intros*\n2. *Evolution of spec presentation/discussion (project background/history)*\n3. *State of the community*\n4. *Integrating OpenLineage with (by special guests & )*\n5. *Spark/Column lineage update*\n6. *Airflow Provider update*\n7. *Roadmap Discussion*\n8. *Action items review/next steps*\nFind the details and RSVP .","u":"U02LXF3HUN7","t":"1694441637.116609"},{"m":"Most OpenLineage regular contributors will be there. It will be fun to be all in person. 
Everyone is encouraged to join","u":"U01DCLP0GU9","t":"1693624251.155569"},{"m":"really looking forward to meeting all of you in Toronto!!","u":"U01HNKK4XAM","t":"1692984822.264569"},{"m":"Some belated updates on this in case you’re not aware:\n• Date: 9/18\n• Time: 5-8:00 PM ET\n• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)\n• Venue phone: \n• Meetup for more info and to sign up: ","u":"U02LXF3HUN7","t":"1692984607.290939"}],"C05PD7VJ52S":[{"m":"<@U05QHG1NJ8J> has joined the channel","u":"U05QHG1NJ8J","t":"1693512257.995209"},{"m":"<@U01RA9B5GG2> has joined the channel","u":"U01RA9B5GG2","t":"1692984861.392349"},{"m":"Yes, hope so! Thank you for your interest in joining a meetup!","u":"U02LXF3HUN7","t":"1692983998.092189"},{"m":"hopefully meet you soon in London","u":"U05HK41VCH1","t":"1692982281.255129"},{"m":"Thanks Michael for starting this channel","u":"U05HK41VCH1","t":"1692982268.668049"},{"m":"yes absolutely will give you an answer by Monday","u":"U05J5GRKY10","t":"1692979169.558489"},{"m":"OK! Would you please let me know when you know, and we’ll go from there?","u":"U02LXF3HUN7","t":"1692979126.425069"},{"m":"and if that not the case i can provide personal space","u":"U05J5GRKY10","t":"1692979080.273399"},{"m":"I am pretty sure you can use our 6point6 offices or at least part of it","u":"U05J5GRKY10","t":"1692979067.390839"},{"m":"I will have to confirm but 99% yes","u":"U05J5GRKY10","t":"1692978958.692039"},{"m":"Great! Do you happen to have space we could use?","u":"U02LXF3HUN7","t":"1692978871.220909"},{"m":"thats perfect !","u":"U05J5GRKY10","t":"1692978852.583609"},{"m":"Hi George, nice to meet you. Thanks for asking about future meetups. Would November be too soon, or what’s a good timeframe for you all?","u":"U02LXF3HUN7","t":"1692978834.508549"},{"m":"thanks so much !","u":"U05J5GRKY10","t":"1692978816.416879"},{"m":"Hi Michael","u":"U05J5GRKY10","t":"1692978769.711289"},{"m":"<@U05HK41VCH1> has joined the channel","u":"U05HK41VCH1","t":"1692978760.556819"},{"m":"<@U01HNKK4XAM> has joined the channel","u":"U01HNKK4XAM","t":"1692978735.461499"},{"m":"<@U05J5GRKY10> has joined the channel","u":"U05J5GRKY10","t":"1692978735.367849"},{"m":"<@U02LXF3HUN7> has joined the channel","u":"U02LXF3HUN7","t":"1692978725.076179"}],"C05U3UC85LM":[{"m":"<@U0620HU51HA> has joined the channel","u":"U0620HU51HA","t":"1697734944.171989"},{"m":"<@U05U9K21LSG> would be great if we could get your eyes on this PR: ","u":"U01HNKK4XAM","t":"1697201275.793679"},{"m":"Just seeing this, we had a company holiday yesterday. Yes, fluent data sources are our new way of connecting to data and the older \"block-style\" is deprecated and will be removed when we cut 0.18.0. I'm not sure of the timing of that but likely in the next couple months.","u":"U05U9K21LSG","t":"1696979522.397239"},{"m":"<@U05U9929K3N> <@U05U9K21LSG> ^^","u":"U01HNKK4XAM","t":"1696860879.713919"},{"m":"Hello guys! 
I’ve been looking recently into changes in GX.\n\nis this the major change you’d like to introduce in OL<-> GX?","u":"U02S6F54MAB","t":"1696852012.119669"},{"m":"<@U05U9K21LSG> has joined the channel","u":"U05U9K21LSG","t":"1695916771.601439"},{"m":"<@U05U9929K3N> it was great meeting earlier, looking forward to collaborating on this!","u":"U01HNKK4XAM","t":"1695840258.594299"},{"m":"<@U01RA9B5GG2> has joined the channel","u":"U01RA9B5GG2","t":"1695836477.653589"},{"m":"<@U02S6F54MAB> has joined the channel","u":"U02S6F54MAB","t":"1695836303.953919"},{"m":"<@U01HNKK4XAM> has joined the channel","u":"U01HNKK4XAM","t":"1695836303.827639"},{"m":"<@U05U9929K3N> has joined the channel","u":"U05U9929K3N","t":"1695836303.727049"},{"m":"<@U02LXF3HUN7> has joined the channel","u":"U02LXF3HUN7","t":"1695836283.617309"}],"C065PQ4TL8K":[{"m":"Maybe move today's meeting earlier, since no one from west coast is joining? <@U01HNKK4XAM>","u":"U01RA9B5GG2","t":"1700562211.366219"},{"m":"I’m off on vacation. See you in a week","u":"U01DCLP0GU9","t":"1700272614.735719"},{"m":"just searching for OpenLineage in the Datahub code base. They have an “interesting” approach? ","u":"U01DCLP0GU9","t":"1700246539.228259"},{"m":"CFP for Berlin Buzzwords went up: \nStill over 3 months to submit :slightly_smiling_face:","u":"U01RA9B5GG2","t":"1700155042.082759"},{"m":"worlds are colliding: 6point6 has been acquired by Accenture","u":"U02LXF3HUN7","t":"1700145084.414099"},{"m":"Any opinions about a free task management alternative to the free version of Notion (10-person limit)? Looking at Trello for keeping track of talks.","u":"U02LXF3HUN7","t":"1700088623.669029"},{"m":"have we discussed adding column level lineage support to Airflow? ","u":"U01DCMDFHBK","t":"1700087546.032789"},{"m":"Apparently an admin can view a Slack archive at any time at this URL: . Only public channels are available, though.","u":"U02LXF3HUN7","t":"1700078359.877599"},{"m":"Anyone have thoughts about how to address the question about “pain points” here? . (Listing pros is easy — it’s the cons we don’t have boilerplate for)","u":"U02LXF3HUN7","t":"1700078230.775579"},{"m":"is it time to *support hudi*?","u":"U02S6F54MAB","t":"1700068651.517579"},{"m":"Got the doc + poc for hook-level coverage: ","u":"U01RA9B5GG2","t":"1700066684.350639"},{"m":"hey look, more fun\n","u":"U02S6F54MAB","t":"1700040937.040239"},{"m":"also, what about this PR? ","u":"U01DCMDFHBK","t":"1700037370.235629"},{"m":"`_Minor_: We can consider defining a _run_state column and eventually dropping the event_type. That is, we can consider columns prefixed with _ to be \"remappings\" of OL properties to Marquez.` -> didn't get this one. 
Is it for now or some future plans?","u":"U02MK6YNAQ5","t":"1700037282.474539"},{"m":"<@U02MK6YNAQ5> approved PR with minor comments, I think the is one comment we’ll need to address before merging; otherwise solid work dude :ok_hand:","u":"U01DCMDFHBK","t":"1700037147.106879"},{"m":"\nfun PR incoming","u":"U02S6F54MAB","t":"1700004648.584649"},{"m":":wave:","u":"U01RA9B5GG2","t":"1699987988.642529"},{"m":":ocean:","u":"U053LLVTHRN","t":"1699982987.125549"},{"m":":wave:","u":"U01DCMDFHBK","t":"1699982333.485469"},{"m":":wave: ","u":"U01DCLP0GU9","t":"1699982322.990799"},{"m":":wave:","u":"U02LXF3HUN7","t":"1699982179.651129"},{"m":"<@U053LLVTHRN> has joined the channel","u":"U053LLVTHRN","t":"1699982042.414079"},{"m":":wave:","u":"U02S6F54MAB","t":"1699982037.267109"},{"m":"<@U05KKM07PJP> has joined the channel","u":"U05KKM07PJP","t":"1699982026.646699"},{"m":"<@U01DCMDFHBK> has joined the channel","u":"U01DCMDFHBK","t":"1699982026.554329"},{"m":"<@U02LXF3HUN7> has joined the channel","u":"U02LXF3HUN7","t":"1699982026.462039"},{"m":"<@U02S6F54MAB> has joined the channel","u":"U02S6F54MAB","t":"1699982026.350149"},{"m":"<@U02MK6YNAQ5> has joined the channel","u":"U02MK6YNAQ5","t":"1699982026.266479"},{"m":"<@U01DCLP0GU9> has joined the channel","u":"U01DCLP0GU9","t":"1699982026.191829"},{"m":"<@U01RA9B5GG2> has joined the channel","u":"U01RA9B5GG2","t":"1699981990.850589"},{"m":"<@U01HNKK4XAM> has joined the channel","u":"U01HNKK4XAM","t":"1699981986.459199"}]},"pages":{"C01CK9T7HKR":["1692802510.386629","1692802510.386629"],"C056YHEU680":["1693422678.406409","1693422678.406409"],"C05N442RQUA":["1692984607.290939","1692984607.290939"],"C05PD7VJ52S":["1692978725.076179","1692978725.076179"],"C05U3UC85LM":["1695836283.617309","1695836283.617309"],"C065PQ4TL8K":["1699981986.459199","1699981986.459199"]}}; \ No newline at end of file diff --git a/slack-archive/data/slack-archive.json b/slack-archive/data/slack-archive.json deleted file mode 100644 index a0b334e..0000000 --- a/slack-archive/data/slack-archive.json +++ /dev/null @@ -1,76 +0,0 @@ -{ - "channels": { - "C01CK9T7HKR": { - "messages": 118 - }, - "C01NAFMBVEY": { - "fullyDownloaded": true, - "messages": 0 - }, - "C030F1J0264": { - "messages": 0 - }, - "C04E3Q18RR9": { - "messages": 0 - }, - "C04JPTTC876": { - "messages": 0 - }, - "C04QSV0GG23": { - "fullyDownloaded": true, - "messages": 0 - }, - "C04THH1V90X": { - "messages": 0 - }, - "C051C93UZK9": { - "messages": 0 - }, - "C055GGUFMHQ": { - "messages": 0 - }, - "C056YHEU680": { - "messages": 5 - }, - "C05N442RQUA": { - "messages": 7 - }, - "C05PD7VJ52S": { - "messages": 19 - }, - "C05U3UC85LM": { - "messages": 12 - }, - "C065PQ4TL8K": { - "messages": 31 - } - }, - "auth": { - "ok": true, - "url": "https://openlineage.slack.com/", - "team": "OpenLineage", - "user": "michael282", - "team_id": "T01CWUYP5AR", - "user_id": "U02LXF3HUN7", - "is_enterprise_install": false, - "response_metadata": { - "scopes": [ - "read", - "client", - "identify", - "post", - "channels:history", - "groups:history", - "im:history", - "mpim:history", - "channels:read", - "files:read", - "groups:read", - "im:read", - "mpim:read", - "users:read", - "remote_files:read" - ] - } - } -} \ No newline at end of file diff --git a/slack-archive/data/users.json b/slack-archive/data/users.json deleted file mode 100644 index 1cc5e37..0000000 --- a/slack-archive/data/users.json +++ /dev/null @@ -1,2918 +0,0 @@ -{ - "U066S97A90C": { - "id": "U066S97A90C", - "team_id": "T01CWUYP5AR", - "name": "rwojcik", - "deleted": false, - 
"color": "2b6836", - "real_name": "Rafał Wójcik", - "tz": "Europe/Warsaw", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Rafał Wójcik", - "real_name_normalized": "Rafal Wojcik", - "display_name": "Rafał Wójcik", - "display_name_normalized": "Rafal Wojcik", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "09ba2780a8ae", - "image_original": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_original.png", - "is_custom_image": true, - "email": "rwojcik@griddynamics.com", - "first_name": "Rafał", - "last_name": "Wójcik", - "image_24": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_24.png", - "image_32": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_32.png", - "image_48": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_48.png", - "image_72": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_72.png", - "image_192": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_192.png", - "image_512": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-11-21/6227762962771_09ba2780a8aeaf7658c6_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700567815, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U066CNW85D3": { - "id": "U066CNW85D3", - "team_id": "T01CWUYP5AR", - "name": "karthik.nandagiri", - "deleted": false, - "color": "684b6c", - "real_name": "karthik nandagiri", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "karthik nandagiri", - "real_name_normalized": "karthik nandagiri", - "display_name": "karthik nandagiri", - "display_name_normalized": "karthik nandagiri", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "e8d0a2a05e71", - "image_original": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_original.jpg", - "is_custom_image": true, - "email": "karthik.nandagiri@gmail.com", - "first_name": "karthik", - "last_name": "nandagiri", - "image_24": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-11-19/6205176178887_e8d0a2a05e71471d1c78_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, 
- "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700456127, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U066HKFCHUG": { - "id": "U066HKFCHUG", - "team_id": "T01CWUYP5AR", - "name": "naresh.naresh36", - "deleted": false, - "color": "73769d", - "real_name": "Naresh reddy", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Naresh reddy", - "real_name_normalized": "Naresh reddy", - "display_name": "Naresh reddy", - "display_name_normalized": "Naresh reddy", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "91e664fce7c3", - "image_original": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_original.jpg", - "is_custom_image": true, - "email": "naresh.naresh36@gmail.com", - "first_name": "Naresh", - "last_name": "reddy", - "image_24": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-11-15/6192035069510_91e664fce7c3faeee1e7_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700049396, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05T8BJD4DU": { - "id": "U05T8BJD4DU", - "team_id": "T01CWUYP5AR", - "name": "jasonyip", - "deleted": false, - "color": "e96699", - "real_name": "Jason Yip", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Jason Yip", - "real_name_normalized": "Jason Yip", - "display_name": "", - "display_name_normalized": "", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "gec3e494f453", - "email": "jasonyip@gmail.com", - "first_name": "Jason", - "last_name": "Yip", - "image_24": "https://secure.gravatar.com/avatar/ec3e494f453bc030f8732cda718b8ac5.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-24.png", - "image_32": "https://secure.gravatar.com/avatar/ec3e494f453bc030f8732cda718b8ac5.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-32.png", - "image_48": "https://secure.gravatar.com/avatar/ec3e494f453bc030f8732cda718b8ac5.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-48.png", - "image_72": "https://secure.gravatar.com/avatar/ec3e494f453bc030f8732cda718b8ac5.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-72.png", - "image_192": 
"https://secure.gravatar.com/avatar/ec3e494f453bc030f8732cda718b8ac5.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-192.png", - "image_512": "https://secure.gravatar.com/avatar/ec3e494f453bc030f8732cda718b8ac5.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695335591, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U02LXF3HUN7": { - "id": "U02LXF3HUN7", - "team_id": "T01CWUYP5AR", - "name": "michael282", - "deleted": false, - "color": "50a0cf", - "real_name": "Michael Robinson", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Michael Robinson", - "real_name_normalized": "Michael Robinson", - "display_name": "", - "display_name_normalized": "", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "66fea720e950", - "image_original": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_original.jpg", - "is_custom_image": true, - "email": "michael.robinson@astronomer.io", - "huddle_state": "default_unset", - "huddle_state_expiration_ts": 0, - "first_name": "Michael", - "last_name": "Robinson", - "image_24": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_24.jpg", - "image_32": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_32.jpg", - "image_48": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_48.jpg", - "image_72": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_72.jpg", - "image_192": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_192.jpg", - "image_512": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2022-01-25/3019716733729_66fea720e9504dc08144_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": true, - "is_owner": true, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700183405, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05TU0U224A": { - "id": "U05TU0U224A", - "team_id": "T01CWUYP5AR", - "name": "rodrigo.maia", - "deleted": false, - "color": "684b6c", - "real_name": "Rodrigo Maia", - "tz": "Europe/Belgrade", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Rodrigo Maia", - "real_name_normalized": "Rodrigo Maia", - "display_name": "Rodrigo Maia", - "display_name_normalized": "Rodrigo Maia", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "1373b2f5d8ba", - "image_original": "https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_original.png", - "is_custom_image": true, - "email": "rodrigo.maia@manta.io", - "first_name": "Rodrigo", - "last_name": "Maia", - "image_24": 
"https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_24.png", - "image_32": "https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_32.png", - "image_48": "https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_48.png", - "image_72": "https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_72.png", - "image_192": "https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_192.png", - "image_512": "https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-09-25/5943104201445_1373b2f5d8babe2b8c2a_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1699862761, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05NMJ0NBUK": { - "id": "U05NMJ0NBUK", - "team_id": "T01CWUYP5AR", - "name": "lance.dacey2", - "deleted": false, - "color": "bb86b7", - "real_name": "ldacey", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "ldacey", - "real_name_normalized": "ldacey", - "display_name": "ldacey", - "display_name_normalized": "ldacey", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "gc77c4db9ebb", - "email": "lance.dacey2@sutherlandglobal.com", - "first_name": "ldacey", - "last_name": "", - "image_24": "https://secure.gravatar.com/avatar/c77c4db9ebb850f6b5974815f40575b8.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0002-24.png", - "image_32": "https://secure.gravatar.com/avatar/c77c4db9ebb850f6b5974815f40575b8.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0002-32.png", - "image_48": "https://secure.gravatar.com/avatar/c77c4db9ebb850f6b5974815f40575b8.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0002-48.png", - "image_72": "https://secure.gravatar.com/avatar/c77c4db9ebb850f6b5974815f40575b8.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0002-72.png", - "image_192": "https://secure.gravatar.com/avatar/c77c4db9ebb850f6b5974815f40575b8.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0002-192.png", - "image_512": "https://secure.gravatar.com/avatar/c77c4db9ebb850f6b5974815f40575b8.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0002-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1696099450, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05J9LZ355L": { - "id": "U05J9LZ355L", - "team_id": "T01CWUYP5AR", - "name": "yannick.libert.partne", - "deleted": false, - "color": "e23f99", - "real_name": "Yannick Libert", - "tz": "Europe/Brussels", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "Tech Lead @ Decathlon", - "phone": "", - "skype": "", - "real_name": "Yannick Libert", - "real_name_normalized": "Yannick Libert", - 
"display_name": "Yannick Libert", - "display_name_normalized": "Yannick Libert", - "fields": { - "Xf03UL3S8CP8": { - "value": "Tech Lead @ Decathlon", - "alt": "" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "2d7ad30e24c5", - "image_original": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_original.png", - "is_custom_image": true, - "email": "yannick.libert.partner@decathlon.com", - "first_name": "Yannick", - "last_name": "Libert", - "image_24": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_24.png", - "image_32": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_32.png", - "image_48": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_48.png", - "image_72": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_72.png", - "image_192": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_192.png", - "image_512": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-07-24/5613838546967_2d7ad30e24c5e74bfdbb_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1693830902, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05JBHLPY8K": { - "id": "U05JBHLPY8K", - "team_id": "T01CWUYP5AR", - "name": "athityakumar", - "deleted": false, - "color": "84b22f", - "real_name": "Athitya Kumar", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Athitya Kumar", - "real_name_normalized": "Athitya Kumar", - "display_name": "Athitya Kumar", - "display_name_normalized": "Athitya Kumar", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "055f51bd1a05", - "image_original": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_original.jpg", - "is_custom_image": true, - "email": "athityakumar@gmail.com", - "first_name": "Athitya", - "last_name": "Kumar", - "image_24": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-07-23/5623598828611_055f51bd1a0583028e25_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1690133734, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - 
"U0635GK8Y14": { - "id": "U0635GK8Y14", - "team_id": "T01CWUYP5AR", - "name": "david.goss", - "deleted": false, - "color": "e475df", - "real_name": "David Goss", - "tz": "Europe/London", - "tz_label": "Greenwich Mean Time", - "tz_offset": 0, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "David Goss", - "real_name_normalized": "David Goss", - "display_name": "David Goss", - "display_name_normalized": "David Goss", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "18447d963fbf", - "image_original": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_original.jpg", - "is_custom_image": true, - "email": "david.goss@matillion.com", - "first_name": "David", - "last_name": "Goss", - "image_24": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-10-30/6126846084449_18447d963fbf13826099_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1698656974, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U062Q95A1FG": { - "id": "U062Q95A1FG", - "team_id": "T01CWUYP5AR", - "name": "n.priya88", - "deleted": false, - "color": "827327", - "real_name": "priya narayana", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "priya narayana", - "real_name_normalized": "priya narayana", - "display_name": "priya narayana", - "display_name_normalized": "priya narayana", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "2017b9ef7939", - "image_original": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_original.png", - "is_custom_image": true, - "email": "n.priya88@gmail.com", - "first_name": "priya", - "last_name": "narayana", - "image_24": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_24.png", - "image_32": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_32.png", - "image_48": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_48.png", - "image_72": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_72.png", - "image_192": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_192.png", - "image_512": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-10-26/6084416738247_2017b9ef79397fadc4f2_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": 
false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1698314037, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U063YP6UJJ0": { - "id": "U063YP6UJJ0", - "team_id": "T01CWUYP5AR", - "name": "fangmik", - "deleted": false, - "color": "dd8527", - "real_name": "Mike Fang", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Mike Fang", - "real_name_normalized": "Mike Fang", - "display_name": "Mike Fang", - "display_name_normalized": "Mike Fang", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "ge39c36e2357", - "email": "fangmik@amazon.com", - "first_name": "Mike", - "last_name": "Fang", - "image_24": "https://secure.gravatar.com/avatar/e39c36e2357d636e6fc6fe040be50b8a.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-24.png", - "image_32": "https://secure.gravatar.com/avatar/e39c36e2357d636e6fc6fe040be50b8a.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-32.png", - "image_48": "https://secure.gravatar.com/avatar/e39c36e2357d636e6fc6fe040be50b8a.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-48.png", - "image_72": "https://secure.gravatar.com/avatar/e39c36e2357d636e6fc6fe040be50b8a.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-72.png", - "image_192": "https://secure.gravatar.com/avatar/e39c36e2357d636e6fc6fe040be50b8a.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-192.png", - "image_512": "https://secure.gravatar.com/avatar/e39c36e2357d636e6fc6fe040be50b8a.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1698884598, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U04AZ7992SU": { - "id": "U04AZ7992SU", - "team_id": "T01CWUYP5AR", - "name": "john490", - "deleted": false, - "color": "d1707d", - "real_name": "John Lukenoff", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "John Lukenoff", - "real_name_normalized": "John Lukenoff", - "display_name": "John Lukenoff", - "display_name_normalized": "John Lukenoff", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "bbc3847f5026", - "image_original": "https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_original.jpg", - "is_custom_image": true, - "email": "john@jlukenoff.com", - "first_name": "John", - "last_name": "Lukenoff", - "image_24": "https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_24.jpg", - "image_32": "https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_32.jpg", - "image_48": "https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_48.jpg", - "image_72": 
"https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_72.jpg", - "image_192": "https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_192.jpg", - "image_512": "https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2022-11-09/4373247371600_bbc3847f5026244d5ddd_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1683517767, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U062WLFMRTP": { - "id": "U062WLFMRTP", - "team_id": "T01CWUYP5AR", - "name": "hloomba", - "deleted": false, - "color": "84b22f", - "real_name": "harsh loomba", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "harsh loomba", - "real_name_normalized": "harsh loomba", - "display_name": "harsh loomba", - "display_name_normalized": "harsh loomba", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "gd4df32d7eed", - "email": "hloomba@upgrade.com", - "first_name": "harsh", - "last_name": "loomba", - "image_24": "https://secure.gravatar.com/avatar/d4df32d7eed4ad9e92731404104c2b0a.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-24.png", - "image_32": "https://secure.gravatar.com/avatar/d4df32d7eed4ad9e92731404104c2b0a.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-32.png", - "image_48": "https://secure.gravatar.com/avatar/d4df32d7eed4ad9e92731404104c2b0a.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-48.png", - "image_72": "https://secure.gravatar.com/avatar/d4df32d7eed4ad9e92731404104c2b0a.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-72.png", - "image_192": "https://secure.gravatar.com/avatar/d4df32d7eed4ad9e92731404104c2b0a.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-192.png", - "image_512": "https://secure.gravatar.com/avatar/d4df32d7eed4ad9e92731404104c2b0a.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1698340239, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05CAULTYG2": { - "id": "U05CAULTYG2", - "team_id": "T01CWUYP5AR", - "name": "kkandaswamy", - "deleted": false, - "color": "de5f24", - "real_name": "Kavitha", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Kavitha", - "real_name_normalized": "Kavitha", - "display_name": "Kavitha", - "display_name_normalized": "Kavitha", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g359455742a1", - "email": "kkandaswamy@cardinalcommerce.com", - "first_name": "Kavitha", - "last_name": "", - "image_24": 
"https://secure.gravatar.com/avatar/359455742a1a5fda53ba2d6d5c0b9c69.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0011-24.png", - "image_32": "https://secure.gravatar.com/avatar/359455742a1a5fda53ba2d6d5c0b9c69.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0011-32.png", - "image_48": "https://secure.gravatar.com/avatar/359455742a1a5fda53ba2d6d5c0b9c69.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0011-48.png", - "image_72": "https://secure.gravatar.com/avatar/359455742a1a5fda53ba2d6d5c0b9c69.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0011-72.png", - "image_192": "https://secure.gravatar.com/avatar/359455742a1a5fda53ba2d6d5c0b9c69.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0011-192.png", - "image_512": "https://secure.gravatar.com/avatar/359455742a1a5fda53ba2d6d5c0b9c69.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0011-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1686672269, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U06315TMT61": { - "id": "U06315TMT61", - "team_id": "T01CWUYP5AR", - "name": "splicer9904", - "deleted": false, - "color": "4ec0d6", - "real_name": "Hitesh", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Hitesh", - "real_name_normalized": "Hitesh", - "display_name": "Hitesh", - "display_name_normalized": "Hitesh", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "ge6f8bfec602", - "email": "splicer9904@gmail.com", - "first_name": "Hitesh", - "last_name": "", - "image_24": "https://secure.gravatar.com/avatar/e6f8bfec6021c5b6b108906268cd65ea.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0001-24.png", - "image_32": "https://secure.gravatar.com/avatar/e6f8bfec6021c5b6b108906268cd65ea.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0001-32.png", - "image_48": "https://secure.gravatar.com/avatar/e6f8bfec6021c5b6b108906268cd65ea.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0001-48.png", - "image_72": "https://secure.gravatar.com/avatar/e6f8bfec6021c5b6b108906268cd65ea.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0001-72.png", - "image_192": "https://secure.gravatar.com/avatar/e6f8bfec6021c5b6b108906268cd65ea.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0001-192.png", - "image_512": "https://secure.gravatar.com/avatar/e6f8bfec6021c5b6b108906268cd65ea.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0001-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1699161704, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U0625RZ7KR9": { - "id": "U0625RZ7KR9", - "team_id": "T01CWUYP5AR", - "name": "kpraveen420", - "deleted": false, - "color": "43761b", - "real_name": 
"praveen kanamarlapudi", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "praveen kanamarlapudi", - "real_name_normalized": "praveen kanamarlapudi", - "display_name": "praveen kanamarlapudi", - "display_name_normalized": "praveen kanamarlapudi", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "ac6b9c556b2a", - "image_original": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_original.jpg", - "is_custom_image": true, - "email": "kpraveen420@gmail.com", - "first_name": "praveen", - "last_name": "kanamarlapudi", - "image_24": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-10-20/6067337928646_ac6b9c556b2a0741b284_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1697840200, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05KCF3EEUR": { - "id": "U05KCF3EEUR", - "team_id": "T01CWUYP5AR", - "name": "savansharan_navalgi", - "deleted": false, - "color": "a72f79", - "real_name": "savan", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "savan", - "real_name_normalized": "savan", - "display_name": "savan", - "display_name_normalized": "savan", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g9ec77e3c487", - "email": "SavanSharan_Navalgi@intuit.com", - "first_name": "savan", - "last_name": "", - "image_24": "https://secure.gravatar.com/avatar/9ec77e3c48747d0e4ba028cd952008b7.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0009-24.png", - "image_32": "https://secure.gravatar.com/avatar/9ec77e3c48747d0e4ba028cd952008b7.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0009-32.png", - "image_48": "https://secure.gravatar.com/avatar/9ec77e3c48747d0e4ba028cd952008b7.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0009-48.png", - "image_72": "https://secure.gravatar.com/avatar/9ec77e3c48747d0e4ba028cd952008b7.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0009-72.png", - "image_192": "https://secure.gravatar.com/avatar/9ec77e3c48747d0e4ba028cd952008b7.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0009-192.png", - "image_512": "https://secure.gravatar.com/avatar/9ec77e3c48747d0e4ba028cd952008b7.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0009-512.png", - "status_text_canonical": "", - 
"team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1690434999, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U03D8K119LJ": { - "id": "U03D8K119LJ", - "team_id": "T01CWUYP5AR", - "name": "matthewparas2020", - "deleted": false, - "color": "5b89d5", - "real_name": "Matthew Paras", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Matthew Paras", - "real_name_normalized": "Matthew Paras", - "display_name": "Matthew Paras", - "display_name_normalized": "Matthew Paras", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "e96b6372b16b", - "image_original": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_original.png", - "is_custom_image": true, - "email": "matthewparas2020@u.northwestern.edu", - "first_name": "Matthew", - "last_name": "Paras", - "image_24": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_24.png", - "image_32": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_32.png", - "image_48": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_48.png", - "image_72": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_72.png", - "image_192": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_192.png", - "image_512": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_512.png", - "image_1024": "https://avatars.slack-edge.com/2022-04-27/3447754077317_e96b6372b16bba3317b8_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1651098588, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U0616K9TSTZ": { - "id": "U0616K9TSTZ", - "team_id": "T01CWUYP5AR", - "name": "ankit.goods10", - "deleted": false, - "color": "385a86", - "real_name": "ankit jain", - "tz": "Asia/Muscat", - "tz_label": "Gulf Standard Time", - "tz_offset": 14400, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "ankit jain", - "real_name_normalized": "ankit jain", - "display_name": "ankit jain", - "display_name_normalized": "ankit jain", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "76e78615ea7a", - "image_original": "https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_original.jpg", - "is_custom_image": true, - "email": "ankit.goods10@gmail.com", - "first_name": "ankit", - "last_name": "jain", - "image_24": "https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_72.jpg", - "image_192": 
"https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-10-17/6057639084036_76e78615ea7aa1e80e98_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1697597129, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U04EZ2LPDV4": { - "id": "U04EZ2LPDV4", - "team_id": "T01CWUYP5AR", - "name": "anirudh.shrinivason", - "deleted": false, - "color": "9f69e7", - "real_name": "Anirudh Shrinivason", - "tz": "Asia/Kuala_Lumpur", - "tz_label": "Singapore Standard Time", - "tz_offset": 28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Anirudh Shrinivason", - "real_name_normalized": "Anirudh Shrinivason", - "display_name": "Anirudh Shrinivason", - "display_name_normalized": "Anirudh Shrinivason", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 1683043199, - "avatar_hash": "0d8f6e5a3065", - "image_original": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_original.jpg", - "is_custom_image": true, - "email": "anirudh.shrinivason@grabtaxi.com", - "huddle_state": "default_unset", - "first_name": "Anirudh", - "last_name": "Shrinivason", - "image_24": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-05-22/5302055381746_0d8f6e5a3065f9038939_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1684763470, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05HK41VCH1": { - "id": "U05HK41VCH1", - "team_id": "T01CWUYP5AR", - "name": "madhav.kakumani", - "deleted": false, - "color": "e06b56", - "real_name": "Madhav Kakumani", - "tz": "Europe/London", - "tz_label": "Greenwich Mean Time", - "tz_offset": 0, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Madhav Kakumani", - "real_name_normalized": "Madhav Kakumani", - "display_name": "Madhav Kakumani", - "display_name_normalized": "Madhav Kakumani", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "558c36409436", - "image_original": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_original.png", - "is_custom_image": true, - "email": "madhav.kakumani@6point6.co.uk", - "first_name": "Madhav", - "last_name": "Kakumani", - 
"image_24": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_24.png", - "image_32": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_32.png", - "image_48": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_48.png", - "image_72": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_72.png", - "image_192": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_192.png", - "image_512": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-07-18/5597252400930_558c3640943668a0198e_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1689687879, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05QL7LN2GH": { - "id": "U05QL7LN2GH", - "team_id": "T01CWUYP5AR", - "name": "jeevan", - "deleted": false, - "color": "8d4b84", - "real_name": "Guntaka Jeevan Paul", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Guntaka Jeevan Paul", - "real_name_normalized": "Guntaka Jeevan Paul", - "display_name": "Guntaka Jeevan Paul", - "display_name_normalized": "Guntaka Jeevan Paul", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "5ef61b937b8b", - "image_original": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_original.png", - "is_custom_image": true, - "email": "jeevan@acceldata.io", - "first_name": "Guntaka", - "last_name": "Jeevan Paul", - "image_24": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_24.png", - "image_32": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_32.png", - "image_48": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_48.png", - "image_72": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_72.png", - "image_192": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_192.png", - "image_512": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-08-30/5820708969061_5ef61b937b8b8e9b2d6e_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1693392815, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U021QJMRP47": { - "id": "U021QJMRP47", - "team_id": "T01CWUYP5AR", - "name": "drew215", - "deleted": false, - "color": "e0a729", - "real_name": "Drew Bittenbender", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Drew Bittenbender", - "real_name_normalized": "Drew Bittenbender", - "display_name": "Drew Bittenbender", - "display_name_normalized": "Drew Bittenbender", - "fields": {}, - "status_text": "", - 
"status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g557c9e0a7ba", - "email": "drew@salt.io", - "image_24": "https://secure.gravatar.com/avatar/557c9e0a7ba1b608e34ffe64ce219a49.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0017-24.png", - "image_32": "https://secure.gravatar.com/avatar/557c9e0a7ba1b608e34ffe64ce219a49.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0017-32.png", - "image_48": "https://secure.gravatar.com/avatar/557c9e0a7ba1b608e34ffe64ce219a49.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0017-48.png", - "image_72": "https://secure.gravatar.com/avatar/557c9e0a7ba1b608e34ffe64ce219a49.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0017-72.png", - "image_192": "https://secure.gravatar.com/avatar/557c9e0a7ba1b608e34ffe64ce219a49.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0017-192.png", - "image_512": "https://secure.gravatar.com/avatar/557c9e0a7ba1b608e34ffe64ce219a49.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0017-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1660506044, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U01HVNU6A4C": { - "id": "U01HVNU6A4C", - "team_id": "T01CWUYP5AR", - "name": "mars", - "deleted": false, - "color": "9e3997", - "real_name": "Mars Lan", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Mars Lan", - "real_name_normalized": "Mars Lan", - "display_name": "Mars Lan", - "display_name_normalized": "Mars Lan", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "e70f0b60ad8c", - "image_original": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_original.jpg", - "is_custom_image": true, - "email": "mars@metaphor.io", - "image_24": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_24.jpg", - "image_32": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_32.jpg", - "image_48": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_48.jpg", - "image_72": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_72.jpg", - "image_192": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_192.jpg", - "image_512": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2020-12-28/1598525158821_e70f0b60ad8c72e52398_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1609167164, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U01DCLP0GU9": { - "id": "U01DCLP0GU9", - "team_id": "T01CWUYP5AR", - "name": "julien", - "deleted": false, - "color": "9f69e7", - "real_name": "Julien Le Dem", - "tz": "America/Los_Angeles", - 
"tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "Astronomer Chief Architect and Datakin co-founder, OpenLineage project lead", - "phone": "", - "skype": "", - "real_name": "Julien Le Dem", - "real_name_normalized": "Julien Le Dem", - "display_name": "", - "display_name_normalized": "", - "fields": { - "Xf03UL3S8CP8": { - "value": "Astronomer Chief Architect and Datakin co-founder, OpenLineage project lead", - "alt": "" - } - }, - "status_text": "Vacationing", - "status_emoji": ":palm_tree:", - "status_emoji_display_info": [ - { - "emoji_name": "palm_tree", - "display_url": "https://a.slack-edge.com/production-standard-emoji-assets/14.0/apple-large/1f334.png", - "unicode": "1f334" - } - ], - "status_expiration": 0, - "avatar_hash": "60dbc1781564", - "image_original": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_original.jpg", - "is_custom_image": true, - "email": "julien@apache.org", - "first_name": "Julien", - "last_name": "Le Dem", - "image_24": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_24.jpg", - "image_32": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_32.jpg", - "image_48": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_48.jpg", - "image_72": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_72.jpg", - "image_192": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_192.jpg", - "image_512": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2020-11-03/1467961410374_60dbc1781564d114f9e1_1024.jpg", - "status_text_canonical": "Vacationing", - "team": "T01CWUYP5AR" - }, - "is_admin": true, - "is_owner": true, - "is_primary_owner": true, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700183282, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05FLJE4GDU": { - "id": "U05FLJE4GDU", - "team_id": "T01CWUYP5AR", - "name": "damien.hawes", - "deleted": false, - "color": "e96699", - "real_name": "Damien Hawes", - "tz": "Europe/Amsterdam", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Damien Hawes", - "real_name_normalized": "Damien Hawes", - "display_name": "Damien Hawes", - "display_name_normalized": "Damien Hawes", - "fields": { - "Xf01NDQXPW6A": { - "value": "https://github.com/d-m-h", - "alt": "GitHub Profile" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "6f58b97494c9", - "image_original": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_original.png", - "is_custom_image": true, - "email": "damien.hawes@booking.com", - "first_name": "Damien", - "last_name": "Hawes", - "image_24": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_24.png", - "image_32": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_32.png", - "image_48": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_48.png", - "image_72": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_72.png", - "image_192": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_192.png", - 
"image_512": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-07-06/5530626222838_6f58b97494c98ee81c7d_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1688644886, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05TZE47F2S": { - "id": "U05TZE47F2S", - "team_id": "T01CWUYP5AR", - "name": "slack1950", - "deleted": false, - "color": "2b6836", - "real_name": "Erik Alfthan", - "tz": "Europe/Amsterdam", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Erik Alfthan", - "real_name_normalized": "Erik Alfthan", - "display_name": "Erik Alfthan", - "display_name_normalized": "Erik Alfthan", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g17fe479d7c0", - "email": "slack@alfthan.eu", - "first_name": "Erik", - "last_name": "Alfthan", - "image_24": "https://secure.gravatar.com/avatar/17fe479d7c0a30a4a4e927793af59c14.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-24.png", - "image_32": "https://secure.gravatar.com/avatar/17fe479d7c0a30a4a4e927793af59c14.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-32.png", - "image_48": "https://secure.gravatar.com/avatar/17fe479d7c0a30a4a4e927793af59c14.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-48.png", - "image_72": "https://secure.gravatar.com/avatar/17fe479d7c0a30a4a4e927793af59c14.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-72.png", - "image_192": "https://secure.gravatar.com/avatar/17fe479d7c0a30a4a4e927793af59c14.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-192.png", - "image_512": "https://secure.gravatar.com/avatar/17fe479d7c0a30a4a4e927793af59c14.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695828947, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05A1D80QKF": { - "id": "U05A1D80QKF", - "team_id": "T01CWUYP5AR", - "name": "suraj.gupta", - "deleted": false, - "color": "a63024", - "real_name": "Suraj Gupta", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Suraj Gupta", - "real_name_normalized": "Suraj Gupta", - "display_name": "Suraj Gupta", - "display_name_normalized": "Suraj Gupta", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "b3c546a0a2ea", - "image_original": "https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_original.jpg", - "is_custom_image": true, - "email": "suraj.gupta@atlan.com", - "first_name": "Suraj", - "last_name": "Gupta", - "image_24": 
"https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-09-25/5946511342292_b3c546a0a2eac62dae08_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695640006, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05SMTVPPL3": { - "id": "U05SMTVPPL3", - "team_id": "T01CWUYP5AR", - "name": "sangeeta", - "deleted": false, - "color": "dc7dbb", - "real_name": "Sangeeta Mishra", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Sangeeta Mishra", - "real_name_normalized": "Sangeeta Mishra", - "display_name": "Sangeeta Mishra", - "display_name_normalized": "Sangeeta Mishra", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "00fc6497e7eb", - "image_original": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_original.png", - "is_custom_image": true, - "email": "sangeeta@acceldata.io", - "first_name": "Sangeeta", - "last_name": "Mishra", - "image_24": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_24.png", - "image_32": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_32.png", - "image_48": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_48.png", - "image_72": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_72.png", - "image_192": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_192.png", - "image_512": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-09-16/5918683629649_00fc6497e7ebbd764ef4_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1694852931, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05HFGKEYVB": { - "id": "U05HFGKEYVB", - "team_id": "T01CWUYP5AR", - "name": "juan_luis_cano", - "deleted": false, - "color": "43761b", - "real_name": "Juan Luis Cano Rodríguez", - "tz": "Europe/Amsterdam", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "Product Manager for Kedro at QuantumBlack, AI by McKinsey", - "phone": "", - "skype": "", - "real_name": "Juan Luis Cano Rodríguez", - "real_name_normalized": "Juan Luis Cano Rodriguez", - "display_name": "Juan Luis Cano Rodríguez", - "display_name_normalized": "Juan Luis 
Cano Rodriguez", - "fields": { - "Xf03UL3S8CP8": { - "value": "Product Manager for Kedro at QuantumBlack, AI by McKinsey", - "alt": "" - }, - "Xf01NDQXPW6A": { - "value": "https://github.com/astrojuanlu", - "alt": "@astrojuanlu" - }, - "Xf03JNKWMZ6J": { - "value": "/xuan luis/", - "alt": "" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "7827e7e405e2", - "image_original": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_original.jpg", - "is_custom_image": true, - "email": "juan_luis_cano@mckinsey.com", - "first_name": "Juan", - "last_name": "Luis Cano Rodríguez", - "image_24": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-08-02/5678140113618_7827e7e405e2f18605c2_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1699314500, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05SQGH8DV4": { - "id": "U05SQGH8DV4", - "team_id": "T01CWUYP5AR", - "name": "sarathch", - "deleted": false, - "color": "73769d", - "real_name": "sarathch", - "tz": "Asia/Kolkata", - "tz_label": "India Standard Time", - "tz_offset": 19800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "sarathch", - "real_name_normalized": "sarathch", - "display_name": "sarathch", - "display_name_normalized": "sarathch", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g4916b069d83", - "email": "sarathch@hpe.com", - "first_name": "sarathch", - "last_name": "", - "image_24": "https://secure.gravatar.com/avatar/4916b069d8350cf64eba820d01b8fd0c.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-24.png", - "image_32": "https://secure.gravatar.com/avatar/4916b069d8350cf64eba820d01b8fd0c.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-32.png", - "image_48": "https://secure.gravatar.com/avatar/4916b069d8350cf64eba820d01b8fd0c.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-48.png", - "image_72": "https://secure.gravatar.com/avatar/4916b069d8350cf64eba820d01b8fd0c.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-72.png", - "image_192": "https://secure.gravatar.com/avatar/4916b069d8350cf64eba820d01b8fd0c.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-192.png", - "image_512": "https://secure.gravatar.com/avatar/4916b069d8350cf64eba820d01b8fd0c.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - 
"is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695105737, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05K8F1T887": { - "id": "U05K8F1T887", - "team_id": "T01CWUYP5AR", - "name": "terese", - "deleted": false, - "color": "bd9336", - "real_name": "Terese Larsson", - "tz": "Europe/Amsterdam", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Terese Larsson", - "real_name_normalized": "Terese Larsson", - "display_name": "Terese Larsson", - "display_name_normalized": "Terese Larsson", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "7989ca00a3d3", - "image_original": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_original.png", - "is_custom_image": true, - "email": "terese@jclab.se", - "first_name": "Terese", - "last_name": "Larsson", - "image_24": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_24.png", - "image_32": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_32.png", - "image_48": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_48.png", - "image_72": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_72.png", - "image_192": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_192.png", - "image_512": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-07-31/5692752759088_7989ca00a3d39c7676dd_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1690872647, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U055N2GRT4P": { - "id": "U055N2GRT4P", - "team_id": "T01CWUYP5AR", - "name": "tatiana.alchueyr", - "deleted": false, - "color": "e7392d", - "real_name": "tati", - "tz": "Europe/London", - "tz_label": "Greenwich Mean Time", - "tz_offset": 0, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "tati", - "real_name_normalized": "tati", - "display_name": "tati", - "display_name_normalized": "tati", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g21115d62a29", - "email": "tatiana.alchueyr@astronomer.io", - "first_name": "tati", - "last_name": "", - "image_24": "https://secure.gravatar.com/avatar/21115d62a29c683f1390c9dfca2367fe.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0010-24.png", - "image_32": "https://secure.gravatar.com/avatar/21115d62a29c683f1390c9dfca2367fe.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0010-32.png", - "image_48": "https://secure.gravatar.com/avatar/21115d62a29c683f1390c9dfca2367fe.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0010-48.png", - "image_72": "https://secure.gravatar.com/avatar/21115d62a29c683f1390c9dfca2367fe.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0010-72.png", - "image_192": 
"https://secure.gravatar.com/avatar/21115d62a29c683f1390c9dfca2367fe.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0010-192.png", - "image_512": "https://secure.gravatar.com/avatar/21115d62a29c683f1390c9dfca2367fe.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0010-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695383702, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05QNRSQW1E": { - "id": "U05QNRSQW1E", - "team_id": "T01CWUYP5AR", - "name": "sarwatfatimam", - "deleted": false, - "color": "a2a5dc", - "real_name": "Sarwat Fatima", - "tz": "Asia/Karachi", - "tz_label": "Pakistan Standard Time", - "tz_offset": 18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Sarwat Fatima", - "real_name_normalized": "Sarwat Fatima", - "display_name": "Sarwat Fatima", - "display_name_normalized": "Sarwat Fatima", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "30158cc69ec1", - "image_original": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_original.png", - "is_custom_image": true, - "email": "sarwatfatimam@gmail.com", - "huddle_state": "default_unset", - "huddle_state_expiration_ts": 0, - "first_name": "Sarwat", - "last_name": "Fatima", - "image_24": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_24.png", - "image_32": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_32.png", - "image_48": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_48.png", - "image_72": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_72.png", - "image_192": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_192.png", - "image_512": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-08-29/5817622531764_30158cc69ec15ed4a1c8_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1693935133, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U0595Q78HUG": { - "id": "U0595Q78HUG", - "team_id": "T01CWUYP5AR", - "name": "gaborjbernat", - "deleted": false, - "color": "5a4592", - "real_name": "Bernat Gabor", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Bernat Gabor", - "real_name_normalized": "Bernat Gabor", - "display_name": "Bernat Gabor", - "display_name_normalized": "Bernat Gabor", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "gf8cbaf456d5", - "email": "gaborjbernat@gmail.com", - "first_name": "Bernat", - "last_name": "Gabor", - "image_24": 
"https://secure.gravatar.com/avatar/f8cbaf456d5bed4fe78d758a5e8d27fc.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0015-24.png", - "image_32": "https://secure.gravatar.com/avatar/f8cbaf456d5bed4fe78d758a5e8d27fc.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0015-32.png", - "image_48": "https://secure.gravatar.com/avatar/f8cbaf456d5bed4fe78d758a5e8d27fc.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0015-48.png", - "image_72": "https://secure.gravatar.com/avatar/f8cbaf456d5bed4fe78d758a5e8d27fc.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0015-72.png", - "image_192": "https://secure.gravatar.com/avatar/f8cbaf456d5bed4fe78d758a5e8d27fc.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0015-192.png", - "image_512": "https://secure.gravatar.com/avatar/f8cbaf456d5bed4fe78d758a5e8d27fc.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0015-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1684515104, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05HBLE7YPL": { - "id": "U05HBLE7YPL", - "team_id": "T01CWUYP5AR", - "name": "abdallah", - "deleted": false, - "color": "385a86", - "real_name": "Abdallah", - "tz": "Europe/Amsterdam", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "Staff Data Engineer @ Decathlon", - "phone": "", - "skype": "", - "real_name": "Abdallah", - "real_name_normalized": "Abdallah", - "display_name": "Abdallah", - "display_name_normalized": "Abdallah", - "fields": { - "Xf03UL3S8CP8": { - "value": "Staff Data Engineer @ Decathlon", - "alt": "" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "7373e3821b00", - "image_original": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_original.png", - "is_custom_image": true, - "email": "abdallah@terrab.me", - "first_name": "Abdallah", - "last_name": "", - "image_24": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_24.png", - "image_32": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_32.png", - "image_48": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_48.png", - "image_72": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_72.png", - "image_192": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_192.png", - "image_512": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-08-17/5743553612391_7373e3821b00a7631bfb_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700253819, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U0323HG8C8H": { - "id": "U0323HG8C8H", - "team_id": "T01CWUYP5AR", - "name": "sheeri.cabral", - "deleted": false, - "color": "2b6836", - "real_name": "Sheeri Cabral", - "tz": 
"America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Sheeri Cabral", - "real_name_normalized": "Sheeri Cabral", - "display_name": "Sheeri Cabral (Collibra)", - "display_name_normalized": "Sheeri Cabral (Collibra)", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g0f9e11ecf9b", - "email": "sheeri.cabral@collibra.com", - "huddle_state": "default_unset", - "huddle_state_expiration_ts": 1673062422, - "first_name": "Sheeri", - "last_name": "Cabral", - "image_24": "https://secure.gravatar.com/avatar/0f9e11ecf9b45c68ee6f9d72f2b931d8.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-24.png", - "image_32": "https://secure.gravatar.com/avatar/0f9e11ecf9b45c68ee6f9d72f2b931d8.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-32.png", - "image_48": "https://secure.gravatar.com/avatar/0f9e11ecf9b45c68ee6f9d72f2b931d8.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-48.png", - "image_72": "https://secure.gravatar.com/avatar/0f9e11ecf9b45c68ee6f9d72f2b931d8.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-72.png", - "image_192": "https://secure.gravatar.com/avatar/0f9e11ecf9b45c68ee6f9d72f2b931d8.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-192.png", - "image_512": "https://secure.gravatar.com/avatar/0f9e11ecf9b45c68ee6f9d72f2b931d8.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0006-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1678292994, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05NGJ8AM8X": { - "id": "U05NGJ8AM8X", - "team_id": "T01CWUYP5AR", - "name": "yunhe52203334", - "deleted": false, - "color": "c386df", - "real_name": "Yunhe", - "tz": "Asia/Chongqing", - "tz_label": "China Standard Time", - "tz_offset": 28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Yunhe", - "real_name_normalized": "Yunhe", - "display_name": "Yunhe", - "display_name_normalized": "Yunhe", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g58baad279ee", - "email": "yunhe52203334@outlook.com", - "first_name": "Yunhe", - "last_name": "", - "image_24": "https://secure.gravatar.com/avatar/58baad279ee786786c00648fb1ad2cf6.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-24.png", - "image_32": "https://secure.gravatar.com/avatar/58baad279ee786786c00648fb1ad2cf6.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-32.png", - "image_48": "https://secure.gravatar.com/avatar/58baad279ee786786c00648fb1ad2cf6.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-48.png", - "image_72": "https://secure.gravatar.com/avatar/58baad279ee786786c00648fb1ad2cf6.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-72.png", - "image_192": "https://secure.gravatar.com/avatar/58baad279ee786786c00648fb1ad2cf6.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-192.png", - "image_512": 
"https://secure.gravatar.com/avatar/58baad279ee786786c00648fb1ad2cf6.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0016-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1692702417, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05EC8WB74N": { - "id": "U05EC8WB74N", - "team_id": "T01CWUYP5AR", - "name": "mbarrien", - "deleted": false, - "color": "7d414c", - "real_name": "Michael Barrientos", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Michael Barrientos", - "real_name_normalized": "Michael Barrientos", - "display_name": "", - "display_name_normalized": "", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "ga99207eb377", - "email": "mbarrien@gmail.com", - "first_name": "Michael", - "last_name": "Barrientos", - "image_24": "https://secure.gravatar.com/avatar/a99207eb3777b2015dfd857f865b3376.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-24.png", - "image_32": "https://secure.gravatar.com/avatar/a99207eb3777b2015dfd857f865b3376.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-32.png", - "image_48": "https://secure.gravatar.com/avatar/a99207eb3777b2015dfd857f865b3376.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-48.png", - "image_72": "https://secure.gravatar.com/avatar/a99207eb3777b2015dfd857f865b3376.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-72.png", - "image_192": "https://secure.gravatar.com/avatar/a99207eb3777b2015dfd857f865b3376.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-192.png", - "image_512": "https://secure.gravatar.com/avatar/a99207eb3777b2015dfd857f865b3376.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1689066386, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05JY6MN8MS": { - "id": "U05JY6MN8MS", - "team_id": "T01CWUYP5AR", - "name": "githubopenlineageissu", - "deleted": false, - "color": "e475df", - "real_name": "GitHubOpenLineageIssues", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "GitHubOpenLineageIssues", - "real_name_normalized": "GitHubOpenLineageIssues", - "display_name": "GitHubOpenLineageIssues", - "display_name_normalized": "GitHubOpenLineageIssues", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "c3fd2e7f9e03", - "image_original": "https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_original.png", - "is_custom_image": true, - "email": "githubopenlineageissues@gmail.com", - "first_name": "GitHubOpenLineageIssues", - "last_name": "", - "image_24": 
"https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_24.png", - "image_32": "https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_32.png", - "image_48": "https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_48.png", - "image_72": "https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_72.png", - "image_192": "https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_192.png", - "image_512": "https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-07-26/5638847371541_c3fd2e7f9e039ff2989e_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1690382661, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05PVS8GRJ6": { - "id": "U05PVS8GRJ6", - "team_id": "T01CWUYP5AR", - "name": "josdotso", - "deleted": false, - "color": "902d59", - "real_name": "Joshua Dotson", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Joshua Dotson", - "real_name_normalized": "Joshua Dotson", - "display_name": "Joshua Dotson", - "display_name_normalized": "Joshua Dotson", - "fields": { - "Xf01NDQXPW6A": { - "value": "https://github.com/josdotso", - "alt": "@GitHub" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "gf5f77a7979d", - "email": "josdotso@cisco.com", - "first_name": "Joshua", - "last_name": "Dotson", - "image_24": "https://secure.gravatar.com/avatar/f5f77a7979d0cec8f762e393eeb5a010.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-24.png", - "image_32": "https://secure.gravatar.com/avatar/f5f77a7979d0cec8f762e393eeb5a010.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-32.png", - "image_48": "https://secure.gravatar.com/avatar/f5f77a7979d0cec8f762e393eeb5a010.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-48.png", - "image_72": "https://secure.gravatar.com/avatar/f5f77a7979d0cec8f762e393eeb5a010.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-72.png", - "image_192": "https://secure.gravatar.com/avatar/f5f77a7979d0cec8f762e393eeb5a010.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-192.png", - "image_512": "https://secure.gravatar.com/avatar/f5f77a7979d0cec8f762e393eeb5a010.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1693243608, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05J5GRKY10": { - "id": "U05J5GRKY10", - "team_id": "T01CWUYP5AR", - "name": "george.polychronopoul", - "deleted": false, - "color": "8d4b84", - "real_name": "George Polychronopoulos", - "tz": "Europe/London", - "tz_label": "Greenwich Mean Time", - "tz_offset": 0, - "profile": { - "title": "", - "phone": "", - "skype": "", 
- "real_name": "George Polychronopoulos", - "real_name_normalized": "George Polychronopoulos", - "display_name": "George Polychronopoulos", - "display_name_normalized": "George Polychronopoulos", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "378e4aaddd2e", - "image_original": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_original.png", - "is_custom_image": true, - "email": "george.polychronopoulos@6point6.co.uk", - "huddle_state": "default_unset", - "huddle_state_expiration_ts": 0, - "first_name": "George", - "last_name": "Polychronopoulos", - "image_24": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_24.png", - "image_32": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_32.png", - "image_48": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_48.png", - "image_72": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_72.png", - "image_192": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_192.png", - "image_512": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-09-04/5870018949296_378e4aaddd2ed569ad37_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1693935437, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05TQPZ4R4L": { - "id": "U05TQPZ4R4L", - "team_id": "T01CWUYP5AR", - "name": "aaruna6", - "deleted": false, - "color": "e0a729", - "real_name": "Aaruna Godthi", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Aaruna Godthi", - "real_name_normalized": "Aaruna Godthi", - "display_name": "Aaruna Godthi", - "display_name_normalized": "Aaruna Godthi", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "fea59103aed5", - "image_original": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_original.png", - "is_custom_image": true, - "email": "aaruna6@gmail.com", - "first_name": "Aaruna", - "last_name": "Godthi", - "image_24": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_24.png", - "image_32": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_32.png", - "image_48": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_48.png", - "image_72": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_72.png", - "image_192": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_192.png", - "image_512": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-09-23/5953004023841_fea59103aed522db7cba_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 
1695501948, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05Q3HT6PBR": { - "id": "U05Q3HT6PBR", - "team_id": "T01CWUYP5AR", - "name": "kevin", - "deleted": false, - "color": "e23f99", - "real_name": "Kevin Languasco", - "tz": "America/Bogota", - "tz_label": "South America Pacific Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Kevin Languasco", - "real_name_normalized": "Kevin Languasco", - "display_name": "Kevin Languasco", - "display_name_normalized": "Kevin Languasco", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "adc56d6bedf9", - "image_original": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_original.jpg", - "is_custom_image": true, - "email": "kevin@haystack.tv", - "first_name": "Kevin", - "last_name": "Languasco", - "image_24": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-08-31/5836587630708_adc56d6bedf9b9198ccf_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700259423, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05SXDWVA7K": { - "id": "U05SXDWVA7K", - "team_id": "T01CWUYP5AR", - "name": "kgkwiz", - "deleted": false, - "color": "e85d72", - "real_name": "Greg Kim", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Greg Kim", - "real_name_normalized": "Greg Kim", - "display_name": "Greg Kim", - "display_name_normalized": "Greg Kim", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "8c0cc5d026c2", - "image_original": "https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_original.png", - "is_custom_image": true, - "email": "kgkwiz@gmail.com", - "first_name": "Greg", - "last_name": "Kim", - "image_24": "https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_24.png", - "image_32": "https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_32.png", - "image_48": "https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_48.png", - "image_72": "https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_72.png", - "image_192": "https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_192.png", - "image_512": "https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_512.png", - "image_1024": 
"https://avatars.slack-edge.com/2023-09-15/5905283953172_8c0cc5d026c289ba0f06_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1694786964, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U01HNKK4XAM": { - "id": "U01HNKK4XAM", - "team_id": "T01CWUYP5AR", - "name": "harel.shein", - "deleted": false, - "color": "d55aef", - "real_name": "Harel Shein", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "Data Observability @ Astronomer", - "phone": "", - "skype": "", - "real_name": "Harel Shein", - "real_name_normalized": "Harel Shein", - "display_name": "Harel Shein", - "display_name_normalized": "Harel Shein", - "fields": { - "Xf03UL3S8CP8": { - "value": "Data Observability @ Astronomer", - "alt": "" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "7467955ae073", - "image_original": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_original.jpg", - "is_custom_image": true, - "email": "harel.shein@gmail.com", - "image_24": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_24.jpg", - "image_32": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_32.jpg", - "image_48": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_48.jpg", - "image_72": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_72.jpg", - "image_192": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_192.jpg", - "image_512": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2020-12-18/1581303887318_7467955ae073e3a3757f_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": true, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700085172, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05QHG1NJ8J": { - "id": "U05QHG1NJ8J", - "team_id": "T01CWUYP5AR", - "name": "mike474", - "deleted": false, - "color": "4ec0d6", - "real_name": "Mike O'Connor", - "tz": "Europe/London", - "tz_label": "Greenwich Mean Time", - "tz_offset": 0, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Mike O'Connor", - "real_name_normalized": "Mike O'Connor", - "display_name": "Mike O'Connor", - "display_name_normalized": "Mike O'Connor", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "gdc94937fa8a", - "email": "mike@entos.ai", - "first_name": "Mike", - "last_name": "O'Connor", - "image_24": "https://secure.gravatar.com/avatar/dc94937fa8a4517c54848b990c54fbbe.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-24.png", - "image_32": "https://secure.gravatar.com/avatar/dc94937fa8a4517c54848b990c54fbbe.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-32.png", - "image_48": 
"https://secure.gravatar.com/avatar/dc94937fa8a4517c54848b990c54fbbe.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-48.png", - "image_72": "https://secure.gravatar.com/avatar/dc94937fa8a4517c54848b990c54fbbe.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-72.png", - "image_192": "https://secure.gravatar.com/avatar/dc94937fa8a4517c54848b990c54fbbe.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-192.png", - "image_512": "https://secure.gravatar.com/avatar/dc94937fa8a4517c54848b990c54fbbe.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1693512222, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U01RA9B5GG2": { - "id": "U01RA9B5GG2", - "team_id": "T01CWUYP5AR", - "name": "maciej.obuchowski", - "deleted": false, - "color": "385a86", - "real_name": "Maciej Obuchowski", - "tz": "Europe/Warsaw", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "OpenLineage committer", - "phone": "", - "skype": "", - "real_name": "Maciej Obuchowski", - "real_name_normalized": "Maciej Obuchowski", - "display_name": "Maciej Obuchowski", - "display_name_normalized": "Maciej Obuchowski", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "2f556751039c", - "image_original": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_original.jpg", - "is_custom_image": true, - "email": "maciej.obuchowski@getindata.com", - "huddle_state": "default_unset", - "huddle_state_expiration_ts": 0, - "first_name": "Maciej", - "last_name": "Obuchowski", - "image_24": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_24.jpg", - "image_32": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_32.jpg", - "image_48": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_48.jpg", - "image_72": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_72.jpg", - "image_192": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_192.jpg", - "image_512": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2021-03-16/1888464110336_2f556751039c2a20f1fe_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1698879550, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U0620HU51HA": { - "id": "U0620HU51HA", - "team_id": "T01CWUYP5AR", - "name": "sicotte.jason", - "deleted": false, - "color": "5870dd", - "real_name": "Jason", - "tz": "America/Chicago", - "tz_label": "Central Standard Time", - "tz_offset": -21600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Jason", - "real_name_normalized": "Jason", - "display_name": "Jason", - "display_name_normalized": "Jason", - "fields": {}, - "status_text": "", - 
"status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g490f5667d26", - "email": "sicotte.jason@gmail.com", - "first_name": "Jason", - "last_name": "", - "image_24": "https://secure.gravatar.com/avatar/490f5667d260b49c54dde5b4d0fee938.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-24.png", - "image_32": "https://secure.gravatar.com/avatar/490f5667d260b49c54dde5b4d0fee938.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-32.png", - "image_48": "https://secure.gravatar.com/avatar/490f5667d260b49c54dde5b4d0fee938.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-48.png", - "image_72": "https://secure.gravatar.com/avatar/490f5667d260b49c54dde5b4d0fee938.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-72.png", - "image_192": "https://secure.gravatar.com/avatar/490f5667d260b49c54dde5b4d0fee938.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-192.png", - "image_512": "https://secure.gravatar.com/avatar/490f5667d260b49c54dde5b4d0fee938.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0003-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1697734908, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05U9K21LSG": { - "id": "U05U9K21LSG", - "team_id": "T01CWUYP5AR", - "name": "bill", - "deleted": false, - "color": "df3dc0", - "real_name": "Bill Dirks", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Bill Dirks", - "real_name_normalized": "Bill Dirks", - "display_name": "Bill Dirks", - "display_name_normalized": "Bill Dirks", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "4d72939742b4", - "image_original": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_original.jpg", - "is_custom_image": true, - "email": "bill@greatexpectations.io", - "first_name": "Bill", - "last_name": "Dirks", - "image_24": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_24.jpg", - "image_32": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_32.jpg", - "image_48": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_48.jpg", - "image_72": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_72.jpg", - "image_192": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_192.jpg", - "image_512": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2023-09-27/5959100825619_4d72939742b4643f68e6_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695839552, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U02S6F54MAB": { - "id": "U02S6F54MAB", - "team_id": "T01CWUYP5AR", - 
"name": "jakub.dardzinski", - "deleted": false, - "color": "e06b56", - "real_name": "Jakub Dardziński", - "tz": "Europe/Warsaw", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Jakub Dardziński", - "real_name_normalized": "Jakub Dardzinski", - "display_name": "Jakub Dardziński", - "display_name_normalized": "Jakub Dardzinski", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "ba977a7f01d7", - "image_original": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_original.jpg", - "is_custom_image": true, - "email": "jakub.dardzinski@getindata.com", - "first_name": "Jakub", - "last_name": "Dardziński", - "image_24": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_24.jpg", - "image_32": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_32.jpg", - "image_48": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_48.jpg", - "image_72": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_72.jpg", - "image_192": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_192.jpg", - "image_512": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2022-01-04/2902203539797_ba977a7f01d752dca1ca_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695527453, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05U9929K3N": { - "id": "U05U9929K3N", - "team_id": "T01CWUYP5AR", - "name": "don", - "deleted": false, - "color": "99a949", - "real_name": "Don Heppner", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Don Heppner", - "real_name_normalized": "Don Heppner", - "display_name": "Don Heppner", - "display_name_normalized": "Don Heppner", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "87e9efa4de64", - "image_original": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_original.png", - "is_custom_image": true, - "email": "don@greatexpectations.io", - "first_name": "Don", - "last_name": "Heppner", - "image_24": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_24.png", - "image_32": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_32.png", - "image_48": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_48.png", - "image_72": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_72.png", - "image_192": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_192.png", - "image_512": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-09-27/5981853048288_87e9efa4de64acdbd3c7_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": 
false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1697754119, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U01DCMDFHBK": { - "id": "U01DCMDFHBK", - "team_id": "T01CWUYP5AR", - "name": "willy", - "deleted": false, - "color": "df3dc0", - "real_name": "Willy Lulciuc", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "Founding Engineer of Datakin, Co-creator Marquez, OpenLineage Committer", - "phone": "", - "skype": "", - "real_name": "Willy Lulciuc", - "real_name_normalized": "Willy Lulciuc", - "display_name": "Willy Lulciuc", - "display_name_normalized": "Willy Lulciuc", - "fields": { - "Xf01NSPCJ71S": { - "value": "willy@datakin.com", - "alt": "" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "cff4548fe7d7", - "image_original": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_original.jpg", - "is_custom_image": true, - "email": "willy@datakin.com", - "first_name": "Willy", - "last_name": "Lulciuc", - "image_24": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_24.jpg", - "image_32": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_32.jpg", - "image_48": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_48.jpg", - "image_72": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_72.jpg", - "image_192": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_192.jpg", - "image_512": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2020-10-21/1437812757414_cff4548fe7d7cd1b1744_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": true, - "is_owner": true, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695661607, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U02MK6YNAQ5": { - "id": "U02MK6YNAQ5", - "team_id": "T01CWUYP5AR", - "name": "pawel.leszczynski", - "deleted": false, - "color": "a2a5dc", - "real_name": "Paweł Leszczyński", - "tz": "Europe/Amsterdam", - "tz_label": "Central European Time", - "tz_offset": 3600, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Paweł Leszczyński", - "real_name_normalized": "Pawel Leszczynski", - "display_name": "Paweł Leszczyński", - "display_name_normalized": "Pawel Leszczynski", - "fields": { - "Xf01NDQXPW6A": { - "value": "https://github.com/pawel-big-lebowski", - "alt": "" - } - }, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "fcf3e1f61aa5", - "image_original": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_original.jpg", - "is_custom_image": true, - "email": "pawel.leszczynski@getindata.com", - "huddle_state": "default_unset", - "huddle_state_expiration_ts": 0, - "first_name": "Paweł", - "last_name": "Leszczyński", - "image_24": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_24.jpg", - "image_32": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_32.jpg", - 
"image_48": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_48.jpg", - "image_72": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_72.jpg", - "image_192": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_192.jpg", - "image_512": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_512.jpg", - "image_1024": "https://avatars.slack-edge.com/2022-10-05/4175232328373_fcf3e1f61aa5625a2574_1024.jpg", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1700085153, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U053LLVTHRN": { - "id": "U053LLVTHRN", - "team_id": "T01CWUYP5AR", - "name": "ross769", - "deleted": false, - "color": "e475df", - "real_name": "Ross Turk", - "tz": "America/New_York", - "tz_label": "Eastern Standard Time", - "tz_offset": -18000, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Ross Turk", - "real_name_normalized": "Ross Turk", - "display_name": "", - "display_name_normalized": "", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "g071531b6d09", - "email": "ross@rossturk.com", - "first_name": "Ross", - "last_name": "Turk", - "image_24": "https://secure.gravatar.com/avatar/071531b6d095f27a7cf6389dcfbb5766.jpg?s=24&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-24.png", - "image_32": "https://secure.gravatar.com/avatar/071531b6d095f27a7cf6389dcfbb5766.jpg?s=32&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-32.png", - "image_48": "https://secure.gravatar.com/avatar/071531b6d095f27a7cf6389dcfbb5766.jpg?s=48&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-48.png", - "image_72": "https://secure.gravatar.com/avatar/071531b6d095f27a7cf6389dcfbb5766.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-72.png", - "image_192": "https://secure.gravatar.com/avatar/071531b6d095f27a7cf6389dcfbb5766.jpg?s=192&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-192.png", - "image_512": "https://secure.gravatar.com/avatar/071531b6d095f27a7cf6389dcfbb5766.jpg?s=512&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0025-512.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1694564822, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - }, - "U05KKM07PJP": { - "id": "U05KKM07PJP", - "team_id": "T01CWUYP5AR", - "name": "peter.hicks", - "deleted": false, - "color": "dd8527", - "real_name": "Peter Hicks", - "tz": "America/Los_Angeles", - "tz_label": "Pacific Standard Time", - "tz_offset": -28800, - "profile": { - "title": "", - "phone": "", - "skype": "", - "real_name": "Peter Hicks", - "real_name_normalized": "Peter Hicks", - "display_name": "Peter Hicks", - "display_name_normalized": "Peter Hicks", - "fields": {}, - "status_text": "", - "status_emoji": "", - "status_emoji_display_info": [], - "status_expiration": 0, - "avatar_hash": "037cd5a411ed", - "image_original": 
"https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_original.png", - "is_custom_image": true, - "email": "peter.hicks@astronomer.io", - "first_name": "Peter", - "last_name": "Hicks", - "image_24": "https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_24.png", - "image_32": "https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_32.png", - "image_48": "https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_48.png", - "image_72": "https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_72.png", - "image_192": "https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_192.png", - "image_512": "https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_512.png", - "image_1024": "https://avatars.slack-edge.com/2023-07-31/5662890750117_037cd5a411ed6bbcf738_1024.png", - "status_text_canonical": "", - "team": "T01CWUYP5AR" - }, - "is_admin": false, - "is_owner": false, - "is_primary_owner": false, - "is_restricted": false, - "is_ultra_restricted": false, - "is_bot": false, - "is_app_user": false, - "updated": 1695424773, - "is_email_confirmed": true, - "has_2fa": false, - "who_can_share_contact_card": "EVERYONE" - } -} \ No newline at end of file diff --git a/slack-archive/html/C01CK9T7HKR-0.html b/slack-archive/html/C01CK9T7HKR-0.html deleted file mode 100644 index 778fe02..0000000 --- a/slack-archive/html/C01CK9T7HKR-0.html +++ /dev/null @@ -1,694 +0,0 @@ -Slack

general

Created by Julien Le Dem on Tuesday, October 20th, 2020

OpenLineage spec discussion.
See also: https://github.com/OpenLineage/OpenLineage#community
2023 Ecosystem Survey: http://bit.ly/ecosystem_survey

Abdallah - Wednesday, August 23rd, 2023 at 10:55:10 AM GMT-04:00
Approve a new release please 🙂
• Fix spark integration filtering Databricks events.
Michael Robinson - Wednesday, August 23rd, 2023 at 12:27:15 PM GMT-04:00
Thank you for requesting a release @Abdallah. Three +1s from committers will authorize it.
🙌1
Michael Robinson - Wednesday, August 23rd, 2023 at 1:13:18 PM GMT-04:00
Thanks, all. The release is authorized and will be initiated within 2 business days.
Athitya Kumar - Wednesday, August 23rd, 2023 at 1:08:48 PM GMT-04:00
Hey folks! Do we have clear step-by-step documentation on how we can leverage the ServiceLoader-based approach for injecting specific OpenLineage customisations (tweaking the transport type with defaults, tweaking column-level lineage, etc.)?
Maciej Obuchowski - Wednesday, August 23rd, 2023 at 1:29:05 PM GMT-04:00
For a custom transport, you have to provide an implementation of the interface https://github.com/OpenLineage/OpenLineage/blob/4a1a5c3bf9767467b71ca0e1b6d820ba9e[…]ain/java/io/openlineage/client/transports/TransportBuilder.java and point to it in a META-INF file
Maciej Obuchowski - Wednesday, August 23rd, 2023 at 1:29:52 PM GMT-04:00
But if I understand correctly, if you want to change behavior rather than extend it, the correct way may be to either contribute it to the repo - if that behavior is useful to anyone - or fork the repo
Athitya Kumar - Wednesday, August 23rd, 2023 at 3:14:43 PM GMT-04:00
@Maciej Obuchowski - Can you elaborate more on "point to it in a META-INF file"? Let's say we have the custom transport type built in a standalone jar by extending TransportBuilder - what are the exact next steps to use this custom transport from the standalone jar when doing spark-submit?
Maciej ObuchowskiWednesday, August 23rd, 2023 at 3:23:13 PM GMT-04:00
@Athitya Kumar your jar needs to have META-INF/services/io.openlineage.client.transports.TransportBuilder with fully qualified class names of your custom TransportBuilders there - like openlineage-spark has
io.openlineage.client.transports.HttpTransportBuilder
-io.openlineage.client.transports.KafkaTransportBuilder
-io.openlineage.client.transports.ConsoleTransportBuilder
-io.openlineage.client.transports.FileTransportBuilder
-io.openlineage.client.transports.KinesisTransportBuilder
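For reference, a minimal sketch of that registration, assuming the TransportBuilder interface shape in the file linked above (getType/getConfig/build) and a Transport base class that can be subclassed from outside the package with an emit(RunEvent) method (note Athitya's GH comment below in this thread — on older client versions that subclassing may not work); all com.example names are illustrative:

package com.example;

import io.openlineage.client.OpenLineage;
import io.openlineage.client.OpenLineageClientUtils;
import io.openlineage.client.transports.Transport;
import io.openlineage.client.transports.TransportBuilder;
import io.openlineage.client.transports.TransportConfig;

// Hypothetical config type; real ones carry options parsed from the transport config keys.
class MyTransportConfig implements TransportConfig {}

// Hypothetical transport that just prints each event as JSON.
class MyTransport extends Transport {
  MyTransport(MyTransportConfig config) {}

  @Override
  public void emit(OpenLineage.RunEvent event) {
    System.out.println(OpenLineageClientUtils.toJson(event));
  }
}

public class MyTransportBuilder implements TransportBuilder {
  // The string users put in spark.openlineage.transport.type to select this transport
  @Override
  public String getType() { return "my-transport"; }

  @Override
  public TransportConfig getConfig() { return new MyTransportConfig(); }

  @Override
  public Transport build(TransportConfig config) { return new MyTransport((MyTransportConfig) config); }
}

The jar then ships src/main/resources/META-INF/services/io.openlineage.client.transports.TransportBuilder containing the single line com.example.MyTransportBuilder, gets passed to spark-submit via --jars, and is selected with spark.openlineage.transport.type=my-transport.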
Athitya KumarFriday, August 25th, 2023 at 1:49:29 AM GMT-04:00
@Maciej Obuchowski - I think this change may be required for consumers to leverage custom transports, can you check & verify this GH comment?
https://github.com/OpenLineage/OpenLineage/issues/2007#issuecomment-1690350630
Maciej ObuchowskiFriday, August 25th, 2023 at 6:52:30 AM GMT-04:00
Probably, I will look at more details next week @Athitya Kumar as I'm in transit
👍1
Michael RobinsonWednesday, August 23rd, 2023 at 3:04:10 PM GMT-04:00
@channel
We released OpenLineage 1.1.0, including:
Additions:
• Flink: create Openlineage configuration based on Flink configuration #2033 @pawel-big-lebowski
• Java: add Javadocs to the Java client #2004 @julienledem
• Spark: append output dataset name to a job name #2036 @pawel-big-lebowski
• Spark: support Spark 3.4.1 #2057 @pawel-big-lebowski
Fixes:
• Flink: fix a bug when getting schema for KafkaSink #2042 @pentium3
• Spark: fix ignored event adaptive_spark_plan in Databricks #2061 @algorithmy1
Plus additional bug fixes, doc changes and more.
Thanks to all the contributors, especially new contributors @pentium3 and @Abdallah!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.1.0
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.0.0...1.1.0
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👏9
Michael RobinsonFriday, August 25th, 2023 at 10:29:23 AM GMT-04:00
@channel
Friendly reminder: our next in-person meetup is next Wednesday, August 30th in San Francisco at Astronomer’s offices in the Financial District. You can sign up and find the details on the meetup event page.
George PolychronopoulosFriday, August 25th, 2023 at 10:57:30 AM GMT-04:00
hi OpenLineage team, we would like to join one of your meetups (me, @Madhav Kakumani and @Phil Rolph) and we're wondering if you are hosting any meetups after 18/9? We are trying to join this one but air tickets are quite expensive
Harel SheinFriday, August 25th, 2023 at 11:32:12 AM GMT-04:00
there will certainly be more meetups, don’t worry about that!
Harel SheinFriday, August 25th, 2023 at 11:32:30 AM GMT-04:00
where are you located? perhaps we can try to organize a meetup closer to where you are.
George PolychronopoulosFriday, August 25th, 2023 at 11:49:37 AM GMT-04:00
Thanks a lot for the response, we are in London. We'd be glad to help you organise a meetup and also meet in person!
Michael RobinsonFriday, August 25th, 2023 at 11:51:39 AM GMT-04:00
This is awesome, thanks @George Polychronopoulos. I’ll start a channel and invite you
Juan Luis Cano RodríguezMonday, August 28th, 2023 at 4:47:53 AM GMT-04:00
hi folks, I'm looking into exporting static metadata, and found that DatasetEvent requires an eventTime, which in my mind doesn't make sense for static events. I'm setting it to None and the Python client seems to work, but wanted to ask if I'm missing something.
Paweł LeszczyńskiMonday, August 28th, 2023 at 5:59:10 AM GMT-04:00
Although you emit DatasetEvent, you still emit an event and eventTime is a valid marker.
Juan Luis Cano RodríguezMonday, August 28th, 2023 at 6:01:40 AM GMT-04:00
so, should I use the current time at the moment of emitting it and that's it?
Paweł LeszczyńskiMonday, August 28th, 2023 at 6:01:53 AM GMT-04:00
yes, that should be it
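So a static-metadata DatasetEvent would look roughly like this (namespace, name and producer values are invented for illustration; schemaURL shown for the 2-0-2 spec):

{
  "eventTime": "2023-08-28T10:00:00.000Z",
  "producer": "https://example.com/my-static-producer",
  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/DatasetEvent",
  "dataset": {
    "namespace": "my_namespace",
    "name": "my_dataset",
    "facets": {}
  }
}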
Juan Luis Cano RodríguezMonday, August 28th, 2023 at 4:49:21 AM GMT-04:00
and something else: I understand that Marquez does not yet support the 2.0 spec, hence it's incompatible with static metadata, right? I tried to emit a list of DatasetEvents and got HTTPError: 422 Client Error: Unprocessable Entity for url: <http://localhost:3000/api/v1/lineage> (I'm using a FileTransport for now)
Paweł LeszczyńskiMonday, August 28th, 2023 at 6:02:49 AM GMT-04:00
marquez is not capable of reflecting DatasetEvents in DB but it should respond with Unsupported event type
Paweł LeszczyńskiMonday, August 28th, 2023 at 6:03:15 AM GMT-04:00
and return 200 instead of 201 created
Juan Luis Cano RodríguezMonday, August 28th, 2023 at 6:05:41 AM GMT-04:00
I'll have a deeper look then, probably I'm doing something wrong. thanks @Paweł Leszczyński
Joshua DotsonMonday, August 28th, 2023 at 1:25:58 PM GMT-04:00
Hi folks. I have some pure golang jobs from which I need to emit OL events to Marquez. Is the right way to go about this to generate a Golang client from the Marquez OpenAPI spec and use that client from my go jobs?
Jakub DardzińskiMonday, August 28th, 2023 at 2:23:24 PM GMT-04:00
I'd rather generate them from OL spec (compliant with JSON Schema)
Joshua DotsonMonday, August 28th, 2023 at 3:12:21 PM GMT-04:00
I'll look into this. I take you to mean that I would use the OL spec which is available as a set of JSON schemas to create the data object and then HTTP POST it using vanilla Golang. Is that correct? Thank you for your help!
Jakub DardzińskiMonday, August 28th, 2023 at 3:30:05 PM GMT-04:00
Correct! You’re also very welcome to contribute Golang client (currently we have Python & Java clients) if you manage to send events using golang 🙂
👏1
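As a sketch of that vanilla approach until a Golang client exists: serialize a RunEvent per the JSON Schema and POST it to the consumer's lineage endpoint (all values below are placeholders; Marquez's API usually listens on port 5000 at /api/v1/lineage):

curl -X POST http://localhost:5000/api/v1/lineage \
  -H "Content-Type: application/json" \
  -d '{
        "eventType": "START",
        "eventTime": "2023-08-28T19:52:00.000Z",
        "run": {"runId": "d46e465b-d358-4d32-83d4-df660ff614dd"},
        "job": {"namespace": "my-namespace", "name": "my-go-job"},
        "inputs": [],
        "outputs": [],
        "producer": "https://example.com/my-go-producer",
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
      }'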
Michael RobinsonMonday, August 28th, 2023 at 5:28:31 PM GMT-04:00
@channel
The agenda for the Toronto Meetup at Airflow Summit on 9/18 has been updated. This promises to be an exciting, richly productive discussion. Don’t miss it if you’ll be in the area!
1. Intros
2. Evolution of spec presentation/discussion (project background/history)
3. State of the community
4. Spark/Column lineage update
5. Airflow Provider update
6. Roadmap Discussion
7. Action items review/next steps
❤️3
Michael RobinsonMonday, August 28th, 2023 at 8:05:37 PM GMT-04:00
New on the OpenLineage blog: a close look at the new OpenLineage Airflow Provider, including:
• the critical improvements it brings to the integration
• the high-level design
• implementation details
• an example operator
• planned enhancements
• a list of supported operators
• more.
The post, by @Maciej Obuchowski, @Julien Le Dem and myself is live now on the OpenLineage blog.
🎉5
Sarwat FatimaTuesday, August 29th, 2023 at 3:18:04 AM GMT-04:00
Hello, I'm currently in the process of following the instructions outlined in the getting started guide at https://openlineage.io/getting-started/. However, I've encountered a problem while attempting to complete Step 1 of the guide: I'm getting an internal server error at this stage. I did manage to successfully run Marquez, but it appears that there might be an issue that needs to be addressed. I have attached screenshots.
Jakub DardzińskiTuesday, August 29th, 2023 at 3:20:18 AM GMT-04:00
is port 5000 taken by any other application? or does ./docker/up.sh have some errors in the logs?
Sarwat FatimaTuesday, August 29th, 2023 at 5:23:01 AM GMT-04:00
@Jakub Dardziński port 5000 is not taken by any other application. The logs show some errors but I am not sure what the issue is here.
Maciej ObuchowskiTuesday, August 29th, 2023 at 10:02:38 AM GMT-04:00
I think Marquez is running on WSL while you're trying to connect from host computer?
Juan Luis Cano RodríguezTuesday, August 29th, 2023 at 5:20:39 AM GMT-04:00
hi folks, for now I'm producing .jsonl (or .ndjson ) files with one event per line, do you know if there's any way to validate those? would standard JSON Schema tools work?
Juan Luis Cano RodríguezTuesday, August 29th, 2023 at 10:58:29 AM GMT-04:00
reply by @U0544QC1DS9: yes 🙂💯
👍1
ldaceyTuesday, August 29th, 2023 at 1:12:32 PM GMT-04:00
for namespaces, if my data is moving between sources (SFTP -> GCS -> Azure Blob; Synapse connects to the parquet datasets), then should my namespace be based on the client I am working with? my current namespace has been to refer to the bucket, but that falls apart when considering the data sources and some destinations. perhaps I should just add a field for client-name instead to have a consolidated view?
Maciej ObuchowskiWednesday, August 30th, 2023 at 10:53:08 AM GMT-04:00
> then should my namespace be based on the client I am working with?
I think each of those sources should be a different namespace?
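For example, following the dataset naming conventions in the spec, each hop would land in its own namespace, roughly like this (host, bucket and account names are placeholders):

sftp://<host>                     # SFTP source
gs://<bucket>                     # GCS staging bucket
wasbs://<container>@<account>     # Azure Blob destination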
ldaceyWednesday, August 30th, 2023 at 12:59:53 PM GMT-04:00
got it, yeah I was kind of picturing it as one namespace for the client (we handle many clients but they are completely distinct entities). I was able to get it to work with multiple namespaces like you suggested and Marquez was able to plot everything correctly in the visualization
ldaceyWednesday, August 30th, 2023 at 1:01:18 PM GMT-04:00
I noticed some of my Dataset facets make more sense as Run facets, for example, the name of the specific file I processed and how many rows of data / size of the data for that schedule. that won't impact the Run facets Airflow provides right? I can still have the schedule information + my custom run facets?
Maciej ObuchowskiWednesday, August 30th, 2023 at 1:06:38 PM GMT-04:00
Yes, unless you name it the same as one of the Airflow facets 🙂
GitHubOpenLineageIssuesWednesday, August 30th, 2023 at 8:15:29 AM GMT-04:00
Hi, Will really appreciate if someone can guide me or provide me any pointer - if they have been able to implement authentication/authorization for access to Marquez. Have not seen much info around it. Any pointers greatly appreciated. Thanks in advance.
Julien Le DemWednesday, August 30th, 2023 at 12:23:18 PM GMT-04:00
I’ve seen people do this through the ingress controller in Kubernetes. Unfortunately I don’t have documentation besides k8s specific ones you would find for the ingress controller you’re using. You’d redirect any unauthenticated request to your identity provider
Michael RobinsonWednesday, August 30th, 2023 at 11:50:05 AM GMT-04:00
@channel
Friendly reminder: there’s a meetup tonight at Astronomer’s offices in SF!
Julien Le DemWednesday, August 30th, 2023 at 12:15:31 PM GMT-04:00
I’ll be there and looking forward to seeing @John Lukenoff’s presentation
Michael BarrientosWednesday, August 30th, 2023 at 9:38:31 PM GMT-04:00
Can anyone let 3 people stuck downstairs into the 7th floor?
👍1
Willy LulciucWednesday, August 30th, 2023 at 11:25:21 PM GMT-04:00
Sorry about that!
YunheThursday, August 31st, 2023 at 2:31:48 AM GMT-04:00
hello everyone, I can run OpenLineage Spark code in my notebook with Python, but when I use my IDEA to execute Scala code like this:
import org.apache.spark.internal.Logging
import org.apache.spark.sql.SparkSession
import io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}
import sun.java2d.marlin.MarlinUtils.logInfo
object Test {
def main(args: Array[String]): Unit = {

val spark = SparkSession
.builder()
.master("local")
.appName("test")
.config("spark.jars.packages","io.openlineage:openlineage-spark:0.12.0")
.config("spark.extraListeners","io.openlineage.spark.agent.OpenLineageSparkListener")
.config("spark.openlineage.transport.type","console")
.getOrCreate()

spark.sparkContext.setLogLevel("INFO")

//spark.sparkContext.addSparkListener(new MySparkAppListener)
import spark.implicits._
val input = Seq((1, "zs", 2020), (2, "ls", 2023)).toDF("id", "name", "year")

input.select("id", "name").orderBy("id").show()

}

}

there is something wrong:
Exception in thread "spark-listener-group-shared" java.lang.NoSuchMethodError: io.openlineage.client.OpenLineageClientUtils.loadOpenLineageYaml(Ljava/io/InputStream;)Lio/openlineage/client/OpenLineageYaml;
at io.openlineage.spark.agent.ArgumentParser.extractOpenlineageConfFromSparkConf(ArgumentParser.java:114)
at io.openlineage.spark.agent.ArgumentParser.parse(ArgumentParser.java:78)
at io.openlineage.spark.agent.OpenLineageSparkListener.initializeContextFactoryIfNotInitialized(OpenLineageSparkListener.java:277)
at io.openlineage.spark.agent.OpenLineageSparkListener.onApplicationStart(OpenLineageSparkListener.java:267)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

I want to know how I can set up the IDEA Scala environment correctly
Paweł LeszczyńskiThursday, August 31st, 2023 at 2:58:41 AM GMT-04:00
io.openlineage:openlineage-spark:0.12.0 -> could you repeat the steps with a newer version?
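i.e. in the Scala snippet above, a sketch of the one-line change (version per the advice later in this thread):

.config("spark.jars.packages", "io.openlineage:openlineage-spark:1.1.0")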
YunheThursday, August 31st, 2023 at 3:51:52 AM GMT-04:00
ok, it's my first time using this lineage tool. First, I added dependencies in my pom.xml like this:
<dependency>
<groupId>io.openlineage</groupId>
<artifactId>openlineage-java</artifactId>
<version>0.12.0</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.7</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.7</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.7</version>
</dependency>
<dependency>
<groupId>io.openlineage</groupId>
<artifactId>openlineage-spark</artifactId>
<version>0.30.1</version>
</dependency>

my spark version is 3.3.1 and that version cannot change

second, in the folder OpenLineage/integration/spark I entered the command docker-compose up and followed the steps in this doc:
https://openlineage.io/docs/integrations/spark/quickstart_local
there is no error when I use the notebook to execute PySpark for OpenLineage, and I can get the JSON messages.
but after I enter "docker-compose up", when I use my IDEA tool to execute the Scala code above, the error above happens. It seems that I have not configured the environment correctly, so how can I fix the problem?
Paweł LeszczyńskiFriday, September 1st, 2023 at 5:15:28 AM GMT-04:00
please use the latest io.openlineage:openlineage-spark:1.1.0 instead. openlineage-java is already contained in that jar, so there is no need to add it on your own.
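So the pom above would keep only something like:

<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-spark</artifactId>
    <version>1.1.0</version>
</dependency>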
Sheeri Cabral (Collibra)Thursday, August 31st, 2023 at 3:33:19 PM GMT-04:00
Will the August meeting be put up at https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting soon? (usually it’s up in a few days 🙂)
Maciej ObuchowskiFriday, September 1st, 2023 at 6:00:53 AM GMT-04:00
@Michael Robinson
Michael RobinsonFriday, September 1st, 2023 at 5:13:32 PM GMT-04:00
The recording is on the youtube channel here. I’ll update the wiki ASAP
Julien Le DemThursday, August 31st, 2023 at 6:10:20 PM GMT-04:00
🎉13
🙌10
❤️5
Julien Le DemFriday, September 1st, 2023 at 11:09:55 PM GMT-04:00
Michael RobinsonFriday, September 1st, 2023 at 5:16:21 PM GMT-04:00
@channel
The latest issue of OpenLineage News is out now! Please subscribe to get it directly in your inbox each month.
🙌2
Anirudh ShrinivasonMonday, September 4th, 2023 at 3:38:28 AM GMT-04:00
Hi guys, I'd like to capture the spark.databricks.clusterUsageTags.clusterAllTags property from databricks. However, the value of this is a list of keys, and therefore cannot be supported by the custom environment facet builder.
I was thinking that capturing this property might be useful for most databricks workloads, and whether it might make sense to auto-capture it along with other databricks variables, similar to how we capture mount points for the databricks jobs.
Does this sound okay? If so, then I can help to contribute this functionality
Maciej ObuchowskiMonday, September 4th, 2023 at 6:43:47 AM GMT-04:00
Sounds good to me
Anirudh ShrinivasonMonday, September 11th, 2023 at 5:15:03 AM GMT-04:00
Anirudh ShrinivasonMonday, September 4th, 2023 at 6:39:05 AM GMT-04:00
Also, another small clarification is that when using MergeIntoCommand, I'm receiving the lineage events on the backend, but I cannot seem to find any logging of the payload when I enable debug mode in openlineage. I remember there was a similar issue reported by another user in the past. May I check if it might be possible to help with this? It's making debugging quite hard for these cases. Thanks!
Maciej ObuchowskiMonday, September 4th, 2023 at 6:54:12 AM GMT-04:00
I think it only depends on log4j configuration
Maciej ObuchowskiMonday, September 4th, 2023 at 6:57:15 AM GMT-04:00
# Set everything to be logged to the console
-log4j.rootCategory=INFO, console
-log4j.appender.console=org.apache.log4j.ConsoleAppender
-log4j.appender.console.target=System.err
-log4j.appender.console.layout=org.apache.log4j.PatternLayout
-log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
-
-# set the log level for the openlineage spark library
-log4j.logger.io.openlineage.spark=DEBUG

this is what we have in log4j.properties in test environment and it works
Anirudh ShrinivasonMonday, September 4th, 2023 at 11:28:11 AM GMT-04:00
Hmm... I can see the logs for the other commands, like createViewCommand etc. I just cannot see it for any of the delta runs
Paweł LeszczyńskiTuesday, September 5th, 2023 at 3:33:03 AM GMT-04:00
that's interesting. So, logging is done here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/main/java/io/openlineage/spark/agent/EventEmitter.java#L63 and this code is unaware of delta.

The possible problem could be filtering delta events (which we do because delta is noisy)
Paweł LeszczyńskiTuesday, September 5th, 2023 at 3:33:36 AM GMT-04:00
Recently, we've closed https://github.com/OpenLineage/OpenLineage/issues/1982 which prevents generating events for createOrReplaceTempView
Paweł LeszczyńskiTuesday, September 5th, 2023 at 3:35:12 AM GMT-04:00
Anirudh ShrinivasonTuesday, September 5th, 2023 at 5:19:22 AM GMT-04:00
Hmm I'm a little confused here. I thought we were only filtering out events for certain specific commands, like show table etc., because they're noisy, right? Some important commands like MergeInto or SaveIntoDataSource used to be logged before, but I notice now that they're not being logged anymore...
I'm using OpenLineage version 0.23.0.
Paweł LeszczyńskiTuesday, September 5th, 2023 at 5:47:51 AM GMT-04:00
yes, we do. it's just sometimes when doing a filter, we can remove too much. but SaveIntoDataSource and MergeInto should be fine, as we do check them within the tests
ldaceyMonday, September 4th, 2023 at 9:35:05 PM GMT-04:00
it looks like my dynamic task mapping in Airflow has the same run ID in marquez, so even if I am processing 100 files, there is only one version of the data. is there a way to have a separate version of each dynamic task so I can track the filename etc?
Jakub DardzińskiTuesday, September 5th, 2023 at 8:54:57 AM GMT-04:00
map_index should be indeed included when calculating run ID (it’s deterministic in Airflow integration)
what version of Airflow are you using btw?
ldaceyTuesday, September 5th, 2023 at 9:04:14 AM GMT-04:00
2.7.0

I do see this error log in all of my dynamic tasks which might explain it:

[2023-09-05, 00:31:57 UTC] {manager.py:200} ERROR - Extractor returns non-valid metadata: None
-[2023-09-05, 00:31:57 UTC] {utils.py:401} ERROR - cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
-Traceback (most recent call last):
-  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 399, in wrapper
-    return f(*args, **kwargs)
-           ^^^^^^^^^^^^^^^^^^
-  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/listener.py", line 93, in on_running
-    **get_custom_facets(task_instance),
-      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/utils.py", line 148, in get_custom_facets
-    custom_facets["airflow_mappedTask"] = AirflowMappedTaskRunFacet.from_task_instance(task_instance)
-                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/plugins/facets.py", line 36, in from_task_instance
-    from airflow.providers.openlineage.utils import get_operator_class
-ImportError: cannot import name 'get_operator_class' from 'airflow.providers.openlineage.utils' (/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/utils/__init__.py)
ldaceyTuesday, September 5th, 2023 at 9:05:34 AM GMT-04:00
I only have a few custom operators with the on_complete facet so I think this is a built in one - it runs before my task custom logs for example
ldaceyTuesday, September 5th, 2023 at 9:06:05 AM GMT-04:00
and any time I messed up my custom facet, the error would be at the bottom of the logs. this is on top, probably an on_start facet?
Jakub DardzińskiTuesday, September 5th, 2023 at 9:16:32 AM GMT-04:00
seems like some circular import
Jakub DardzińskiTuesday, September 5th, 2023 at 9:19:47 AM GMT-04:00
I just tested it manually, it’s a bug in OL provider. let me fix that
ldaceyTuesday, September 5th, 2023 at 10:53:28 AM GMT-04:00
cool, thanks. I am glad it is just a bug, I was afraid dynamic tasks were not supported for a minute there
ldaceyThursday, September 7th, 2023 at 11:46:20 AM GMT-04:00
how do the provider updates work? they can be released in between Airflow releases and issues for them are raised on the main Airflow repo?
Jakub DardzińskiThursday, September 7th, 2023 at 11:50:07 AM GMT-04:00
generally speaking anything related to OL-Airflow should be placed to Airflow repo, important changes/bug fixes would be implemented in OL repo as well
ldaceyThursday, September 7th, 2023 at 3:40:31 PM GMT-04:00
got it, thanks
ldaceyThursday, September 7th, 2023 at 7:43:46 PM GMT-04:00
is there a way for me to install the openlineage provider based on the commit you made to fix the circular imports?

I was going to try to install from the Airflow main branch but didn't want to mess anything up
ldaceyThursday, September 7th, 2023 at 7:44:39 PM GMT-04:00
I saw it was merged to airflow main but it is not in 2.7.1 and there is no 1.0.3 provider version yet, so I wondered if I could manually install it for the time being
Jakub DardzińskiFriday, September 8th, 2023 at 5:45:48 AM GMT-04:00
https://github.com/apache/airflow/blob/main/BREEZE.rst#preparing-provider-packages
building the provider package on your own would probably be the best idea? that depends on how you manage your Airflow instance
Jakub DardzińskiFriday, September 8th, 2023 at 12:01:53 PM GMT-04:00
there's 1.1.0rc1 btw
ldaceyFriday, September 8th, 2023 at 1:44:44 PM GMT-04:00
perfect, thanks. I got started with breeze but then stopped haha
👍1
ldaceySunday, September 10th, 2023 at 8:29:00 PM GMT-04:00
The dynamic task mapping error is gone, I did run into this:

File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/openlineage/extractors/base.py", line 70, in disabled_operators
operator.strip() for operator in conf.get("openlineage", "disabled_for_operators").split(";")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/configuration.py", line 1065, in get
raise AirflowConfigException(f"section/key [{section}/{key}] not found in config")

I am redeploying now with that option added to my config. I guess it did not use the default which should be ""
ldaceySunday, September 10th, 2023 at 8:49:17 PM GMT-04:00
added "disabled_for_operators" to my openlineage config and it worked (using Airflow helm chart - not sure if that means there is an error because the value I provided should just be the default value, not sure why I needed to explicitly specify it)

openlineage:
disabled_for_operators: ""
...


this is so much better and makes a lot more sense. most of my tasks are dynamic so I was missing a lot of metadata before the fix, thanks!
AbdallahWednesday, September 6th, 2023 at 4:43:07 PM GMT-04:00
Hello Everyone,

I've been diving into the Marquez codebase and found a performance bottleneck in JobDao.java for the query related to namespaceName=MyNameSpace limit=10 and 12s with limit=25. I managed to optimize it using CTEs, and the execution times dropped dramatically to 300ms (for limit=100) and under 100ms (for limit=25 ) on the same cluster.
Issue link : https://github.com/MarquezProject/marquez/issues/2608

I believe there's even more room for optimization, especially if we adjust the job_facets_view to include the namespace_name column.

Would the team be open to a PR where I share the optimized query and discuss potential further refinements? I believe these changes could significantly enhance the Marquez web UI experience.

PR link : https://github.com/MarquezProject/marquez/pull/2609

Looking forward to your feedback.
🔥4
Jakub DardzińskiWednesday, September 6th, 2023 at 6:03:01 PM GMT-04:00
@Willy Lulciuc wdyt?
Bernat GaborWednesday, September 6th, 2023 at 5:44:12 PM GMT-04:00
Has there been any conversation on the extensibility of facets/concepts? E.g.:
• how does one extends the list of run states https://openlineage.io/docs/spec/run-cycle to add a paused/resumed state?
• how does one extend https://openlineage.io/docs/spec/facets/run-facets/nominal_time to add a created at field?
Julien Le DemWednesday, September 6th, 2023 at 6:28:17 PM GMT-04:00
Hello Bernat,

The primary mechanism to extend the model is through facets. You can either:
• create new standard facets in the spec: https://github.com/OpenLineage/OpenLineage/tree/main/spec/facets
• create custom facets defined somewhere else with a prefix in their name: https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md#custom-facet-naming
• Update existing facets with a backward compatible change (example: adding an optional field).
The core spec can also be modified. Here is an example of adding a state
That being said I think more granular states like pause/resume are probably better suited in a run facet. There was an issue opened for that particular one a while ago: https://github.com/OpenLineage/OpenLineage/issues/9 maybe that particular discussion can continue there.

For the nominal time facet, You could open an issue describing the use case and on community agreement follow up with a PR on the facet itself: https://github.com/OpenLineage/OpenLineage/blob/main/spec/facets/NominalTimeRunFacet.json
(adding an optional field is backwards compatible)
👀1
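As an illustration of the custom-facet route for something like pause/resume (the facet name and fields here are invented; only the prefixed-name convention and the _producer/_schemaURL fields come from the spec):

"run": {
  "runId": "d46e465b-d358-4d32-83d4-df660ff614dd",
  "facets": {
    "myorg_runLifecycle": {
      "_producer": "https://example.com/my-producer",
      "_schemaURL": "https://example.com/schemas/MyOrgRunLifecycleFacet.json",
      "pausedAt": "2023-09-06T21:00:00.000Z",
      "resumedAt": "2023-09-06T21:05:00.000Z"
    }
  }
}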
Bernat GaborWednesday, September 6th, 2023 at 6:31:12 PM GMT-04:00
I see, so in general one is best off copying a standard facet and maintaining it under a different name. That way it can be made mandatory 🙂 and one does not need to be blocked for a long time until there’s a community agreement 🤔
Julien Le DemWednesday, September 6th, 2023 at 6:35:43 PM GMT-04:00
Yes, The goal of custom facets is to allow you to experiment and extend the spec however you want without having to wait for approval.
If the custom facet is very specific to a third party project/product then it makes sense for it to stay a custom facet.
If it is more generic then it makes sense to add it to the core facets as part of the spec.
Hopefully community agreement can be achieved relatively quickly. Unless someone is strongly against something, it can be added without too much red tape. Typically with support in at least one of the integrations to validate the model.
Michael RobinsonThursday, September 7th, 2023 at 3:12:20 PM GMT-04:00
@channel
This month’s TSC meeting is next Thursday the 14th at 10am PT. On the tentative agenda:
• announcements
• recent releases
• demo: Spark integration tests in Databricks runtime
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Also, feel free to reply or DM me with discussion topics, agenda items, etc.
👍1
Michael RobinsonMonday, September 11th, 2023 at 10:07:41 AM GMT-04:00
@channel
The first Toronto OpenLineage Meetup, featuring a presentation by recent adopter Metaphor, is just one week away. On the agenda:
1. Evolution of spec presentation/discussion (project background/history)
2. State of the community
3. Integrating OpenLineage with Metaphor (by special guests Ye & Ivan)
4. Spark/Column lineage update
5. Airflow Provider update
6. Roadmap Discussion
Find more details and RSVP here.
🙌7
John LukenoffMonday, September 11th, 2023 at 5:07:26 PM GMT-04:00
I’m seeing some odd behavior with my http transport when upgrading airflow/openlineage-airflow from 2.3.2 -> 2.6.3 and 0.24.0 -> 0.28.0. Previously I had a config like this that let me provide my own auth tokens. However, after upgrading I’m getting a 401 from the endpoint and further debugging seems to reveal that we’re not using the token provided in my TokenProvider. Does anyone know if something changed between these versions that could be causing this? (more details in 🧵 )
transport:
-  type: http
-  url: <https://my.fake-marquez-endpoint.com>
-  auth:
-    type: some.fully.qualified.classpath
John LukenoffMonday, September 11th, 2023 at 5:09:40 PM GMT-04:00
John LukenoffMonday, September 11th, 2023 at 5:11:14 PM GMT-04:00
John LukenoffMonday, September 11th, 2023 at 5:18:56 PM GMT-04:00
Ah I think I see the issue. Looks like this was introduced here, we are instantiating with the base token provider here when we should be using the subclass: https://github.com/OpenLineage/OpenLineage/pull/1869/files#diff-2f8ea6f9a22b5567de8ab56c6a63da8e7adf40cb436ee5e7e6b16e70a82afe05R57
John LukenoffMonday, September 11th, 2023 at 5:37:42 PM GMT-04:00
❤️1
Sarwat FatimaTuesday, September 12th, 2023 at 8:14:06 AM GMT-04:00
This particular code in docker-compose exits with code 1 because it is unable to find the wait-for-it.sh file in the container. I have checked the mounting path from the local machine; it is correct, and the path on the container for Marquez is also correct, i.e. /usr/src/app, but it is unable to mount wait-for-it.sh. Does anyone know why this is? This code exists in the OpenLineage repository as well: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/docker-compose.yml
# Marquez as an OpenLineage Client
-  api:
-    image: marquezproject/marquez
-    container_name: marquez-api
-    ports:
-      - "5000:5000"
-      - "5001:5001"
-    volumes:
-      - ./docker/wait-for-it.sh:/usr/src/app/wait-for-it.sh
-    links:
-      - "db:postgres"
-    depends_on:
-      - db
-    entrypoint: [ "./wait-for-it.sh", "db:5432", "--", "./entrypoint.sh" ]
Sarwat FatimaTuesday, September 12th, 2023 at 8:15:19 AM GMT-04:00
This is the error message:
Maciej ObuchowskiTuesday, September 12th, 2023 at 10:38:41 AM GMT-04:00
no permissions?
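If it is a permissions problem, a likely fix is making the script executable on the host before starting the stack (a sketch, assuming the repo layout from the compose file above):

chmod +x docker/wait-for-it.sh
docker-compose up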
Guntaka Jeevan PaulTuesday, September 12th, 2023 at 3:11:45 PM GMT-04:00
I am trying to run Google Cloud Composer where I have added the openlineage-airflow PyPI package as a dependency and have added the env OPENLINEAGE_EXTRACTORS to point to my custom extractor. I have added a folder named dependencies and inside that I have placed my extractor file, and the path given to OPENLINEAGE_EXTRACTORS is dependencies.<file_name>.<extractor_class_name>…still it fails with an exception saying No module named 'dependencies'. Can anyone kindly help me out in correcting my mistake?
Harel SheinTuesday, September 12th, 2023 at 5:15:36 PM GMT-04:00
Hey @Guntaka Jeevan Paul, can you share some details on which versions of airflow and openlineage you’re using?
Guntaka Jeevan PaulTuesday, September 12th, 2023 at 5:16:26 PM GMT-04:00
airflow ---> 2.5.3, openlinegae-airflow ---> 1.1.0
Guntaka Jeevan PaulTuesday, September 12th, 2023 at 5:45:08 PM GMT-04:00
import traceback
-import uuid
-from typing import List, Optional
-
-from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata
-from openlineage.airflow.utils import get_job_name
-
-
-class BigQueryInsertJobExtractor(BaseExtractor):
-    def __init__(self, operator):
-        super().__init__(operator)
-
-    @classmethod
-    def get_operator_classnames(cls) -> List[str]:
-        return ['BigQueryInsertJobOperator']
-
-    def extract(self) -> Optional[TaskMetadata]:
-        return None
-
-    def extract_on_complete(self, task_instance) -> Optional[TaskMetadata]:
-        self.log.debug(f"JEEVAN ---> extract_on_complete({task_instance})")
-        random_uuid = str(uuid.uuid4())
-        self.log.debug(f"JEEVAN ---> Randomly Generated UUID --> {random_uuid}")
-
-        self.operator.job_id = random_uuid
-
-        return TaskMetadata(
-            name=get_job_name(task=self.operator)
-        )
Guntaka Jeevan PaulTuesday, September 12th, 2023 at 5:45:24 PM GMT-04:00
this is the custom extractor code that im trying with
Harel SheinTuesday, September 12th, 2023 at 9:10:02 PM GMT-04:00
thanks @Guntaka Jeevan Paul, will try to take a deeper look tomorrow
Maciej ObuchowskiWednesday, September 13th, 2023 at 7:54:26 AM GMT-04:00
No module named 'dependencies'.
This sounds like general Python problem
Maciej ObuchowskiWednesday, September 13th, 2023 at 7:55:12 AM GMT-04:00
Maciej ObuchowskiWednesday, September 13th, 2023 at 7:56:28 AM GMT-04:00
basically, if you're able to import the file from your dag code, OL should be able too
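For reference, the layout under discussion in this thread would be roughly (file and class names from the thread itself):

dags/
  dependencies/
    __init__.py
    big_query_insert_job_extractor.py   # defines BigQueryInsertJobExtractor

OPENLINEAGE_EXTRACTORS=dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor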
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:01:12 AM GMT-04:00
The problem is that in GCS Composer there is a component called the Triggerer, which they say is used for deferrable operators…I have logged into that pod and I can see that the GCS bucket is not mounted on it, but I am unable to understand why the initialisation is happening inside the triggerer pod
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:01:32 AM GMT-04:00
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:01:47 AM GMT-04:00
> The Problem is in the GCS Composer there is a component called Triggerer, which they say is used for deferrable operators…i have logged into that pod and i could see that the GCS Bucket is not mounted on this, but i am unable to understand why is the initialisation happening inside the triggerer pod
OL integration is not running on triggerer, only on worker and scheduler pods
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:01:53 AM GMT-04:00
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:03:26 AM GMT-04:00
As you can see in this screenshot i am seeing the logs of the triggerer and it says clearly unable to import plugin openlineage
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:03:29 AM GMT-04:00
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:10:32 AM GMT-04:00
I see. There are a few possible things to do here - Composer could mount the user files, Airflow could not start plugins on the triggerer, or we could detect we're on the triggerer and not import anything there. However, does it impact OL or Airflow operation in any way other than this log?
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:12:06 AM GMT-04:00
Probably we'd have to do something if that really bothers you as there won't be further changes to Airflow 2.5
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:18:14 AM GMT-04:00
The problem is that it is actually not registering this custom extractor written by me, hence I am just receiving the DefaultExtractor output and my extractor code is not even getting triggered
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:22:49 AM GMT-04:00
any suggestions to try @Maciej Obuchowski
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:27:48 AM GMT-04:00
Could you share worker logs?
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:27:56 AM GMT-04:00
and check if module is importable from your dag code?
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:31:25 AM GMT-04:00
these are the worker pod logs…where there is no log of openlineageplugin
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:31:52 AM GMT-04:00
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:38:32 AM GMT-04:00
  {
-    "textPayload": "Traceback (most recent call last):  File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 427, in import_from_string    module = importlib.import_module(module_path)  File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module    return _bootstrap._gcd_import(name[level:], package, level)  File \"&lt;frozen importlib._bootstrap&gt;\", line 1014, in _gcd_import  File \"&lt;frozen importlib._bootstrap&gt;\", line 991, in _find_and_load  File \"&lt;frozen importlib._bootstrap&gt;\", line 961, in _find_and_load_unlocked  File \"&lt;frozen importlib._bootstrap&gt;\", line 219, in _call_with_frames_removed  File \"&lt;frozen importlib._bootstrap&gt;\", line 1014, in _gcd_import  File \"&lt;frozen importlib._bootstrap&gt;\", line 991, in _find_and_load  File \"&lt;frozen importlib._bootstrap&gt;\", line 961, in _find_and_load_unlocked  File \"&lt;frozen importlib._bootstrap&gt;\", line 219, in _call_with_frames_removed  File \"&lt;frozen importlib._bootstrap&gt;\", line 1014, in _gcd_import  File \"&lt;frozen importlib._bootstrap&gt;\", line 991, in _find_and_load  File \"&lt;frozen importlib._bootstrap&gt;\", line 973, in _find_and_load_unlockedModuleNotFoundError: No module named 'airflow.gcs'",
-    "insertId": "pt2eu6fl9z5vw",
-    "resource": {
-      "type": "cloud_composer_environment",
-      "labels": {
-        "environment_name": "openlineage",
-        "location": "us-west1",
-        "project_id": "acceldata-acm"
-      }
-    },
-    "timestamp": "2023-09-13T06:20:44.131577764Z",
-    "severity": "ERROR",
-    "labels": {
-      "worker_id": "airflow-worker-xttt8"
-    },
-    "logName": "projects/acceldata-acm/logs/airflow-worker",
-    "receiveTimestamp": "2023-09-13T06:20:48.847319607Z"
-  },

it doesn't find the module 'airflow.gcs' (No module named 'airflow.gcs') that is part of your extractor path airflow.gcs.dags.big_query_insert_job_extractor.BigQueryInsertJobExtractor
however, is it necessary? I generally see people using imports directly from the dags folder
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:44:11 AM GMT-04:00
this is one of the experiments I did, but then I reverted back to dependencies.big_query_insert_job_extractor.BigQueryInsertJobExtractor…where dependencies is a module I have created inside my dags folder
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:44:33 AM GMT-04:00
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:45:46 AM GMT-04:00
these are the logs of the triggerer pod specifically
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:46:31 AM GMT-04:00
yeah it would be expected to have this in triggerer where it's not mounted, but will it behave the same for worker where it's mounted?
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:47:09 AM GMT-04:00
maybe __init__.py is missing for the top-level dag path?
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:49:01 AM GMT-04:00
these are the logs of the worker pod at startup, where it does not complain of the plugin like in triggerer, but when tasks are run on this worker…somehow it is not picking up the extractor for the operator that i have written it for
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 8:49:54 AM GMT-04:00
https://openlineage.slack.com/archives/C01CK9T7HKR/p1694609229577469?thread_ts=1694545905.974339&cid=C01CK9T7HKR --> you mean to make the dags folder a module as well by adding the __init__.py?
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:55:24 AM GMT-04:00
yes, I would put whole custom code directly in dags folder, to make sure import paths are the problem
Maciej ObuchowskiWednesday, September 13th, 2023 at 8:55:48 AM GMT-04:00
and would be nice if you could set
AIRFLOW__LOGGING__LOGGING_LEVEL="DEBUG"
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 9:14:58 AM GMT-04:00
Starting the process, got command: triggerer
-Initializing airflow.cfg.
-airflow.cfg initialization is done.
-[2023-09-13T13:11:46.620+0000] {settings.py:267} DEBUG - Setting up DB connection pool (PID 8)
-[2023-09-13T13:11:46.622+0000] {settings.py:372} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=570, pid=8
-[2023-09-13T13:11:46.742+0000] {cli_action_loggers.py:39} DEBUG - Adding <function default_action_log at 0x7ff39ca1d3a0> to pre execution callback
-[2023-09-13T13:11:47.638+0000] {cli_action_loggers.py:65} DEBUG - Calling callbacks: [<function default_action_log at 0x7ff39ca1d3a0>]
-  ____________       _____________
- ____    |__( )_________  __/__  /________      __
-____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
-___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
- _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
-[2023-09-13T13:11:50.527+0000] {plugins_manager.py:300} DEBUG - Loading plugins
-[2023-09-13T13:11:50.580+0000] {plugins_manager.py:244} DEBUG - Loading plugins from directory: /home/airflow/gcs/plugins
-[2023-09-13T13:11:50.581+0000] {plugins_manager.py:224} DEBUG - Loading plugins from entrypoints
-[2023-09-13T13:11:50.587+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin OpenLineagePlugin
-[2023-09-13T13:11:50.740+0000] {utils.py:430} WARNING - No module named 'boto3'
-[2023-09-13T13:11:50.743+0000] {utils.py:430} WARNING - No module named 'botocore'
-[2023-09-13T13:11:50.833+0000] {utils.py:430} WARNING - No module named 'airflow.providers.sftp'
-[2023-09-13T13:11:51.144+0000] {utils.py:430} WARNING - No module named 'big_query_insert_job_extractor'
-[2023-09-13T13:11:51.145+0000] {plugins_manager.py:237} ERROR - Failed to import plugin OpenLineagePlugin
-Traceback (most recent call last):
-  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py", line 427, in import_from_string
-    module = importlib.import_module(module_path)
-  File "/opt/python3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
-    return _bootstrap._gcd_import(name[level:], package, level)
-  File "&lt;frozen importlib._bootstrap&gt;", line 1014, in _gcd_import
-  File "&lt;frozen importlib._bootstrap&gt;", line 991, in _find_and_load
-  File "&lt;frozen importlib._bootstrap&gt;", line 973, in _find_and_load_unlocked
-ModuleNotFoundError: No module named 'big_query_insert_job_extractor'
-
-The above exception was the direct cause of the following exception:
-
-Traceback (most recent call last):
-  File "/opt/python3.8/lib/python3.8/site-packages/airflow/plugins_manager.py", line 229, in load_entrypoint_plugins
-    plugin_class = entry_point.load()
-  File "/opt/python3.8/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/__init__.py", line 194, in load
-    module = import_module(match.group('module'))
-  File "/opt/python3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
-    return _bootstrap._gcd_import(name[level:], package, level)
-  File "&lt;frozen importlib._bootstrap&gt;", line 1014, in _gcd_import
-  File "&lt;frozen importlib._bootstrap&gt;", line 991, in _find_and_load
-  File "&lt;frozen importlib._bootstrap&gt;", line 975, in _find_and_load_unlocked
-  File "&lt;frozen importlib._bootstrap&gt;", line 671, in _load_unlocked
-  File "&lt;frozen importlib._bootstrap_external&gt;", line 843, in exec_module
-  File "&lt;frozen importlib._bootstrap&gt;", line 219, in _call_with_frames_removed
-  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/plugin.py", line 32, in &lt;module&gt;
-    from openlineage.airflow import listener
-  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 75, in &lt;module&gt;
-    extractor_manager = ExtractorManager()
-  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py", line 16, in __init__
-    self.task_to_extractor = Extractors()
-  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py", line 122, in __init__
-    extractor = import_from_string(extractor.strip())
-  File "/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py", line 431, in import_from_string
-    raise ImportError(f"Failed to import {path}") from e
-ImportError: Failed to import big_query_insert_job_extractor.BigQueryInsertJobExtractor
-[2023-09-13T13:11:51.235+0000] {plugins_manager.py:227} DEBUG - Importing entry_point plugin composer_menu_plugin
-[2023-09-13T13:11:51.719+0000] {plugins_manager.py:316} DEBUG - Loading 1 plugin(s) took 1.14 seconds
-[2023-09-13T13:11:51.733+0000] {triggerer_job.py:101} INFO - Starting the triggerer
-[2023-09-13T13:11:51.734+0000] {selector_events.py:59} DEBUG - Using selector: EpollSelector
-[2023-09-13T13:11:56.118+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:01.359+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:06.665+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:11.880+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:17.098+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:22.323+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:27.597+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:32.826+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:38.049+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:43.275+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:48.509+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:53.867+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:12:59.087+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:04.300+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:09.539+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:14.785+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:20.007+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:25.274+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:30.510+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:35.729+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:40.960+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:46.444+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:51.751+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:13:57.084+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:02.310+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:07.535+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:12.754+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:17.967+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:23.185+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:28.406+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:33.661+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:38.883+0000] {base_job.py:240} DEBUG - [heartbeat]
-[2023-09-13T13:14:44.247+0000] {base_job.py:240} DEBUG - [heartbeat]
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 9:15:10 AM GMT-04:00
still the same error in the triggerer pod
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 9:16:23 AM GMT-04:00
have changed the dags folder where i have added the init file as you suggested and then have updated the OPENLINEAGE_EXTRACTORS to big_query_insert_job_extractor.BigQueryInsertJobExtractor…still the same thing
Maciej ObuchowskiWednesday, September 13th, 2023 at 9:36:27 AM GMT-04:00
> still the same error in the triggerer pod
it won't change, we're not trying to fix the triggerer import but worker, and should look only at worker pod at this point
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 9:43:34 AM GMT-04:00
extractor for <class 'airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator'> is <class 'big_query_insert_job_extractor.BigQueryInsertJobExtractor'>
-
-Using extractor BigQueryInsertJobExtractor task_type=BigQueryInsertJobOperator airflow_dag_id=data_analytics_dag task_id=join_bq_datasets.bq_join_holidays_weather_data_2021 airflow_run_id=manual__2023-09-13T13:24:08.946947+00:00 
-
-fatal: not a git repository (or any parent up to mount point /home/airflow)
-Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-fatal: not a git repository (or any parent up to mount point /home/airflow)
-Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 9:44:44 AM GMT-04:00
able to see these logs in the worker pod…so what you said is right that it is able to get the extractor but i get the below error immediately where it says not a git repository
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 9:45:24 AM GMT-04:00
seems like we are almost there nearby…am i missing something obvious
Maciej ObuchowskiWednesday, September 13th, 2023 at 10:06:35 AM GMT-04:00
>
fatal: not a git repository (or any parent up to mount point /home/airflow)
-> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-> fatal: not a git repository (or any parent up to mount point /home/airflow)
-> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

hm, this could be the actual bug?
Jakub DardzińskiWednesday, September 13th, 2023 at 10:06:51 AM GMT-04:00
that’s casual log in composer
Jakub DardzińskiWednesday, September 13th, 2023 at 10:12:16 AM GMT-04:00
extractor for <class 'airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator'> is <class 'big_query_insert_job_extractor.BigQueryInsertJobExtractor'>

that’s actually class from your custom module, right?
Jakub DardzińskiWednesday, September 13th, 2023 at 10:14:03 AM GMT-04:00
I’ve done experiment, that’s how gcs looks like
Jakub DardzińskiWednesday, September 13th, 2023 at 10:14:09 AM GMT-04:00
and env vars
Jakub DardzińskiWednesday, September 13th, 2023 at 10:14:19 AM GMT-04:00
I have this extractor detected as expected
Jakub DardzińskiWednesday, September 13th, 2023 at 10:15:06 AM GMT-04:00
seen as <class 'dependencies.bq.BigQueryInsertJobExtractor'>
Jakub DardzińskiWednesday, September 13th, 2023 at 10:16:02 AM GMT-04:00
no __init__.py in base dags folder
Jakub DardzińskiWednesday, September 13th, 2023 at 10:17:02 AM GMT-04:00
I also checked that triggerer pod indeed has no gcsfuse set up, tbh no idea why, maybe some kind of optimization
the only effect is that when loading plugins in triggerer it throws some errors in logs, we don’t do anything at the moment there
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 10:19:26 AM GMT-04:00
okk…got it @Jakub Dardziński…so the __init__.py at the top level of dags is not required either, got it. Just one more doubt: there is a requirement where I want to change the operator's property in the extractor inside the extract function; will that be taken into account and the operator's execute be called with the property that I have populated in my extractor?
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 10:21:28 AM GMT-04:00
for example I want to add a custom job_id to the BigQueryInsertJobOperator, so whenever someone uses the BigQueryInsertJobOperator I want to intercept that and add this job_id property to the operator…will that work?
Jakub DardzińskiWednesday, September 13th, 2023 at 10:24:46 AM GMT-04:00
I’m not sure if using OL for such a thing is the best choice. Wouldn’t it be better to subclass the operator?
Jakub DardzińskiWednesday, September 13th, 2023 at 10:25:37 AM GMT-04:00
but the answer is: it depends on the Airflow version; in 2.3+ I’m pretty sure the changed property stays in the execute method
Guntaka Jeevan PaulWednesday, September 13th, 2023 at 10:27:49 AM GMT-04:00
yeah ideally that is how we should have done this, but the problem is our client has around 1000+ DAGs in different Google Cloud projects, owned by multiple teams…so they are not willing to change anything in their DAGs. Thankfully they are using Airflow 2.4.3
Maciej ObuchowskiWednesday, September 13th, 2023 at 10:31:15 AM GMT-04:00
Jakub DardzińskiWednesday, September 13th, 2023 at 10:35:30 AM GMT-04:00
btw I double-checked - execute method is in different process so this would not change task’s attribute there
Guntaka Jeevan PaulSaturday, September 16th, 2023 at 3:32:49 AM GMT-04:00
@Jakub Dardziński any idea how can we achieve this one. ---> https://openlineage.slack.com/archives/C01CK9T7HKR/p1694849427228709
Guntaka Jeevan PaulTuesday, September 12th, 2023 at 5:26:01 PM GMT-04:00
@here has anyone succeeded in getting a custom extractor to work in GCP Cloud Composer or AWS MWAA? seems like there is no way
Mars LanTuesday, September 12th, 2023 at 5:34:29 PM GMT-04:00
Suraj GuptaWednesday, September 13th, 2023 at 1:44:27 AM GMT-04:00
I am exploring Spark - OpenLineage integration (using the latest PySpark and OL versions). I tested a simple pipeline which:
• Reads JSON data into PySpark DataFrame
• Apply data transformations
• Write transformed data to MySQL database
Observed that we receive 4 events (2 START and 2 COMPLETE) for the same job name. The events are almost identical with a small diff in the facets. All the events share the same runId, and we don't get any parentRunId.
Team, can you please confirm if this behaviour is expected? Seems to be different from the Airflow integration where we relate jobs to Parent Jobs.
Damien HawesWednesday, September 13th, 2023 at 2:54:37 AM GMT-04:00
The Spark integration requires that two parameters are passed to it, namely:

spark.openlineage.parentJobName
-spark.openlineage.parentRunId

You can find the list of parameters here:

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/README.md
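A sketch of passing those two parameters as Spark conf (values are placeholders; the parameter names come from the README above):

spark.openlineage.parentJobName=my_airflow_dag.my_task
spark.openlineage.parentRunId=d46e465b-d358-4d32-83d4-df660ff614dd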
Suraj GuptaWednesday, September 13th, 2023 at 2:55:51 AM GMT-04:00
Thanks, will check this out
Damien HawesWednesday, September 13th, 2023 at 2:57:43 AM GMT-04:00
As for double accounting of events - that's a bit harder to diagnose.
Maciej ObuchowskiWednesday, September 13th, 2023 at 4:33:03 AM GMT-04:00
Can you share the the job and events?
Also @Paweł Leszczyński
Suraj GuptaWednesday, September 13th, 2023 at 6:03:49 AM GMT-04:00
Sure, sharing Job and events.
Suraj GuptaWednesday, September 13th, 2023 at 6:06:21 AM GMT-04:00
Paweł LeszczyńskiWednesday, September 13th, 2023 at 6:39:02 AM GMT-04:00
Hi @Suraj Gupta,

Thanks for providing such a detailed description of the problem.

It is not expected behaviour, it's an issue. The events correspond to the same logical plan, which for some reason leads to sending two OL events. Is it reproducible, aka does it occur each time? If yes, please feel free to raise an issue for it.

We have added several tests in recent months to verify the amount of OL events being generated, but we haven't tested it that way with JDBC. BTW, will the same happen if you write your data df_transformed to a file (like a parquet file)?
Suraj GuptaWednesday, September 13th, 2023 at 7:28:03 AM GMT-04:00
Thanks @Paweł Leszczyński, will confirm about writing to file and get back.
Suraj GuptaWednesday, September 13th, 2023 at 7:33:35 AM GMT-04:00
And yes, the issue is reproducible. Will raise an issue for this.
Paweł LeszczyńskiWednesday, September 13th, 2023 at 7:33:54 AM GMT-04:00
even if you write onto a file?
Suraj GuptaWednesday, September 13th, 2023 at 7:37:21 AM GMT-04:00
Yes, even when I write to a parquet file.
Paweł LeszczyńskiWednesday, September 13th, 2023 at 7:49:28 AM GMT-04:00
ok. i think i was able to reproduce it locally with https://github.com/OpenLineage/OpenLineage/pull/2103/files
Suraj GuptaWednesday, September 13th, 2023 at 7:56:11 AM GMT-04:00
Suraj GuptaMonday, September 25th, 2023 at 4:32:09 PM GMT-04:00
@Paweł Leszczyński I see that the PR is work in progress. Any rough estimate on when we can expect this fix to be released?
Paweł LeszczyńskiTuesday, September 26th, 2023 at 3:32:03 AM GMT-04:00
@Suraj Gupta I put a comment within your issue. It's a bug we need to solve, but I cannot give any estimates today.
Suraj GuptaTuesday, September 26th, 2023 at 4:33:03 AM GMT-04:00
Thanks for the update @Paweł Leszczyński. Also, please look into this comment. It might be related, and I'm not sure if it's expected behaviour.
Michael RobinsonWednesday, September 13th, 2023 at 2:20:32 PM GMT-04:00
@channel
This month’s TSC meeting, open to all, is tomorrow: https://openlineage.slack.com/archives/C01CK9T7HKR/p1694113940400549
Damien HawesThursday, September 14th, 2023 at 6:20:15 AM GMT-04:00
Context:

We use Spark with YARN, running on Hadoop 2.x (I can't remember the exact minor version) with Hive support.

Problem:

I've noticed that CreateDataSourceAsSelectCommand objects are always transformed to an OutputDataset with a namespace value set to file - which is curious, because the inputs always have a (correct) namespace of hdfs://<name-node> - is this a known issue? A flaw with Apache Spark? A bug in the resolution logic?

For reference:

public class CreateDataSourceTableCommandVisitor
    extends QueryPlanVisitor<CreateDataSourceTableCommand, OpenLineage.OutputDataset> {

  public CreateDataSourceTableCommandVisitor(OpenLineageContext context) {
    super(context);
  }

  @Override
  public List<OpenLineage.OutputDataset> apply(LogicalPlan x) {
    CreateDataSourceTableCommand command = (CreateDataSourceTableCommand) x;
    CatalogTable catalogTable = command.table();

    return Collections.singletonList(
        outputDataset()
            .getDataset(
                PathUtils.fromCatalogTable(catalogTable),
                catalogTable.schema(),
                OpenLineage.LifecycleStateChangeDatasetFacet.LifecycleStateChange.CREATE));
  }
}

Running this: cat events.log | jq '{eventTime: .eventTime, eventType: .eventType, runId: .run.runId, jobNamespace: .job.namespace, jobName: .job.name, outputs: .outputs[] | {namespace: .namespace, name: .name}, inputs: .inputs[] | {namespace: .namespace, name: .name}}'

This is an output:
{
  "eventTime": "2023-09-13T16:01:27.059Z",
  "eventType": "START",
  "runId": "bbbb5763-3615-46c0-95ca-1fc398c91d5d",
  "jobNamespace": "spark.cluster-1",
  "jobName": "ol_hadoop_test.execute_create_data_source_table_as_select_command.dhawes_db_ol_test_hadoop_tgt",
  "outputs": {
    "namespace": "file",
    "name": "/user/hive/warehouse/dhawes.db/ol_test_hadoop_tgt"
  },
  "inputs": {
    "namespace": "hdfs://nn1",
    "name": "/user/hive/warehouse/dhawes.db/ol_test_hadoop_src"
  }
}
👀1
Paweł LeszczyńskiThursday, September 14th, 2023 at 7:32:25 AM GMT-04:00
Seems like an issue on our side. Do you know how the source is read? What LogicalPlan leaf is used to read src? Would love to find out how this is done differently.
Damien HawesThursday, September 14th, 2023 at 9:16:58 AM GMT-04:00
Hmm, I'll have to do explain plan to see what exactly it is.

However my sample job uses spark.sql("SELECT * FROM dhawes.ol_test_hadoop_src")

which itself is created using

spark.sql("SELECT 1 AS id").write.format("orc").mode("overwrite").saveAsTable("dhawes.ol_test_hadoop_src")

Damien HawesThursday, September 14th, 2023 at 9:23:59 AM GMT-04:00
>>> spark.sql("SELECT * FROM dhawes.ol_test_hadoop_src").explain(True)
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `dhawes`.`ol_test_hadoop_src`

== Analyzed Logical Plan ==
id: int
Project [id#3]
+- SubqueryAlias `dhawes`.`ol_test_hadoop_src`
   +- Relation[id#3] orc

== Optimized Logical Plan ==
Relation[id#3] orc

== Physical Plan ==
*(1) FileScan orc dhawes.ol_test_hadoop_src[id#3] Batched: true, Format: ORC, Location: InMemoryFileIndex[hdfs://nn1/user/hive/warehouse/dhawes.db/ol_test_hadoop_src], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>

tatiThursday, September 14th, 2023 at 10:03:41 AM GMT-04:00
Hey everyone,
Any chance we could have an openlineage-integration-common 1.1.1 release with the following changes?
https://github.com/OpenLineage/OpenLineage/pull/2106
https://github.com/OpenLineage/OpenLineage/pull/2108
tatiThursday, September 14th, 2023 at 10:05:19 AM GMT-04:00
Especially the first PR, which is affecting users of the astronomer-cosmos library: https://github.com/astronomer/astronomer-cosmos/issues/533
Michael RobinsonThursday, September 14th, 2023 at 10:05:24 AM GMT-04:00
Thanks @tati for requesting your first OpenLineage release! Three +1s from committers will authorize
Michael RobinsonThursday, September 14th, 2023 at 11:59:55 AM GMT-04:00
The release is authorized and will be initiated within two business days.
🎉1
tatiFriday, September 15th, 2023 at 4:40:12 AM GMT-04:00
Thanks a lot, @Michael Robinson!
Julien Le DemThursday, September 14th, 2023 at 8:23:01 PM GMT-04:00
Per discussion in the OpenLineage sync today here is a very early strawman proposal for an OpenLineage registry that producers and consumers could be registered in.
Feedback or alternate proposals welcome
https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit
Once this is sufficiently fleshed out, I’ll create an actual proposal on github
👍1
Julien Le DemTuesday, October 3rd, 2023 at 8:33:35 PM GMT-04:00
I have cleaned up the registry proposal.
https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit
In particular:
• I clarified that option 2 is preferred at this point.
• I moved discussion notes to the bottom. they will go away at some point
• Once it is stable, I’ll create a proposal with the preferred option.
• we need a good proposal for the core facets prefix. My suggestion is to move core facets to core in the registry. The drawback is that the prefix would be inconsistent.
Julien Le DemThursday, October 5th, 2023 at 5:34:12 PM GMT-04:00
I have created a ticket to make this easier to find. Once I get more feedback I’ll turn it into a md file in the repo: https://docs.google.com/document/d/1zIxKST59q3I6ws896M4GkUn7IsueLw8ejct5E-TR0vY/edit#heading=h.enpbmvu7n8gu
https://github.com/OpenLineage/OpenLineage/issues/2161
Michael RobinsonFriday, September 15th, 2023 at 12:03:27 PM GMT-04:00
@channel
Friendly reminder: the next OpenLineage meetup, our first in Toronto, is happening this coming Monday at 5 PM ET https://openlineage.slack.com/archives/C01CK9T7HKR/p1694441261486759
👍1
Guntaka Jeevan PaulSaturday, September 16th, 2023 at 3:30:27 AM GMT-04:00
@here we have a dataproc operator getting called from a dag which submits a spark job. We wanted to maintain continuity of the parent job in the spark job, and according to the documentation we can achieve that by using a macro called lineage_run_id that requires task and task_instance as parameters. The problem we are facing is that our clients have 1000's of dags, so asking them to change this everywhere it is used is not feasible. So we thought of using the task_policy feature in airflow… but the problem is that task_policy gives you access only to the task/operator, and we don't have access to the task instance that is required as a parameter to the lineage_run_id function. Can anyone kindly help us on how we should go about this one?
t1 = DataProcPySparkOperator(
    task_id=job_name,
    # required pyspark configuration,
    job_name=job_name,
    dataproc_pyspark_properties={
        'spark.driver.extraJavaOptions':
            f"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}/jobs/{job_name}/runs/{{{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}}}?api_key={os.environ.get('OPENLINEAGE_API_KEY')}"
    },
    dag=dag)
Jakub DardzińskiSaturday, September 16th, 2023 at 4:22:47 AM GMT-04:00
you don't need the actual task instance to do that. you only need to set the additional argument as a jinja template, same as above
Jakub DardzińskiSaturday, September 16th, 2023 at 4:25:28 AM GMT-04:00
task_instance in this case is just part of a string which is evaluated when the jinja render happens
Guntaka Jeevan PaulSaturday, September 16th, 2023 at 4:27:10 AM GMT-04:00
ohh… then we could use the same example as above inside the task_policy to intercept the operator and add the openlineage-specific properties?
Jakub DardzińskiSaturday, September 16th, 2023 at 4:30:59 AM GMT-04:00
correct, just remember not to override all properties, just add ol specific
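A rough sketch of what such a task_policy could look like in airflow_local_settings.py (the jar path and operator attributes are assumptions mirroring the snippet above, not a tested implementation):

import os

jar = "/path/to/openlineage-spark.jar"  # hypothetical agent jar path

def task_policy(task):
    # Only touch the DataProc PySpark tasks; leave everything else alone.
    if task.__class__.__name__ != "DataProcPySparkOperator":
        return
    props = task.dataproc_pyspark_properties or {}
    agent = (
        f"-javaagent:{jar}={os.environ.get('OPENLINEAGE_URL')}"
        f"/api/v1/namespaces/{os.getenv('OPENLINEAGE_NAMESPACE', 'default')}"
        f"/jobs/{task.job_name}/runs/"
        "{{macros.OpenLineagePlugin.lineage_run_id(task, task_instance)}}"
        f"?api_key={os.environ.get('OPENLINEAGE_API_KEY')}"
    )
    # Append rather than override, so existing driver options survive.
    existing = props.get("spark.driver.extraJavaOptions", "")
    props["spark.driver.extraJavaOptions"] = f"{existing} {agent}".strip()
    task.dataproc_pyspark_properties = props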
Guntaka Jeevan PaulSaturday, September 16th, 2023 at 4:32:02 AM GMT-04:00
yeah sure…thank you so much @Jakub Dardziński, will try this out and keep you posted
👍1
Maciej ObuchowskiSaturday, September 16th, 2023 at 5:00:24 AM GMT-04:00
We want to automate setting those options at some point inside the operator itself
Guntaka Jeevan PaulSaturday, September 16th, 2023 at 7:40:27 PM GMT-04:00
@here is there a way we could add custom headers to the openlineage client in airflow? i see that provision is there for the spark integration via these properties: spark.openlineage.transport.headers.xyz --> abcdef
Jakub DardzińskiTuesday, September 19th, 2023 at 4:40:55 PM GMT-04:00
there’s no out-of-the-box possibility to do that yet, you’re very welcome to create an issue in GitHub and maybe contribute as well! 🙂
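For reference, the Spark-side header configuration mentioned in the question looks roughly like this in a cluster config (header name and values are placeholders):

spark.openlineage.transport.type http
spark.openlineage.transport.url http://localhost:5000
spark.openlineage.transport.headers.X-Custom-Header abcdef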
Mars LanSunday, September 17th, 2023 at 9:07:41 AM GMT-04:00
It doesn't seem like there's a way to override the OL endpoint from the default (/api/v1/lineage) in Airflow? I tried setting the OPENLINEAGE_ENDPOINT environment variable to no avail. Based on this statement, it seems that only OPENLINEAGE_URL is used to construct HttpConfig?
Jakub DardzińskiMonday, September 18th, 2023 at 4:25:11 PM GMT-04:00
That’s correct. For now there’s no way to configure the endpoint via env var. You can do that by using a config file
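For example, a minimal openlineage.yml sketch overriding the endpoint (the URL and endpoint values are placeholders, assuming the file-based config of the OpenLineage client):

transport:
  type: http
  url: http://backend:5000
  endpoint: api/custom/lineage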
Mars LanMonday, September 18th, 2023 at 4:30:39 PM GMT-04:00
How do you do that in Airflow? Any particular reason for excluding endpoint override via env var? Happy to create a PR to fix that.
Jakub DardzińskiMonday, September 18th, 2023 at 4:52:48 PM GMT-04:00
historical I guess? go for the PR, of course 🚀
Mars LanTuesday, October 3rd, 2023 at 8:52:16 AM GMT-04:00
Terese LarssonMonday, September 18th, 2023 at 8:22:34 AM GMT-04:00
Hi! I'm in need of help with wrapping my head around OpenLineage. My team has the goal of collecting metadata from the Airflow operators GreatExpectationsOperator, PythonOperator, MsSqlOperator and BashOperator (for dbt). Where can I see the source code for what is collected for each operator, and is there support for these in the new provider apache-airflow-providers-openlineage? I am super confused and feel lost in the docs. 🤯 We are using MSSQL/ODBC to connect to our db, and this data does not seem to appear as datasets in Marquez, do I need to configure this? If so, HOW and WHERE? 🥲

Happy for any help, big or small! 🙏
Jakub DardzińskiMonday, September 18th, 2023 at 4:26:07 PM GMT-04:00
there’s no single source of truth for what integrations are currently implemented in the openlineage Airflow provider. That’s something we should work on so it’s more visible
Jakub DardzińskiMonday, September 18th, 2023 at 4:26:46 PM GMT-04:00
answering this quickly - GE & MS SQL are not implemented yet in the provider
Jakub DardzińskiMonday, September 18th, 2023 at 4:26:58 PM GMT-04:00
but I also invite you to contribute if you’re interested! 🙂
sarathchTuesday, September 19th, 2023 at 2:47:47 AM GMT-04:00
Hi, I need help extracting OpenLineage for PostgresOperator in JSON format.
any suggestions or comments would be greatly appreciated
Maciej ObuchowskiTuesday, September 19th, 2023 at 4:40:06 PM GMT-04:00
❤️1
Maciej ObuchowskiTuesday, September 19th, 2023 at 4:40:54 PM GMT-04:00
If you use one of the lower versions, take a look here https://openlineage.io/docs/integrations/airflow/usage
sarathchWednesday, September 20th, 2023 at 6:26:56 AM GMT-04:00
Maciej,
Thanks for sharing the link https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html
this should address the issue
Juan Luis Cano RodríguezWednesday, September 20th, 2023 at 9:36:54 AM GMT-04:00
🎉11
👍1
❤️1
Michael RobinsonWednesday, September 20th, 2023 at 5:08:58 PM GMT-04:00
@channel
We released OpenLineage 1.2.2!
Added
• Spark: publish the ProcessingEngineRunFacet as part of the normal operation of the OpenLineageSparkEventListener #2089 @d-m-h
• Spark: capture and emit spark.databricks.clusterUsageTags.clusterAllTags variable from databricks environment #2099 @Anirudh181001
Fixed
• Common: support parsing dbt_project.yml without target-path #2106 @tatiana
• Proxy: fix Proxy chart #2091 @harels
• Python: fix serde filtering #2044 @xli-1026
• Python: use non-deprecated apiKey if loading it from env variables #2029 @mobuchowski
• Spark: Improve RDDs on S3 integration. #2039 @pawel-big-lebowski
• Flink: prevent sending running events after job completes #2075 @pawel-big-lebowski
• Spark & Flink: Unify dataset naming from URI objects #2083 @pawel-big-lebowski
• Spark: Databricks improvements #2076 @pawel-big-lebowski
Removed
• SQL: remove sqlparser dependency from iface-java and iface-py #2090 @JDarDagran
Thanks to all the contributors, including new contributors @tati, @xli-1026, and @d-m-h!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.2.2
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.1.0...1.2.2
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
🔥3
👍3
U05ST398BHTFriday, September 22nd, 2023 at 9:05:20 PM GMT-04:00
Hi @Michael Robinson Thank you! I love the job that you’ve done. If you have a few seconds, please hint at how I can push lineage gathered from Airflow and Spark jobs into DataHub for visualization? I didn’t find any solutions or official support at either OpenLineage or DataHub, but I still want to continue using OpenLineage
Michael RobinsonFriday, September 22nd, 2023 at 9:30:22 PM GMT-04:00
Hi Yevhenii, thank you for using OpenLineage. The DataHub integration is new to us, but perhaps the experts on Spark and Airflow know more. @Paweł Leszczyński @Maciej Obuchowski @Jakub Dardziński
Maciej ObuchowskiSaturday, September 23rd, 2023 at 8:11:17 AM GMT-04:00
@U05ST398BHT at Airflow Summit, Shirshanka Das from DataHub mentioned this as an upcoming feature.
👍1
🎯1
Suraj GuptaThursday, September 21st, 2023 at 2:11:10 AM GMT-04:00
Hi, we're using Custom Operators in Airflow (2.5) and are planning to expose lineage via default extractors: https://openlineage.io/docs/integrations/airflow/default-extractors/
Question: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?
Asking since OpenLineage has now moved inside Airflow, and I think there is no concept of extractors in the latest version.
Suraj GuptaThursday, September 21st, 2023 at 2:15:00 AM GMT-04:00
Also, do we have any docs on how OL works with the latest airflow version? Few questions:
• How is it replacing the concept of custom extractors and Manually Annotated Lineage in the latest version?
• Do we have any examples of setting up the integration to emit input/output datasets for non supported Operators like PythonOperator?
Jakub DardzińskiWednesday, September 27th, 2023 at 10:04:09 AM GMT-04:00
> Question: Now if we upgrade our Airflow version to 2.7 in the future, would our code be backward compatible?
It will be compatible, “default extractors” is generally the same concept as we’re using in the 2.7 integration.
One thing that might be good to update is import paths, from openlineage.airflow to airflow.providers.openlineage, but it should work both ways

> • Do we have any code samples/docs of setting up the integration to emit input/output datasets for non supported Operators like PythonOperator?
Our experience with that is currently lacking - this means it works like in bare Airflow: if you annotate your PythonOperator tasks with old Airflow lineage, like in this doc.

We want to make this experience better - by doing a few things
• instrumenting hooks, then collecting lineage from them
• integration with AIP-48 datasets
• allowing to emit lineage collected inside Airflow task by other means, by providing core Airflow API for that
All those things require changing core Airflow in a couple of ways:
• tracking which hooks were used during PythonOperator execution
• just being able to emit datasets (airflow inlets/outlets) from inside of a task - they are now a static thing, so if you try that it does not work
• providing better API for emitting that lineage, preferably based on OpenLineage itself rather than us having to convert that later.
As this requires core Airflow changes, it won’t be live until Airflow 2.8 at the earliest.

thanks to @Maciej Obuchowski for this response
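For reference, a small sketch of the "old Airflow lineage" annotation mentioned above, which the integration can pick up (table names are hypothetical):

from airflow.lineage.entities import Table
from airflow.operators.python import PythonOperator

def transform():
    ...  # task body that reads src_table and writes tgt_table

t = PythonOperator(
    task_id="transform",
    python_callable=transform,
    # Static inlets/outlets declared on the task via Airflow's lineage API.
    inlets=[Table(database="analytics", cluster="warehouse", name="src_table")],
    outlets=[Table(database="analytics", cluster="warehouse", name="tgt_table")],
)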
Jason YipThursday, September 21st, 2023 at 6:36:17 PM GMT-04:00
I am using this accelerator that leverages OpenLineage on Databricks to publish lineage info to Purview, but it's using a rather old version of OpenLineage, aka 0.18. Has anybody tried it on a newer version of OpenLineage? I am facing some issues where the inputs and outputs for the same object have different JSON
https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/
Jason YipThursday, September 21st, 2023 at 9:51:41 PM GMT-04:00
I installed 1.2.2 on Databricks, followed the below init script: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/databricks/open-lineage-init-script.sh

my cluster config looks like this:

spark.openlineage.version v1
spark.openlineage.namespace adb-5445974573286168.8#default
spark.openlineage.endpoint v1/lineage
spark.openlineage.url.param.code 8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==
spark.openlineage.url https://f77b-50-35-69-138.ngrok-free.app

But it is not calling the API; it works fine with the 0.18 version
Jason YipThursday, September 21st, 2023 at 11:16:10 PM GMT-04:00
I am attaching the log4j output; there is no OpenLineageContext
Jason YipThursday, September 21st, 2023 at 11:47:22 PM GMT-04:00
this issue is resolved, solution can be found here: https://openlineage.slack.com/archives/C01CK9T7HKR/p1691592987038929
Harel SheinMonday, September 25th, 2023 at 8:59:10 AM GMT-04:00
We were all out at Airflow Summit last week, so apologies for the delayed response. Glad you were able to resolve the issue!
Sangeeta MishraMonday, September 25th, 2023 at 5:11:50 AM GMT-04:00
@here I'm presently addressing a particular scenario that pertains to Openlineage authentication, specifically involving the use of an access key and secret.

I've implemented a custom token provider called AccessKeySecretKeyTokenProvider, which extends the TokenProvider class. This token provider communicates with another service, obtaining a token and an expiration time based on the provided access key, secret, and client ID.

My goal is to retain this token in a cache prior to its expiration, thereby eliminating the need for network calls to the third-party service. Is this possible without relying on an external caching system?
Harel SheinMonday, September 25th, 2023 at 8:56:53 AM GMT-04:00
Hey @Sangeeta Mishra, I’m not sure that I fully understand your question here. What do you mean by OpenLineage authentication?
What are you using to generate OL events? What’s your OL receiving backend?
Sangeeta MishraMonday, September 25th, 2023 at 9:04:33 AM GMT-04:00
Hey @Harel Shein,
I wanted to clarify the previous message. I apologize for any confusion. When I mentioned "OpenLineage authentication," I was actually referring to the authentication process for the OpenLineage backend, specifically using HTTP transport. This involves using my custom token provider, which utilizes access keys and secrets for authentication. The OL backend is an HTTP-based backend. I hope this clears things up!
Harel SheinMonday, September 25th, 2023 at 9:05:12 AM GMT-04:00
Are you using Marquez?
Sangeeta MishraMonday, September 25th, 2023 at 9:05:55 AM GMT-04:00
We are trying to leverage our own backend here.
Harel SheinMonday, September 25th, 2023 at 9:07:03 AM GMT-04:00
I see.. I’m not sure the OpenLineage community could help here. Which webserver framework are you using?
Sangeeta MishraMonday, September 25th, 2023 at 9:08:56 AM GMT-04:00
KTOR framework
Sangeeta MishraMonday, September 25th, 2023 at 9:15:33 AM GMT-04:00
Our backend authentication operates based on either a pair of keys or a single bearer token with a limited expiry time. Hence, I wanted to cache this information inside the token provider.
Harel SheinMonday, September 25th, 2023 at 9:26:57 AM GMT-04:00
I see, I would ask this question here https://ktor.io/support/
Sangeeta MishraMonday, September 25th, 2023 at 10:12:52 AM GMT-04:00
Thank you
Paweł LeszczyńskiTuesday, September 26th, 2023 at 4:13:20 AM GMT-04:00
@Sangeeta Mishra which openlineage client are you using: java or python?
Sangeeta MishraTuesday, September 26th, 2023 at 4:19:53 AM GMT-04:00
@Paweł Leszczyński I am using python client
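A hedged sketch of caching a short-lived token inside such a provider for the Python client (the class body and the fetch call are stand-ins for the user's own service integration):

import time

from openlineage.client.transport.http import TokenProvider

class AccessKeySecretKeyTokenProvider(TokenProvider):
    def __init__(self, config):
        super().__init__(config)
        self._token = None
        self._expires_at = 0.0

    def get_bearer(self):
        # Refresh only when the cached token is missing or close to expiry.
        if self._token is None or time.time() >= self._expires_at - 60:
            self._token, ttl = self._fetch_token()
            self._expires_at = time.time() + ttl
        return f"Bearer {self._token}"

    def _fetch_token(self):
        # Hypothetical call exchanging access key/secret for (token, ttl_seconds).
        raise NotImplementedError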
Suraj GuptaMonday, September 25th, 2023 at 1:36:25 PM GMT-04:00
I'm using the Spark OpenLineage integration. In the outputStatistics output dataset facet we receive rowCount and size.
The Job performs a SQL insert into a MySQL table and I'm receiving the size as 0.
{
  "outputStatistics":
  {
    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark",
    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet",
    "rowCount": 1,
    "size": 0
  }
}

I'm not sure what the size means here. Does it mean the number of bytes inserted/updated?
Also, do we have any documentation for Spark specific Job and Run facets?
Paweł LeszczyńskiWednesday, September 27th, 2023 at 9:56:00 AM GMT-04:00
I am not sure it's stated in the docs. Here's the list of Spark facet schemas: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/shared/facets/spark/v1
Guntaka Jeevan PaulTuesday, September 26th, 2023 at 12:51:30 AM GMT-04:00
@here In the Airflow integration we send a lineage event for DAG start and complete, but that is not the case with the Spark integration… we don't receive any event for application start and complete in Spark… is this expected behaviour or am I missing something?
Paweł LeszczyńskiWednesday, September 27th, 2023 at 9:47:39 AM GMT-04:00
For Spark we do send start and complete for each Spark action being run (a single operation that causes Spark processing to run). However, it is difficult for us to know if we're dealing with the last action within a Spark job or script.
Paweł LeszczyńskiWednesday, September 27th, 2023 at 9:49:35 AM GMT-04:00
I think we need to look deeper into that, as there is a recurring need to capture such information
Paweł LeszczyńskiWednesday, September 27th, 2023 at 9:49:57 AM GMT-04:00
and the Spark listener has methods like onApplicationStart and onApplicationEnd
Guntaka Jeevan PaulWednesday, September 27th, 2023 at 9:50:13 AM GMT-04:00
We are using the SparkListener, which has a function called onApplicationStart that gets called whenever a Spark application starts, so I was thinking why can't we send one at start and similarly at end as well
Paweł LeszczyńskiWednesday, September 27th, 2023 at 9:50:33 AM GMT-04:00
additionally, we would like to have a concept of a parent run for a spark job which aggregates all actions run within a single spark job context
Guntaka Jeevan PaulWednesday, September 27th, 2023 at 9:51:11 AM GMT-04:00
yeah exactly. the way that it works with airflow integration
Paweł LeszczyńskiWednesday, September 27th, 2023 at 9:51:26 AM GMT-04:00
Paweł LeszczyńskiWednesday, September 27th, 2023 at 9:52:08 AM GMT-04:00
what you can do is: come to our monthly OpenLineage open meetings and raise that issue and convince the community of its importance
Guntaka Jeevan PaulWednesday, September 27th, 2023 at 9:53:32 AM GMT-04:00
yeah sure, would love to do that… how can I join them? will that be posted here in this slack channel?
Michael RobinsonWednesday, September 27th, 2023 at 9:54:08 AM GMT-04:00
Hi, you can see the schedule and RSVP here: https://openlineage.io/community
🙌1
Michael RobinsonWednesday, September 27th, 2023 at 11:19:16 AM GMT-04:00
Meetup recap: Toronto Meetup @ Airflow Summit, September 18, 2023
It was great to see so many members of our community at this event! I counted 32 total attendees, with all but a handful being first-timers.
Topics included:
• Presentation on the history, architecture and roadmap of the project by @Julien Le Dem and @Harel Shein
• Discussion of OpenLineage support in Marquez by @Willy Lulciuc
• Presentation by Ye Liu and Ivan Perepelitca from Metaphor, the social platform for data, about their integration
• Presentation by @Paweł Leszczyński about the Spark integration
• Presentation by @Maciej Obuchowski about the Apache Airflow Provider
Thanks to all the presenters and attendees with a shout out to @Harel Shein for the help with organizing and day-of logistics, @Jakub Dardziński for the help with set up/clean up, and @Sheeri Cabral (Collibra) for the crucial assist with the signup sheet.
This was our first meetup in Toronto, and we learned some valuable lessons about planning events in new cities — the first and foremost being to ask for a pic of the building! 🙂 But it seemed like folks were undeterred, and the space itself lived up to expectations.
For a recording and clips from the meetup, head over to our YouTube channel.
Upcoming events:
• October 5th in San Francisco: Marquez Meetup @ Astronomer (sign up here)
• November: Warsaw meetup (details, date TBA)
• January: London meetup (details, date TBA)
Are you interested in hosting or co-hosting an OpenLineage or Marquez meetup? DM me!
🙌3
❤️6
🚀2
😅1
Michael RobinsonWednesday, September 27th, 2023 at 11:55:47 AM GMT-04:00
A few more pics:
Damien HawesWednesday, September 27th, 2023 at 12:23:05 PM GMT-04:00
Hi folks, am I correct in my observations that the Spark integration does not generate inputs and outputs for Kafka-to-Kafka pipelines?

EDIT: Removed the crazy wall of text. Relevant GitHub issue is here.
👀1
Paweł LeszczyńskiThursday, September 28th, 2023 at 2:42:18 AM GMT-04:00
responded within the issue
Erik AlfthanThursday, September 28th, 2023 at 2:40:40 AM GMT-04:00
Hello community
First time poster - bear with me :)

I am looking to make a minor PR on the airflow integration (fixing github #2130), and the code change is easy enough, but I fail to install the python environment. I have tried the simple ones
OpenLineage/integration/airflow > pip install -e .
or
OpenLineage/integration/airflow > pip install -r dev-requirements.txt
but they both fail on
ERROR: No matching distribution found for openlineage-sql==1.3.0

(which I think is an unreleased version in the git project)

How would I go about to install the requirements?

//Erik

PS. Sorry for posting this in general if there is a specific integration or contribution channel - I didn't find a better channel
Paweł LeszczyńskiThursday, September 28th, 2023 at 3:04:48 AM GMT-04:00
Hi @Erik Alfthan, the channel is totally OK. I am not an airflow integration expert, but it looks to me like you're missing the openlineage-sql library, which is a rust library used to extract lineage from sql queries. This is how we do that in circle ci:
https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8080/workflows/aba53369-836c-48f5-a2dd-51bc0740a31c/jobs/140113

and subproject page with build instructions: https://github.com/OpenLineage/OpenLineage/tree/main/integration/sql
Erik AlfthanThursday, September 28th, 2023 at 3:07:23 AM GMT-04:00
Ok, so I go and "manually" build the internal dependency so that it becomes available in the pip cache?

I was hoping for something more automagical, but that should work
Paweł LeszczyńskiThursday, September 28th, 2023 at 3:08:06 AM GMT-04:00
I think so. @Jakub Dardziński am I right?
Jakub DardzińskiThursday, September 28th, 2023 at 3:18:27 AM GMT-04:00
https://openlineage.io/docs/development/developing/python/setup
there’s a guide how to setup the dev environment

> Typically, you first need to build openlineage-sql locally (see README). After each release you have to repeat this step in order to bump local version of the package.
This might be exposed more in the GitHub repository README as well
Erik AlfthanThursday, September 28th, 2023 at 3:27:20 AM GMT-04:00
It didn't find the wheel in the cache, but if I used the line in the sql/README.md
pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall
It is installed and thus skipped/passed when pip later checks if it needs to be installed.

Now I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary
Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127
and
Command 'pkg-config --exists mariadb' returned non-zero exit status 127
I am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?
Jakub DardzińskiThursday, September 28th, 2023 at 3:31:52 AM GMT-04:00
> It didn't find the wheel in the cache, but if I used the line in the sql/README.md
> pip install openlineage-sql --no-index --find-links ../target/wheels --force-reinstall
> It is installed and thus skipped/passed when pip later checks if it needs to be installed.
That’s actually expected. You should build a new wheel locally and then install it.

> Now I have a second issue because it is expecting me to have mysqlclient-2.2.0 which seems to need a binary
> Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127
> and
> Command 'pkg-config --exists mariadb' returned non-zero exit status 127
> I am on Ubuntu 22.04 in WSL2. Should I go to apt and grab me a mysql client?
We’ve left some system-specific configuration, e.g. mysqlclient, to users, as it’s a bit aside from OpenLineage and more of a general development task.

probably
sudo apt-get install python3-dev default-libmysqlclient-dev build-essential 

should work
Erik AlfthanThursday, September 28th, 2023 at 3:32:04 AM GMT-04:00
I just realized that I should probably skip setting up my wsl and just run the tests in the docker setup you prepared
Jakub DardzińskiThursday, September 28th, 2023 at 3:35:46 AM GMT-04:00
You could do that as well but if you want to test your changes vs many Airflow versions that wouldn’t be possible I think (run them with tox btw)
Erik AlfthanThursday, September 28th, 2023 at 4:54:39 AM GMT-04:00
This is starting to feel like a rabbit hole 😞

When I run tox, I get a lot of build errors
• client needs to be built
• sql needs to be built to a different target than its readme says
• a lot of builds fail on cython_sources
Jakub DardzińskiThursday, September 28th, 2023 at 5:19:34 AM GMT-04:00
would you like to share some exact log lines? I’ve never seen such errors, they probably are system specific
Erik AlfthanThursday, September 28th, 2023 at 6:45:48 AM GMT-04:00
Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [62 lines of output]
/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
!!

        ********************************************************************************
        The license_file parameter is deprecated, use license_files instead.

        By 2023-Oct-30, you need to update your project and remove deprecated calls
        or your builds will no longer be supported.

        See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
        ********************************************************************************

!!
  parsed = self.parsers.get(option_name, lambda x: x)(value)
running egg_info
writing lib3/PyYAML.egg-info/PKG-INFO
writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt
writing top-level names to lib3/PyYAML.egg-info/top_level.txt
Traceback (most recent call last):
  File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/home/obr_erikal/projects/OpenLineage/integration/airflow/.tox/py3-airflow-2.1.4/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 355, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in _get_build_requires
    self.run_setup()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in run_setup
    exec(code, locals())
  File "<string>", line 271, in <module>
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 318, in run
    self.find_sources()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 326, in find_sources
    mm.run()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 548, in run
    self.add_defaults()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 586, in add_defaults
    sdist.add_defaults(self)
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/command/sdist.py", line 113, in add_defaults
    super().add_defaults()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 251, in add_defaults
    self._add_defaults_ext()
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
    self.filelist.extend(build_ext.get_source_files())
  File "<string>", line 201, in get_source_files
  File "/tmp/pip-build-env-q1pay0xo/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 107, in __getattr__
    raise AttributeError(attr)
AttributeError: cython_sources
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
py3-airflow-2.1.4: exit 1 (7.85 seconds) /home/obr_erikal/projects/OpenLineage/integration/airflow> python -m pip install --find-links target/wheels/ --find-links ../sql/iface-py/target/wheels --use-deprecated=legacy-resolver --constraint=https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.8.txt apache-airflow==2.1.4 'mypy>=0.9.6' pytest pytest-mock -r dev-requirements.txt pid=368621
py3-airflow-2.1.4: FAIL ✖ in 7.92 seconds
Erik AlfthanThursday, September 28th, 2023 at 6:53:54 AM GMT-04:00
Then, for the actual error in my PR: Evidently you are not using isort, so what linter/fixer should I use for imports?
Jakub DardzińskiThursday, September 28th, 2023 at 6:58:15 AM GMT-04:00
for the error - I think there’s a mistake in the docs. Could you please run maturin build --out target/wheels as a temp solution?
👀1
Jakub DardzińskiThursday, September 28th, 2023 at 6:58:57 AM GMT-04:00
we’re using ruff; tox runs it as one of the commands
Erik AlfthanThursday, September 28th, 2023 at 7:00:37 AM GMT-04:00
Not in the airflow folder?
OpenLineage/integration/airflow$ maturin build --out target/wheels
💥 maturin failed
Caused by: pyproject.toml at /home/obr_erikal/projects/OpenLineage/integration/airflow/pyproject.toml is invalid
Caused by: TOML parse error at line 1, column 1
  |
1 | [tool.ruff]
  | ^
missing field `build-system`
Jakub DardzińskiThursday, September 28th, 2023 at 7:02:32 AM GMT-04:00
I meant change here https://github.com/OpenLineage/OpenLineage/blob/main/integration/sql/README.md

so
cd iface-py
python -m pip install maturin
maturin build --out ../target/wheels

becomes
cd iface-py
python -m pip install maturin
maturin build --out target/wheels

tox runs
install_command = python -m pip install {opts} --find-links target/wheels/ \
    --find-links ../sql/iface-py/target/wheels

but it should be
install_command = python -m pip install {opts} --find-links target/wheels/ \
    --find-links ../sql/target/wheels

actually, and I’m posting a PR to fix that
Erik AlfthanThursday, September 28th, 2023 at 7:05:12 AM GMT-04:00
yes, that part I actually worked out myself, but I fail to understand the cause of the cython_sources error. I have python3-dev installed on WSL Ubuntu with python version 3.10.12 in a virtualenv. Anything in that that could cause issues?
Jakub DardzińskiThursday, September 28th, 2023 at 7:12:20 AM GMT-04:00
looks like it has something to do with the latest release of Cython?
pip install "Cython<3" maybe solves the issue?
Erik AlfthanThursday, September 28th, 2023 at 7:15:06 AM GMT-04:00
I didn't have any cython before the install. Also no change. Could it be some update to setuptools itself? seems like the deprecation notice and the error are coming from inside setuptools
Erik AlfthanThursday, September 28th, 2023 at 7:16:59 AM GMT-04:00
(I.e. I tried the pip install "Cython<3" command without any change in the output)
Erik AlfthanThursday, September 28th, 2023 at 7:20:30 AM GMT-04:00
Applying ruff lint on the converter.py file fixed the issue on the PR, though, so unless you have any feedback on the change itself, I will set it up on my own computer later instead (right now doing changes on behalf of a client on the client's computer)

If the issue persists on my own computer, I'll dig a bit further
Jakub DardzińskiThursday, September 28th, 2023 at 7:21:03 AM GMT-04:00
It’s a bit hard for me to find the root cause as I cannot reproduce this locally and CI works fine as well
Erik AlfthanThursday, September 28th, 2023 at 7:22:41 AM GMT-04:00
Yeah, I am thinking that if I run into the same problem "at home", I might find it worthwhile to understand the issue. Right now, the client only wants the fix.
👍1
Erik AlfthanThursday, September 28th, 2023 at 7:25:10 AM GMT-04:00
Is there an official release cycle?

or more specifically, given that the PRs are approved, how soon can they reach openlineage-dbt and apache-airflow-providers-openlineage?
Jakub DardzińskiThursday, September 28th, 2023 at 7:28:58 AM GMT-04:00
we need to differentiate some things:
1. OpenLineage repository:
a. dbt integration - this is the only place where it is maintained
b. Airflow integration - here we only keep backwards compatibility, but generally speaking, starting from Airflow 2.7+ we would like to do all the work in the Airflow repo as the OL Airflow provider
2. Airflow repository - there’s only the Airflow OpenLineage provider, compatible with (and working best on) Airflow 2.7+

we have control over releases (obviously) in the OL repo - it’s a monthly cycle, so that should happen beginning next week. There’s also a possibility to ask for an ad-hoc release in the #general slack channel, and with approvals of committers the new version is also released


For Airflow providers - the cycle is monthly as well
Jakub DardzińskiThursday, September 28th, 2023 at 7:31:30 AM GMT-04:00
it’s a bit complex for this split but needed temporarily
Erik AlfthanThursday, September 28th, 2023 at 7:31:47 AM GMT-04:00
oh, I did the fix in the wrong place! The client is on airflow 2.7 and is using the provider. Is it syncing?
Jakub DardzińskiThursday, September 28th, 2023 at 7:32:28 AM GMT-04:00
it’s not, two separate places and we haven’t even added the whole thing with converting old lineage objects to OL specific

editing, that’s not true
Jakub DardzińskiThursday, September 28th, 2023 at 7:34:40 AM GMT-04:00
Jakub DardzińskiThursday, September 28th, 2023 at 7:35:17 AM GMT-04:00
sorry I did not mention this earlier. we definitely need to add some guidance on how to proceed with contributions to OL and the Airflow OL provider
Erik AlfthanThursday, September 28th, 2023 at 7:36:10 AM GMT-04:00
anyway, the dbt fix is the blocking issue, so if that part comes next week, there is no real urgency in getting the columns. It is a nice-to-have for the parquet files we ingest.
Jakub DardzińskiThursday, September 28th, 2023 at 7:37:12 AM GMT-04:00
may I ask if you use some custom operator / python operator there?
Erik AlfthanThursday, September 28th, 2023 at 7:37:33 AM GMT-04:00
yeah, taskflow with inlets/outlets
Erik AlfthanThursday, September 28th, 2023 at 7:38:38 AM GMT-04:00
so we extract from sources and use pyarrow to create parquet files in storage that an mssql-server can use as external tables
Jakub DardzińskiThursday, September 28th, 2023 at 7:39:54 AM GMT-04:00
awesome 👍
we have plans to integrate more with Python operator as well but not earlier than in Airflow 2.8
Erik AlfthanThursday, September 28th, 2023 at 7:43:41 AM GMT-04:00
I guess writing a generic extractor for the python operator is quite hard, but if you could support some inlet/outlet type for tabular file formats / their python libraries like pyarrow or maybe even pandas, and document it, I think a lot of people would understand how to use them
Michael RobinsonThursday, September 28th, 2023 at 4:16:24 PM GMT-04:00
Are you located in the Brussels area or within commutable distance? Interested in attending a meetup between October 16-20? If so, please DM @Sheeri Cabral (Collibra) or myself. TIA
❤️1
Michael RobinsonMonday, October 2nd, 2023 at 11:58:32 AM GMT-04:00
@channel
Hello all, I’d like to open a vote to release OpenLineage 1.3.0, including:
• support for Spark 3.5 in the Spark integration
• scheme preservation bug fix in the Spark integration
• find-links path in tox bug fix in the Airflow integration
• more graceful logging when no OL provider is installed in the Airflow integration
• columns as schema facet for airflow.lineage.Table addition
• SQLSERVER to supported dbt profile types addition
Three +1s from committers will authorize. Thanks in advance.
🙌3
👍2
Michael RobinsonMonday, October 2nd, 2023 at 5:00:08 PM GMT-04:00
Thanks all. The release is authorized and will be initiated within 2 business days.
Jason YipMonday, October 2nd, 2023 at 5:11:46 PM GMT-04:00
looking forward to that. I am seeing inconsistent results in Databricks for Spark 3.4+ - sometimes there are no inputs/outputs. hope that is fixed?
Harel SheinTuesday, October 3rd, 2023 at 9:59:24 AM GMT-04:00
@Jason Yip if it isn’t fixed for you, would love it if you could open up an issue that will allow us to reproduce and fix
👍1
Jason YipTuesday, October 3rd, 2023 at 8:23:40 PM GMT-04:00
@Harel Shein the issue still exists -> Spark 3.4 and above, including 3.5, saveAsTable and create table won't have inputs and outputs in Databricks
Jason YipTuesday, October 3rd, 2023 at 8:30:15 PM GMT-04:00
Jason YipTuesday, October 3rd, 2023 at 8:30:21 PM GMT-04:00
and of course this issue still exists
Harel SheinTuesday, October 3rd, 2023 at 9:45:09 PM GMT-04:00
thanks for posting, we’ll continue looking into this.. if you find any clues that might help, please let us know.
Jason YipTuesday, October 3rd, 2023 at 9:46:27 PM GMT-04:00
are there any instructions on how to hook up a debugger to OL?
Harel SheinWednesday, October 4th, 2023 at 9:04:16 AM GMT-04:00
@Paweł Leszczyński has been working on adding a debug facet, but more suggestions are more than welcome!
Harel SheinWednesday, October 4th, 2023 at 9:05:58 AM GMT-04:00
👀1
👍1
Jason YipThursday, October 5th, 2023 at 3:20:11 AM GMT-04:00
@Paweł Leszczyński do you have a build for the PR? Appreciated!
Harel SheinThursday, October 5th, 2023 at 3:05:08 PM GMT-04:00
we’ll ask for a release once it’s reviewed and merged
Michael RobinsonMonday, October 2nd, 2023 at 12:28:28 PM GMT-04:00
@channel
The September issue of OpenLineage News is here! This issue covers the big news about OpenLineage coming out of Airflow Summit, progress on the Airflow Provider, highlights from our meetup in Toronto, and much more.
To get the newsletter directly in your inbox each month, sign up here.
🦆2
🔥3
Damien HawesTuesday, October 3rd, 2023 at 3:44:36 AM GMT-04:00
Hi folks - I'm wondering if it's just me, but does io.openlineage:openlineage-sql-java:1.2.2 ship with the arm64.dylib binary? When I try to run code that uses the Java package on an Apple M1, the binary isn't found. The workaround is to check out 1.2.2 and then build and publish it locally.
Paweł LeszczyńskiTuesday, October 3rd, 2023 at 9:01:38 AM GMT-04:00
Not sure if I follow your question. Whenever OL is released, there is a script - new-version.sh - https://github.com/OpenLineage/OpenLineage/blob/main/new-version.sh - being run that modifies the codebase.

So, if you pull the code, it contains an OL version that has not been released yet, and in the case of dependencies, one needs to build them on their own.

For example, here https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#preparation the Preparation section describes how to build openlineage-java and openlineage-sql in order to build openlineage-spark.
Damien HawesWednesday, October 4th, 2023 at 5:27:26 AM GMT-04:00
Hmm. Let's elaborate my use case a bit.

We run Apache Hive on-premise. Hive provides query execution hooks for pre-query, post-query, and I think failed query.

Anyway, as part of the hook, you're given the query string.

So I, naturally, tried to pass the query string into OpenLineageSql.parse(Collections.singletonList(hookContext.getQueryPlan().getQueryStr()), "hive") in order to test this out.

I was using openlineage-sql-java:1.2.2 at that time, and no matter what query string I gave it, nothing was returned.

I then stepped through the code and noticed that it was looking for the arm64 lib, and I noticed that that package (downloaded from maven central) lacked that particular native binary.
Damien HawesWednesday, October 4th, 2023 at 5:27:36 AM GMT-04:00
I hope that helps.
👍1
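For what it's worth, a rough Python-side equivalent of that parse call via the openlineage-sql bindings (a sketch; treat the module name and return shape as assumptions):

from openlineage_sql import parse

# Parse a statement and inspect which tables are read and written.
sql_meta = parse(["INSERT INTO tgt SELECT * FROM src"], dialect="hive")
print(sql_meta.in_tables)   # tables read by the statement
print(sql_meta.out_tables)  # tables written by the statement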
Paweł LeszczyńskiWednesday, October 4th, 2023 at 9:03:02 AM GMT-04:00
I get it now. In Circle CI we do have 3 build steps:
            - build-integration-sql-x86
            - build-integration-sql-arm
            - build-integration-sql-macos

but no Mac M1. I think at that time Circle CI did not have a proper resource class in the free plan. Additionally, @Maciej Obuchowski would prefer to migrate this to github actions, as he claims this can be achieved there in a cleaner way (https://github.com/OpenLineage/OpenLineage/issues/1624).

Feel free to create an issue for this. Others would be able to upvote it in case they have similar experience.
Maciej ObuchowskiMonday, October 23rd, 2023 at 11:56:12 AM GMT-04:00
It doesn't have the free resource class still 😞
We're blocked on that unfortunately. Another solution would be to migrate to GH actions, where most of our solution could be replaced by something like this: https://github.com/PyO3/maturin-action
Michael RobinsonTuesday, October 3rd, 2023 at 10:56:03 AM GMT-04:00
@channel
We released OpenLineage 1.3.1!
Added:
• Airflow: add some basic stats to the Airflow integration #1845 @harels
• Airflow: add columns as schema facet for airflow.lineage.Table (if defined) #2138 @erikalfthan
• DBT: add SQLSERVER to supported dbt profile types #2136 @erikalfthan
• Spark: support for latest 3.5 #2118 @pawel-big-lebowski
Fixed:
• Airflow: fix find-links path in tox #2139 @JDarDagran
• Airflow: add more graceful logging when no OpenLineage provider installed #2141 @JDarDagran
• Spark: fix bug in PathUtils’ prepareDatasetIdentifierFromDefaultTablePath (CatalogTable) to correctly preserve scheme from CatalogTable’s location #2142 @d-m-h
Thanks to all the contributors, including new contributor @Erik Alfthan!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.3.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.2.2...1.3.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍4
🎉1
Mars LanWednesday, October 4th, 2023 at 7:42:59 AM GMT-04:00
Any chance we can do a 1.3.2 soonish to include https://github.com/OpenLineage/OpenLineage/pull/2151 instead of waiting for the next monthly release?
Matthew ParasTuesday, October 3rd, 2023 at 12:34:57 PM GMT-04:00
Hey everyone - does anyone have a good mechanism for alerting on issues with OpenLineage? For example, maybe alerting when an event times out - perhaps to Prometheus or some other kind of generic endpoint? Not sure of the best approach here (or whether the META-INF extension mechanism would be able to achieve it)
Paweł LeszczyńskiWednesday, October 4th, 2023 at 3:01:02 AM GMT-04:00
That's a great use case for OpenLineage. Unfortunately, we don't have any doc or recommendation on that.

I would try using the FluentD proxy we have (https://github.com/OpenLineage/OpenLineage/tree/main/proxy/fluentd) to copy the event stream (alerting is just one of the use cases for lineage events) and write a fluentd plugin to send it asynchronously on to an alerting service like PagerDuty.

It looks cool to me but I never had enough time to test this approach.
👍1
Michael RobinsonThursday, October 5th, 2023 at 2:44:14 PM GMT-04:00
@channel
This month’s TSC meeting is next Thursday the 12th at 10am PT. On the tentative agenda:
• announcements
• recent releases
• Airflow Summit recap
• tutorial: migrating to the Airflow Provider
• discussion topic: observability for OpenLineage/Marquez
• open discussion
• more (TBA)
More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.
👀3
Julien Le DemThursday, October 5th, 2023 at 8:40:40 PM GMT-04:00
🎉2
Mars LanFriday, October 6th, 2023 at 7:19:01 AM GMT-04:00
@Michael Robinson can we cut a new release to include this change?
https://github.com/OpenLineage/OpenLineage/pull/2151
Michael RobinsonFriday, October 6th, 2023 at 7:16:02 PM GMT-04:00
Thanks for requesting a release, @Mars Lan. It has been approved and will be initiated within 2 business days of next Monday.
🙏1
Guntaka Jeevan PaulSunday, October 8th, 2023 at 11:59:36 PM GMT-04:00
@here I am trying out the openlineage integration of spark on databricks. There is no event getting emitted from OpenLineage, and I see logs saying OpenLineage Event Skipped. I am attaching the notebook that I am trying to run and the cluster logs. Can someone kindly help me with this?
Jason YipMonday, October 9th, 2023 at 12:02:10 AM GMT-04:00
from my experience, it will only work on Spark 3.3.x or below, aka Runtime 12.2 or below. On anything above, the events will show up once in a blue moon
Guntaka Jeevan PaulMonday, October 9th, 2023 at 12:04:38 AM GMT-04:00
ohh, thanks for the information @Jason Yip. I am trying it out with Databricks version 13.3 and Spark 3.4.1; will try using a lower version as you suggested. Is there any issue tracking this bug, @Jason Yip?
Jason YipMonday, October 9th, 2023 at 12:06:06 AM GMT-04:00
Guntaka Jeevan PaulMonday, October 9th, 2023 at 12:11:54 AM GMT-04:00
tried with databricks 12.2 --> spark 3.3.2, still the same behaviour no event getting emitted
Jason YipMonday, October 9th, 2023 at 12:12:35 AM GMT-04:00
you can do 11.3, its the most stable one I know
Guntaka Jeevan PaulMonday, October 9th, 2023 at 12:12:46 AM GMT-04:00
sure, let me try that out
Guntaka Jeevan PaulMonday, October 9th, 2023 at 12:31:51 AM GMT-04:00
still the same problem…the jar that i am using is the latest openlineage-spark-1.3.1.jar, do you think that can be the problem
Guntaka Jeevan PaulMonday, October 9th, 2023 at 12:43:59 AM GMT-04:00
tried with openlineage-spark-1.2.2.jar, still the same issue, seems like they are skipping some events
Jason YipMonday, October 9th, 2023 at 1:47:20 AM GMT-04:00
Probably not all events will be captured, I have only tested create tables and jobs
Paweł LeszczyńskiMonday, October 9th, 2023 at 4:31:12 AM GMT-04:00
Hi @Guntaka Jeevan Paul, how did you configure openlineage and what is your job doing?

We do have a bunch of integration tests on Databricks platform available here and they're passing on databricks runtime 13.0.x-scala2.12.

Could you also try running the same code as our test does (this one)? If you run it and see OL events, this will assure us your config is OK and we can continue further debugging.

Looking at your spark script: could you save your dataset and see if you still don't see any events?
Guntaka Jeevan PaulMonday, October 9th, 2023 at 5:06:41 AM GMT-04:00
babynames = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/FileStore/babynames.csv")
babynames.createOrReplaceTempView("babynames_table")
years = spark.sql("select distinct(Year) from babynames_table").rdd.map(lambda row: row[0]).collect()
years.sort()
dbutils.widgets.dropdown("year", "2014", [str(x) for x in years])
display(babynames.filter(babynames.Year == dbutils.widgets.get("year")))
Guntaka Jeevan PaulMonday, October 9th, 2023 at 5:08:09 AM GMT-04:00
this is the script that I am running, @Paweł Leszczyński… kindly let me know if I'm making any mistake. I have added the init script at the cluster level, and from the logs I could see that openlineage is configured, as I see a log statement
Paweł LeszczyńskiMonday, October 9th, 2023 at 5:10:30 AM GMT-04:00
there's nothing wrong in that script. It's just that we decided to limit the amount of OL events for jobs that don't write their data anywhere and just do a collect operation
Paweł LeszczyńskiMonday, October 9th, 2023 at 5:11:02 AM GMT-04:00
this is also a potential reason why can't you see any events
Guntaka Jeevan PaulMonday, October 9th, 2023 at 5:14:33 AM GMT-04:00
ohh… ok, will try out the test script that you have mentioned above. Kindly correct me if my understanding is correct: so if there are a few transformations and finally a write somewhere, that is where the OL events are expected to be emitted?
Paweł LeszczyńskiMonday, October 9th, 2023 at 5:16:54 AM GMT-04:00
yes. the main purpose of lineage is to track dependencies between the datasets, when a job reads from dataset A and writes to dataset B. In the case of a databricks notebook that does show or collect and prints some query result on the screen, there may be no reason to track it in the sense of lineage.
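Following that advice, a minimal change to the script above that should produce OL events - persisting the filtered frame instead of only displaying it (the output path is a placeholder):

(babynames
    .filter(babynames.Year == dbutils.widgets.get("year"))
    .write.mode("overwrite")
    .parquet("dbfs:/FileStore/babynames_filtered"))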
Michael RobinsonMonday, October 9th, 2023 at 3:25:14 PM GMT-04:00
@channel
We released OpenLineage 1.4.1!
Additions:
Client: allow setting client’s endpoint via environment variable #2151 @Mars Lan
Flink: expand Iceberg source types #2149 @U05QA2D1XNV
Spark: add debug facet #2147 @Paweł Leszczyński
Spark: enable Nessie REST catalog #2165 @julwin
Thanks to all the contributors, especially new contributors @U05QA2D1XNV and @julwin!
Release: https://github.com/OpenLineage/OpenLineage/releases/tag/1.4.1
Changelog: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
Commit history: https://github.com/OpenLineage/OpenLineage/compare/1.3.1...1.4.1
Maven: https://oss.sonatype.org/#nexus-search;quick~openlineage
PyPI: https://pypi.org/project/openlineage-python/
👍5
Drew BittenbenderMonday, October 9th, 2023 at 4:55:35 PM GMT-04:00
Hello. I am getting started with OL and Marquez with dbt. I am using dbt-ol. The namespace of the dataset showing up in Marquez is not the namespace I provide using OPENLINEAGE_NAMESPACE. It happens to be the same as the source in Marquez, which is the snowflake account uri. It's obviously picking up the other env variable OPENLINEAGE_URL, so I am pretty sure it's not the environment. Is this expected?
Michael RobinsonMonday, October 9th, 2023 at 6:56:13 PM GMT-04:00
Hi Drew, thank you for using OpenLineage! I don’t know the details of your use case, but I believe this is expected, yes. In general, the dataset namespace is different. Jobs are namespaced separately from datasets, which are namespaced by their containing datasources. This is the case so datasets have the same name regardless of the job writing to them, as datasets are sometimes shared by jobs in different namespaces.
👍1
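To make that convention concrete, a hypothetical event fragment (names invented; the Snowflake dataset namespace follows the account-URI pattern Drew describes): the job keeps the namespace set via OPENLINEAGE_NAMESPACE, while each dataset is namespaced by its datasource:

{
  "job": { "namespace": "my_team", "name": "dbt.run.my_model" },
  "outputs": [
    { "namespace": "snowflake://xy12345.us-east-1", "name": "ANALYTICS.PUBLIC.ORDERS" }
  ]
}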
Jason YipTuesday, October 10th, 2023 at 1:05:11 AM GMT-04:00
Any idea why "environment-properties" is gone in Spark 3.4+ in StartEvent?
Jason YipTuesday, October 10th, 2023 at 8:53:59 PM GMT-04:00
example:

{"environment-properties":{"spark.databricks.clusterUsageTags.clusterName":"<mailto:jason.yip@tredence.com|jason.yip@tredence.com>'s Cluster","spark.databricks.job.runId":"","spark.databricks.job.type":"","spark.databricks.clusterUsageTags.azureSubscriptionId":"a4f54399-8db8-4849-adcc-a42aed1fb97f","spark.databricks.notebook.path":"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep","spark.databricks.clusterUsageTags.clusterOwnerOrgId":"4679476628690204","MountPoints":[{"MountPoint":"/databricks-datasets","Source":"databricks-datasets"},{"MountPoint":"/Volumes","Source":"UnityCatalogVolumes"},{"MountPoint":"/databricks/mlflow-tracking","Source":"databricks/mlflow-tracking"},{"MountPoint":"/databricks-results","Source":"databricks-results"},{"MountPoint":"/databricks/mlflow-registry","Source":"databricks/mlflow-registry"},{"MountPoint":"/Volume","Source":"DbfsReserved"},{"MountPoint":"/volumes","Source":"DbfsReserved"},{"MountPoint":"/","Source":"DatabricksRoot"},{"MountPoint":"/volume","Source":"DbfsReserved"}],"User":"<mailto:jason.yip@tredence.com|jason.yip@tredence.com>","UserId":"4768657035718622","OrgId":"4679476628690204"}}
Paweł LeszczyńskiWednesday, October 11th, 2023 at 3:46:13 AM GMT-04:00
Is this related to any OL version? In OL 1.2.2. we've added extra variable spark.databricks.clusterUsageTags.clusterAllTags to be captured, but this should not break things.

I think we're facing some issues on recent databricks runtime versions. Here is an issue for this: https://github.com/OpenLineage/OpenLineage/issues/2131

Is the problem you describe specific to some databricks runtime versions?
Jason YipWednesday, October 11th, 2023 at 11:17:06 AM GMT-04:00
yes, exactly Spark 3.4+
Jason YipWednesday, October 11th, 2023 at 9:12:27 PM GMT-04:00
Btw I don't understand the code flow entirely, if we are talking about a different classpath only, I see there's Unity Catalog handler in the code and it says it works the same as Delta, but I am not seeing it subclassing Delta. I suppose it will work the same.

I am happy to jump on a call to show you if needed
Jason YipMonday, October 16th, 2023 at 2:58:56 AM GMT-04:00
@Paweł Leszczyński do you think in Spark 3.4+ only one event would happen?

/**
* We get exact copies of OL events for org.apache.spark.scheduler.SparkListenerJobStart and
* org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart. The same happens for end
* events.
*
* @return
*/
private boolean isOnJobStartOrEnd(SparkListenerEvent event) {
return event instanceof SparkListenerJobStart || event instanceof SparkListenerJobEnd;
}
Guntaka Jeevan PaulTuesday, October 10th, 2023 at 11:43:39 PM GMT-04:00
@here i am trying out the databricks spark integration and in one of the events i am getting an openlineage event where the output dataset has a facet called symlinks. the statement that generated this event is this sql
CREATE TABLE IF NOT EXISTS covid_research.covid_data 
USING CSV
LOCATION 'abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/johns-hopkins-covid-19-daily-dashboard-cases-by-states.csv'
OPTIONS (header "true", inferSchema "true");

Can someone kindly let me know what this symlinks facet is. i tried seeing the spec but did not get it completely
Jason YipTuesday, October 10th, 2023 at 11:44:53 PM GMT-04:00
I use it to get the table with database name
Guntaka Jeevan PaulTuesday, October 10th, 2023 at 11:47:15 PM GMT-04:00
so can i think of it like: if there is a symlink, then that table is kind of a reference to the original dataset?
Jason YipWednesday, October 11th, 2023 at 1:25:44 AM GMT-04:00
yes
🙌1
Guntaka Jeevan PaulWednesday, October 11th, 2023 at 6:55:58 AM GMT-04:00
@here When i am running this sql as part of a databricks notebook, i am receiving an OL event where i see only an output dataset and there is no input dataset or a symlink facet inside the dataset to map it to the underlying azure storage object. Can anyone kindly help on this
spark.sql(f"CREATE TABLE IF NOT EXISTS covid_research.uscoviddata USING delta LOCATION 'abfss://oltptestdata@jeevanacceldata.dfs.core.windows.net/testdata/modified-delta'")
{
    "eventTime": "2023-10-11T10:47:36.296Z",
    "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    "eventType": "COMPLETE",
    "run": {
        "runId": "d0f40be9-b921-4c84-ac9f-f14a86c29ff7",
        "facets": {
            "spark.logicalPlan": {
                "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
                "plan": [
                    {
                        "class": "org.apache.spark.sql.catalyst.plans.logical.CreateTable",
                        "num-children": 1,
                        "name": 0,
                        "tableSchema": [],
                        "partitioning": [],
                        "tableSpec": null,
                        "ignoreIfExists": true
                    },
                    {
                        "class": "org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier",
                        "num-children": 0,
                        "catalog": null,
                        "identifier": null
                    }
                ]
            },
            "spark_version": {
                "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
                "spark-version": "3.3.0",
                "openlineage-spark-version": "1.2.2"
            },
            "processing_engine": {
                "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                "_schemaURL": "https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet",
                "version": "3.3.0",
                "name": "spark",
                "openlineageAdapterVersion": "1.2.2"
            }
        }
    },
    "job": {
        "namespace": "default",
        "name": "adb-3942203504488904.4.azuredatabricks.net.create_table.covid_research_db_uscoviddata",
        "facets": {}
    },
    "inputs": [],
    "outputs": [
        {
            "namespace": "dbfs",
            "name": "/user/hive/warehouse/covid_research.db/uscoviddata",
            "facets": {
                "dataSource": {
                    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet",
                    "name": "dbfs",
                    "uri": "dbfs"
                },
                "schema": {
                    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet",
                    "fields": []
                },
                "storage": {
                    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet",
                    "storageLayer": "unity",
                    "fileFormat": "parquet"
                },
                "symlinks": {
                    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet",
                    "identifiers": [
                        {
                            "namespace": "/user/hive/warehouse/covid_research.db",
                            "name": "covid_research.uscoviddata",
                            "type": "TABLE"
                        }
                    ]
                },
                "lifecycleStateChange": {
                    "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark",
                    "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet",
                    "lifecycleStateChange": "CREATE"
                }
            },
            "outputFacets": {}
        }
    ]
}
Damien HawesWednesday, October 11th, 2023 at 6:57:46 AM GMT-04:00
Hey Guntaka - can I ask you a favour? Can you please stop using @here or @channel - please keep in mind, you're pinging over 1000 people when you use that mention. It's incredibly distracting to have Slack notify me of a message that isn't pertinent to me.
Guntaka Jeevan PaulWednesday, October 11th, 2023 at 6:58:50 AM GMT-04:00
sure noted @Damien Hawes
Damien HawesWednesday, October 11th, 2023 at 6:59:34 AM GMT-04:00
Thank you!
Madhav KakumaniWednesday, October 11th, 2023 at 12:04:24 PM GMT-04:00
Hi there, I am trying to make an API call to get column-lineage information. Could you please let me know the URL construct to retrieve the same? As per the API documentation I am passing the following url to GET column-lineage: http://localhost:5000/api/v1/column-lineage but getting error code 400. Thanks
Willy LulciucThursday, October 12th, 2023 at 1:55:26 PM GMT-04:00
Make sure to provide a dataset field nodeId as a query param in your request. If you've seeded Marquez with test metadata, you can use:
curl -XGET "http://localhost:5002/api/v1/column-lineage?nodeId=datasetField%3Afood_delivery%3Apublic.delivery_7_days%3Acustomer_email"

You can view the API docs for column lineage here!
Madhav KakumaniTuesday, October 17th, 2023 at 5:57:36 AM GMT-04:00
Thanks Willy. The documentation says 'namespace', so I constructed the API call like this:
'http://marquez-web:3000/api/v1/column-lineage/nodeId=datasetField:file:/home/jovyan/Downloads/event_attribute.csv:eventType'
but it is still not working 😞
Madhav KakumaniTuesday, October 17th, 2023 at 6:07:06 AM GMT-04:00
nodeId is constructed like this: datasetField:<namespace>:<dataset>:<field name>
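A sketch of a request that should work given that format (using the seeded-Marquez example values from the curl command above; adjust host and port to your deployment. Note the API listens on the marquez-api port, by default 5000, not the web UI port 3000):

import requests
from urllib.parse import quote

# nodeId format: datasetField:<namespace>:<dataset>:<field name>
node_id = "datasetField:food_delivery:public.delivery_7_days:customer_email"
url = "http://localhost:5000/api/v1/column-lineage?nodeId=" + quote(node_id, safe="")
resp = requests.get(url)
print(resp.status_code, resp.json())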
Michael RobinsonWednesday, October 11th, 2023 at 1:00:01 PM GMT-04:00
@channel
Friendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1696531454431629
Michael RobinsonWednesday, October 11th, 2023 at 2:26:45 PM GMT-04:00
Newly added discussion topics:
• a proposal to add a Registry of Consumers and Producers
• a dbt issue to add OpenLineage Dataset names to the Manifest
• a proposal to add Dataset support in Spark LogicalPlan Nodes
• a proposal to institute a certification process for new integrations
Jason YipThursday, October 12th, 2023 at 3:08:34 PM GMT-04:00
This might be a dumb question, I guess I need to setup local Spark in order for the Spark tests to run successfully?
Paweł LeszczyńskiFriday, October 13th, 2023 at 1:56:19 AM GMT-04:00
Guntaka Jeevan PaulFriday, October 13th, 2023 at 6:41:56 AM GMT-04:00
when trying to install openlineage-java locally via this command: cd ../../client/java/ && ./gradlew publishToMavenLocal, I am receiving this error
&gt; Task :signMavenJavaPublication FAILED
-
-FAILURE: Build failed with an exception.
-
-* What went wrong:
-Execution failed for task ':signMavenJavaPublication'.
-&gt; Cannot perform signing task ':signMavenJavaPublication' because it has no configured signatory
Jason YipFriday, October 13th, 2023 at 1:35:06 PM GMT-04:00
@Paweł Leszczyński this is what I am getting
Jason YipFriday, October 13th, 2023 at 1:36:00 PM GMT-04:00
attaching the html
Paweł LeszczyńskiMonday, October 16th, 2023 at 3:02:13 AM GMT-04:00
which java are you using? what is your operating system (is it windows?)?
Jason YipMonday, October 16th, 2023 at 3:35:18 AM GMT-04:00
yes it is Windows, i downloaded java 8 but I can try to build it with Linux subsystem or Mac
Guntaka Jeevan PaulMonday, October 16th, 2023 at 3:35:51 AM GMT-04:00
In my case it is Mac
Jason YipMonday, October 16th, 2023 at 3:56:09 AM GMT-04:00
* Where:
Build file '/mnt/c/Users/jason/Downloads/github/OpenLineage/integration/spark/build.gradle' line: 9

* What went wrong:
An exception occurred applying plugin request [id: 'com.adarshr.test-logger', version: '3.2.0']
> Failed to apply plugin [id 'com.adarshr.test-logger']
> Could not generate a proxy class for class com.adarshr.gradle.testlogger.TestLoggerExtension.

* Try:
Jason YipMonday, October 16th, 2023 at 3:56:23 AM GMT-04:00
tried with Linux subsystem
Paweł LeszczyńskiMonday, October 16th, 2023 at 4:04:29 AM GMT-04:00
we don't have any restrictions for windows builds, however it is something we don't test regularly. 2h ago we did have a successful build on circle CI https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/8271/workflows/0ec521ae-cd21-444a-bfec-554d101770ea
Jason YipMonday, October 16th, 2023 at 4:13:04 AM GMT-04:00
... 111 more
Caused by: java.lang.ClassNotFoundException: org.gradle.api.provider.HasMultipleValues
... 117 more
Jason YipTuesday, October 17th, 2023 at 12:26:07 AM GMT-04:00
@Paweł Leszczyński now I am doing gradlew instead of gradle on Windows because the Linux one doesn't work. The doc didn't mention setting up Spark / Hadoop, and that's my original question: do I need to set up local Spark? Now it's throwing an error on Hadoop: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
Jason YipSaturday, October 21st, 2023 at 11:33:48 PM GMT-04:00
Got it working with Mac, couldn't get it working with Windows / Linux subsystem
Jason YipSunday, October 22nd, 2023 at 1:08:40 PM GMT-04:00
Now getting class not found despite build and test succeeded
Jason YipSunday, October 22nd, 2023 at 9:46:23 PM GMT-04:00
I uploaded the wrong jar.. there are so many jars, only the jar in the spark folder works, not subfolder
Anirudh ShrinivasonFriday, October 13th, 2023 at 2:48:40 AM GMT-04:00
Hi team, I am running the following pyspark code in a cell:
print("SELECTING 100 RECORDS FROM METADATA TABLE")
df = spark.sql("""select * from <table> limit 100""")

print("WRITING (1) 100 RECORDS FROM METADATA TABLE")
df.write.mode("overwrite").format('delta').save("<s3 location 1>")
df.createOrReplaceTempView("temp_metadata")

print("WRITING (2) 100 RECORDS FROM METADATA TABLE")
df.write.mode("overwrite").format("delta").save("<s3 location 2>")

print("READING (1) 100 RECORDS FROM METADATA TABLE")
df_read = spark.read.format('delta').load("<s3 location 3>")
df_read.createOrReplaceTempView("metadata_1")

print("DOING THE MERGE INTO SQL STEP!")
df_new = spark.sql("""
    MERGE INTO metadata_1
    USING <table>
    ON metadata_1.id = temp_metadata.id
    WHEN MATCHED THEN UPDATE SET
        metadata_1.id = temp_metadata.id,
        metadata_1.aspect = temp_metadata.aspect
    WHEN NOT MATCHED THEN INSERT (id, aspect)
        VALUES (temp_metadata.id, temp_metadata.aspect)
""")

I am running with debug log levels. I actually don't see any of the events being logged for SaveIntoDataSourceCommand or the MergeIntoCommand, but OL is in fact emitting events to the backend. It seems like the events are just not being logged... I actually observe this for all delta table related spark sql queries...
Anirudh ShrinivasonMonday, October 16th, 2023 at 12:01:42 AM GMT-04:00
Hi @Paweł Leszczyński is this expected? CMIIW but we should expect to see the events being logged when running with debug log level right?
Damien HawesMonday, October 16th, 2023 at 4:17:30 AM GMT-04:00
It's impossible to know without seeing how you've configured the listener.

Can you show this configuration?
Anirudh ShrinivasonTuesday, October 17th, 2023 at 3:15:20 AM GMT-04:00
spark.openlineage.transport.url &lt;url&gt;
-spark.openlineage.transport.endpoint /&lt;endpoint&gt;
-spark.openlineage.transport.type http
-spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener
-spark.openlineage.facets.custom_environment_variables [BUNCH_OF_VARIABLES;]
-spark.openlineage.facets.disabled [spark_unknown\;spark.logicalPlan]

These are my spark configs... I'm setting log level to debug with sc.setLogLevel("DEBUG")
Damien HawesTuesday, October 17th, 2023 at 4:40:03 AM GMT-04:00
Two things:

1. If you want debug logs, you're going to have to provide a log4j.properties file or log4j2.properties file depending on the version of spark you're running. In that file, you will need to configure the logging levels. If I am not mistaken, the sc.setLogLevel controls ONLY the log levels of Spark namespaced components (i.e., org.apache.spark)
2. You're telling the listener to emit to a URL. If you want to see the events emitted to the console, then set spark.openlineage.transport.type=console, and remove the other spark.openlineage.transport.* configurations.
Do either (1) or (2).
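Sketches of both options (assuming a Spark version that still uses log4j 1.x for option 1; log4j2 needs a log4j2.properties with different syntax. The transport property names are the ones already shown in this thread):

# Option 1: log4j.properties - raise only the OpenLineage namespace to DEBUG
log4j.logger.io.openlineage=DEBUG

# Option 2: spark conf - emit events to the console instead of a URL
spark.openlineage.transport.type console
spark.extraListeners io.openlineage.spark.agent.OpenLineageSparkListener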
Anirudh ShrinivasonFriday, October 20th, 2023 at 12:49:45 AM GMT-04:00
@Damien Hawes Hi, sorry for the late reply.
1. So enabling sc.setLogLevel does actually enable debug logs from Openlineage. I can see the events and everything being logged if I save it as a parquet format instead of delta.
2. I do want to emit events to the url. But, I would like to just see what exactly are the events being emitted for some specific jobs, since I see that the lineage is incorrect for some MergeInto cases
Anirudh ShrinivasonThursday, October 26th, 2023 at 4:56:50 AM GMT-04:00
Hi @Damien Hawes would like to check again on whether you'd have any thoughts about this... Thanks! 🙂
Rodrigo MaiaTuesday, October 17th, 2023 at 3:17:57 AM GMT-04:00
Hello All 👋!
We are currently trying to get the spark integration for OpenLineage working in our Databricks instance. The general setup is done and working, with a few hiccups here and there.
But one thing we are still struggling with is how to link all spark job events with a Databricks job or a notebook run.
We've recently noticed that some of the events produced by OL have the "environment-properties" attribute with information (for our context) regarding the notebook path (if it is a notebook run), or the job run ID (if it's a databricks job run). But the thing is that these attributes are not always present.
I ran some samples yesterday for a job with 4 notebook tasks. Of all 20 json payloads sent by the OL listener, only 3 presented the "environment-properties" attribute. It's not only happening with Databricks jobs. When I run single notebooks and each cell has its own set of spark jobs, not all json events present that property either.

So my question is: what is the criteria for these attributes to be present or not in the event json file? Or maybe this is an issue? @Jason Yip did you find out anything about this?

⚙️ Spark 3.4 / OL-Spark 1.4.1
Paweł LeszczyńskiTuesday, October 17th, 2023 at 6:55:47 AM GMT-04:00
In general, we assume that OL events per run are cumulative. So, if you have 20 events with the same runId, then even if only a single event contains some facet, we consider this OK and let the backend combine it together. That's what we do in the Marquez project (a reference backend architecture for OL), and that's why it is worth using Marquez as a REST API backend.

Are you able to use job namespace to aggregate all the Spark actions run within the databricks notebook? This is something that should serve this purpose.
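A toy sketch of that cumulative model (assuming events arrive as parsed dicts): a backend folds every event sharing a runId into one view, so a facet present in any single event is retained:

from collections import defaultdict

def combine_run_facets(events):
    # Merge run facets across all events with the same runId; later events win on conflicts.
    combined = defaultdict(dict)
    for ev in events:
        combined[ev["run"]["runId"]].update(ev["run"].get("facets", {}))
    return combined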
Jason YipTuesday, October 17th, 2023 at 12:48:33 PM GMT-04:00
@Rodrigo Maia for Spark 3.4 I don't see the environment-properties showing up at all, but if you run the code as-is, register a listener on SparkListenerJobStart and get the properties, all of those properties will show up. There's an event filter that filters out the SparkListenerJobStart; I suspect that filtered out the "unnecessary" events.. was trying to do a custom build to do that, but still trying to set up Hadoop and Spark on my local
Rodrigo MaiaWednesday, October 18th, 2023 at 5:23:16 AM GMT-04:00
@Paweł Leszczyński you are right. This is what we are doing as well, combining events with the same runId to process the information on our backend. But even so, there are several runIds without this information. I went through these events to have a better view of what was happening. As you can see from 7 runIds, only 3 were showing the "environment-properties" attribute. Some condition is not being met here, or maybe it is what @Jason Yip suspects and there's some sort of filtering of unnecessary events
Paweł LeszczyńskiThursday, October 19th, 2023 at 2:28:03 AM GMT-04:00
@Rodrigo Maia, If you are able to provide a small Spark script such that none of the OL events contain the environment-properties, but at least one should, please raise an issue for this.
Paweł LeszczyńskiThursday, October 19th, 2023 at 2:29:11 AM GMT-04:00
It's extremely helpful when the community opens issues that are not only described well, but also contain the small piece of code needed to reproduce the problem.
Rodrigo MaiaThursday, October 19th, 2023 at 2:59:39 AM GMT-04:00
I know. that's the goal. that is why I wanted to understand in the first place if there was any condition preventing this from happening, but now i get that this is not expected behaviour.
👍1
Jason YipThursday, October 19th, 2023 at 1:44:00 PM GMT-04:00
Jason YipThursday, October 19th, 2023 at 2:49:03 PM GMT-04:00
Please note that I am getting the same behavior; no code is needed. Spark 3.4+ won't generate the environment-properties no matter what. I have been testing the same code for 2 months from this issue: https://github.com/OpenLineage/OpenLineage/issues/2124

I tried the code without OL and it worked perfectly, so it is OL filtering out the event for sure. I will try posting the code I use to collect the properties.
Jason YipThursday, October 19th, 2023 at 11:46:17 PM GMT-04:00
this code proves that the properties are still there, but somehow got filtered out by OL:

%scala
import org.apache.spark.scheduler._

class JobStartListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Extract properties here
    val jobId = jobStart.jobId
    val stageInfos = jobStart.stageInfos
    val properties = jobStart.properties

    // You can print properties or save them somewhere
    println(s"JobId: $jobId, Stages: ${stageInfos.size}, Properties: $properties")
  }
}

val listener = new JobStartListener()
spark.sparkContext.addSparkListener(listener)

val df = spark.range(1000).repartition(10)
df.count()
Jason YipThursday, October 19th, 2023 at 11:55:05 PM GMT-04:00
Rodrigo MaiaMonday, October 30th, 2023 at 4:46:16 AM GMT-04:00
Any ideas on how could i test it?
ankit jainTuesday, October 17th, 2023 at 10:57:03 PM GMT-04:00
Hello All, I am completely new to Openlineage. I have to set up a lab to conduct a POC on various aspects like lineage, metadata management, etc. As per the openlineage site, I tried downloading Ubuntu, docker and binary files for Marquez, but I am lost somewhere and unable to configure the whole setup. Can someone please assist with steps to start from scratch so that I can delve into the Openlineage capabilities? Many thanks
Jakub DardzińskiWednesday, October 18th, 2023 at 1:32:01 AM GMT-04:00
hey, did you try to follow one of these guides?
https://openlineage.io/docs/guides/about
Michael RobinsonWednesday, October 18th, 2023 at 9:14:08 AM GMT-04:00
Which guide were you using, and what errors/issues are you encountering?
ankit jainSaturday, October 21st, 2023 at 3:43:14 PM GMT-04:00
Thanks Jakub for the response.
ankit jainSaturday, October 21st, 2023 at 3:45:42 PM GMT-04:00
In docker, the marquez-api image is not running; it exits with exit code 127.
Michael RobinsonSunday, October 22nd, 2023 at 9:34:53 AM GMT-04:00
@ankit jain thanks. I don't recognize 127, but 9 times out of 10 if the API or DB container fails the reason is a port conflict. Have you checked if port 5000 is available?
Jakub DardzińskiSunday, October 22nd, 2023 at 9:54:10 AM GMT-04:00
could you please check what’s the output of
git config --get core.autocrlf

or
git config --global --get core.autocrlf

?
ankit jainTuesday, October 24th, 2023 at 8:09:14 AM GMT-04:00
@Michael Robinson thanks, I checked and port 5000 is not available.
I tried deleting docker images and recreating them, but still the same issue persists, stating
/usr/bin/env bash\r not found.
Gradle build is successful.
ankit jainTuesday, October 24th, 2023 at 8:09:54 AM GMT-04:00
@Jakub Dardziński thanks, the first command resulted in true and the second command gave no response
Jakub DardzińskiTuesday, October 24th, 2023 at 8:15:57 AM GMT-04:00
are you running docker and git in Windows or Mac OS before 10.0?
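The bash\r error above is the classic symptom of shell scripts checked out with CRLF line endings on Windows. A likely fix, given that core.autocrlf was reported as true, is to re-checkout the tree with LF endings:

git config --global core.autocrlf input
git rm --cached -r .
git reset --hard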
Matthew ParasThursday, October 19th, 2023 at 3:00:42 PM GMT-04:00
Hey all - we've been noticing that some events go unreported by openlineage (spark) when the AsyncEventQueue fills up and starts dropping events. Wondering if anyone has experienced this before, and knows why it is happening? We've expanded the event queue capacity and thrown more hardware at the problem but no dice

Also as a note, the query plans from this job are pretty big - could the listener just be choking up? Happy to open a github issue as well if we suspect that it could be the listener itself having issues
Anirudh ShrinivasonFriday, October 20th, 2023 at 2:57:50 AM GMT-04:00
Hi, just checking, are you excluding the sparkPlan from the events? Or is it sending the spark plan too
Maciej ObuchowskiMonday, October 23rd, 2023 at 11:59:40 AM GMT-04:00
yeah - setting spark.openlineage.facets.disabled to [spark_unknown;spark.logicalPlan] should help
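For example, as a spark-submit flag (the value mirrors the config already shown earlier in this channel):

spark-submit \
  --conf "spark.openlineage.facets.disabled=[spark_unknown;spark.logicalPlan]" \
  ...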
Matthew ParasTuesday, October 24th, 2023 at 5:50:26 PM GMT-04:00
sorry for the late reply - turns out this job is just whack 😄 we were going in circles trying to figure it out, we end up dropping events without open lineage enabled at all. But good to know that disabling the logical plan should speed us up if we run into this again
savanFriday, October 20th, 2023 at 8:31:45 AM GMT-04:00
@GitHubOpenLineageIssues
I am trying to contribute to the integration tests, which are listed here as a good first issue.
The CONTRIBUTING.md mentions that I can trigger CI for integration tests from a forked branch
using this tool,
but I am unable to do so. Is there a way to trigger CI from a forked branch, or do I have to get permission from someone to run the CI?

I am getting this error when I run this command: sudo git-push-fork-to-upstream-branch upstream savannavalgi:hacktober
>
Username for '<https://github.com>': savannavalgi
-&gt; Password for '<https://savannavalgi@github.com>': 
-&gt; remote: Permission to OpenLineage/OpenLineage.git denied to savannavalgi.
-&gt; fatal: unable to access '<https://github.com/OpenLineage/OpenLineage.git/>': The requested URL returned error: 403

i have tried to configure an ssh key,
also tried to trigger CI from another branch,
and tried all of this after fetching the latest upstream

cc: @Athitya Kumar @Maciej Obuchowski @U05HD9G5T17
praveen kanamarlapudiFriday, October 20th, 2023 at 6:18:37 PM GMT-04:00
Hi,

We are using openlineage spark connector. We have used spark 3.2 and scala 2.12 so far. We have triggered a new job with Spark 3.4 and scala 2.13 and faced below exception.


java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.map(scala.Function1)'
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.lambda$buildInputDatasets$6(OpenLineageRunEventBuilder.java:341)
    at java.base/java.util.Optional.map(Optional.java:265)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildInputDatasets(OpenLineageRunEventBuilder.java:339)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.populateRun(OpenLineageRunEventBuilder.java:295)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:279)
    at io.openlineage.spark.agent.lifecycle.OpenLineageRunEventBuilder.buildRun(OpenLineageRunEventBuilder.java:222)
    at io.openlineage.spark.agent.lifecycle.SparkSQLExecutionContext.start(SparkSQLExecutionContext.java:72)
    at io.openlineage.spark.agent.OpenLineageSparkListener.lambda$sparkSQLExecStart$0(OpenLineageSparkListener.java:91)
Paweł LeszczyńskiMonday, October 23rd, 2023 at 4:56:25 AM GMT-04:00
Hmm, that is interesting. Did it occur on a databricks runtime? Could you give it a try with Scala 2.12? I think we don't test scala 2.13.
praveen kanamarlapudiMonday, October 23rd, 2023 at 12:02:13 PM GMT-04:00
I believe our Scala 2.12 jobs are working fine. It's not databricks runtime. We run Spark on Kube.
Paweł LeszczyńskiTuesday, October 24th, 2023 at 6:47:14 AM GMT-04:00
Ok. I think you can raise an issue to support Scala 2.13 for the latest Spark versions.
priya narayanaThursday, October 26th, 2023 at 6:13:40 AM GMT-04:00
Hi, I want to customise the events which come from Openlineage spark. Can someone give me some information?
Paweł LeszczyńskiThursday, October 26th, 2023 at 7:45:41 AM GMT-04:00
Hi @priya narayana, please get familiar with Extending section on our docs: https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark#extending
priya narayanaThursday, October 26th, 2023 at 9:53:07 AM GMT-04:00
Okay, thank you. Just checking if there are any other docs or git code that could also help me.
harsh loombaThursday, October 26th, 2023 at 1:11:17 PM GMT-04:00
Hello Team
harsh loombaThursday, October 26th, 2023 at 1:12:38 PM GMT-04:00
Im upgrading the version from openlineage-airflow==0.24.0 to openlineage-airflow 1.4.1 but im seeing the following error, any help is appreciated
harsh loombaThursday, October 26th, 2023 at 1:14:02 PM GMT-04:00
@Jakub Dardziński any thoughts?
Jakub DardzińskiThursday, October 26th, 2023 at 1:14:24 PM GMT-04:00
what version of Airflow are you using?
harsh loombaThursday, October 26th, 2023 at 1:14:52 PM GMT-04:00
2.6.3 that satisfies the requirement
Jakub DardzińskiThursday, October 26th, 2023 at 1:16:38 PM GMT-04:00
is it possible you have some custom operator?
harsh loombaThursday, October 26th, 2023 at 1:17:15 PM GMT-04:00
i think its the base operator causing the issue
harsh loombaThursday, October 26th, 2023 at 1:17:36 PM GMT-04:00
so no i believe
Jakub DardzińskiThursday, October 26th, 2023 at 1:18:43 PM GMT-04:00
BaseOperator is the parent class for all other operators; it defines how to do a deepcopy
harsh loombaThursday, October 26th, 2023 at 1:19:11 PM GMT-04:00
yeah so its controlled by Airflow itself, I didnt customize it
Jakub DardzińskiThursday, October 26th, 2023 at 1:19:49 PM GMT-04:00
uhm, maybe it's possible you could share dag code? you may hide sensitive data
harsh loombaThursday, October 26th, 2023 at 1:21:23 PM GMT-04:00
let me try with lower versions of openlineage, what's say
harsh loombaThursday, October 26th, 2023 at 1:21:39 PM GMT-04:00
its a big jump from 0.24.0 to 1.4.1
harsh loombaThursday, October 26th, 2023 at 1:22:25 PM GMT-04:00
but i will help here to investigate this issue
Jakub DardzińskiThursday, October 26th, 2023 at 1:24:03 PM GMT-04:00
for me it seems that within dag or task you're defining some object that is not easy to copy
harsh loombaThursday, October 26th, 2023 at 1:26:05 PM GMT-04:00
possible, but with 0.24.0 that issue is not occurring, so the worry is that the version upgrade could potentially break things
Jakub DardzińskiThursday, October 26th, 2023 at 1:39:34 PM GMT-04:00
0.24.0 is not that old 🤔
harsh loombaThursday, October 26th, 2023 at 1:45:07 PM GMT-04:00
i see the issue with 0.24.0 too, but there I see it as a warning:
[airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     self.run()
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/threading.py", line 870, in run
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     self._target(*self._args, **self._kwargs)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/home/upgrade/.local/lib/python3.8/site-packages/openlineage/airflow/listener.py", line 89, in on_running
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     task_instance_copy = copy.deepcopy(task_instance)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = _reconstruct(x, memo, *rv)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     state = deepcopy(state, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = _reconstruct(x, memo, *rv)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     state = deepcopy(state, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/dag.py", line 2162, in __deepcopy__
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     setattr(result, k, copy.deepcopy(v, memo))
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 1224, in __deepcopy__
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     setattr(result, k, copy.deepcopy(v, memo))
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 172, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = _reconstruct(x, memo, *rv)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 270, in _reconstruct
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     state = deepcopy(state, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 153, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/home/upgrade/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 1224, in __deepcopy__
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     setattr(result, k, copy.deepcopy(v, memo))
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 146, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y = copier(x, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 230, in _deepcopy_dict
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     y[deepcopy(key, memo)] = deepcopy(value, memo)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -   File "/usr/lib64/python3.8/copy.py", line 161, in deepcopy
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING -     rv = reductor(4)
[2023-10-26, 17:40:50 UTC] [airflow/utils/log/logging_mixin.py::_propagate_log()::150] WARNING - TypeError: cannot pickle 'module' object

but with 1.4.1 it stopped processing any further and threw an error
harsh loombaThursday, October 26th, 2023 at 2:18:08 PM GMT-04:00
I see the difference in how these 2 versions call it: the current version checks if Airflow is >2.6 and then directly runs on_running, but the earlier version ran it on a separate thread. Is this what's raising this exception?
harsh loombaThursday, October 26th, 2023 at 2:24:49 PM GMT-04:00
harsh loombaThursday, October 26th, 2023 at 2:25:21 PM GMT-04:00
since we are running it directly if version>2.6.0, it's throwing the error in the main processing
harsh loombaThursday, October 26th, 2023 at 2:28:02 PM GMT-04:00
may i know which Airflow versions this process was tested on?
harsh loombaThursday, October 26th, 2023 at 2:28:39 PM GMT-04:00
im on 2.6.3
Jakub DardzińskiThursday, October 26th, 2023 at 2:30:53 PM GMT-04:00
2.1.4, 2.2.4, 2.3.4, 2.4.3, 2.5.2, 2.6.1
usually there are not too many changes between minor versions

I still believe it might be some code you could improve, and it is probably also an antipattern in airflow
harsh loombaThursday, October 26th, 2023 at 2:34:26 PM GMT-04:00
hummm... that's a valid observation, but I don't write DAGs, other teams do, so imagine if many people wrote such DAGs, I can't ask everyone to change their patterns, right? If something runs on the current openlineage version with a warning, it should still run on the upgraded version, shouldn't it?
harsh loombaThursday, October 26th, 2023 at 2:38:04 PM GMT-04:00
however I see ur point
harsh loombaThursday, October 26th, 2023 at 2:49:52 PM GMT-04:00
So that specific task has 570 lines of query, a pretty bulky query; let me split it into smaller units
harsh loombaThursday, October 26th, 2023 at 2:50:15 PM GMT-04:00
that should help right? @Jakub Dardziński
Jakub DardzińskiThursday, October 26th, 2023 at 2:51:27 PM GMT-04:00
query length shouldn’t be the issue, rather any python code
Jakub DardzińskiThursday, October 26th, 2023 at 2:51:50 PM GMT-04:00
I get your point too, we might figure out some mechanism to skip irrelevant parts of task instance so that it doesn’t fail then
harsh loombaThursday, October 26th, 2023 at 2:52:12 PM GMT-04:00
actually its failing on that task itself
harsh loombaThursday, October 26th, 2023 at 2:52:33 PM GMT-04:00
let me try it will be pretty quick
harsh loombaThursday, October 26th, 2023 at 2:58:58 PM GMT-04:00
@Jakub Dardziński but ur right we have to fix this at Openlineage side as well. Because ideally Openlineage shouldn't be causing any issue to the main DAG processing
Jakub DardzińskiThursday, October 26th, 2023 at 5:51:05 PM GMT-04:00
it doesn’t break any airflow functionality, execution is wrapped into try/except block, only exception traceback is logged as you can see
Maciej ObuchowskiFriday, October 27th, 2023 at 5:25:54 AM GMT-04:00
Can you migrate to Airflow 2.7 and use apache-airflow-providers-openlineage? Ideally we wouldn't make meaningful changes to openlineage-airflow
harsh loombaFriday, October 27th, 2023 at 11:35:44 AM GMT-04:00
yup thats what im planning to do
harsh loombaFriday, October 27th, 2023 at 1:59:03 PM GMT-04:00
referencing this conversation: what does it take to move from openlineage-airflow to the openlineage provider package? I'm updating Airflow to 2.7.2 and moving off of openlineage-airflow to the provider package, and I'm trying to estimate the amount of work that takes, any thoughts? Reading the changelogs I don't think it's too much of a change, but please share your thoughts, and if it's drafted somewhere please do share that as well
Maciej ObuchowskiMonday, October 30th, 2023 at 8:21:10 AM GMT-04:00
Generally not much - I would maybe think about operator coverage. For example, for BigQuery the old openlineage-airflow supports BigQueryExecuteQueryOperator. However, the new apache-airflow-providers-openlineage supports BigQueryInsertJobOperator - because it's the intended replacement for BigQueryExecuteQueryOperator and the Airflow community does not want to accept contributions to deprecated operators.
🙏1
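A hedged before/after sketch of that operator swap (hypothetical query; BigQueryInsertJobOperator takes a job configuration dict rather than bare SQL):

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Before (deprecated; lineage covered only by openlineage-airflow):
# BigQueryExecuteQueryOperator(task_id="load", sql="SELECT ...", use_legacy_sql=False)

# After (covered by apache-airflow-providers-openlineage):
load = BigQueryInsertJobOperator(
    task_id="load",
    configuration={
        "query": {
            "query": "SELECT * FROM `project.dataset.table`",  # hypothetical query
            "useLegacySql": False,
        }
    },
)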
harsh loombaTuesday, October 31st, 2023 at 3:00:38 PM GMT-04:00
one question if someone is around - when im keeping both openlineage-airflow and apache-airflow-providers-openlineage in my requirement file, i see the following error -
    from openlineage.airflow.extractors import Extractors
ModuleNotFoundError: No module named 'openlineage.airflow'

any thoughts?
John LukenoffTuesday, October 31st, 2023 at 3:37:07 PM GMT-04:00
I would usually do a pip freeze | grep openlineage as a sanity check to validate that the module is actually installed. Not sure how the provider and the module play together though
harsh loombaTuesday, October 31st, 2023 at 5:07:41 PM GMT-04:00
yeah so @John Lukenoff I'm not getting how I can use the specific extractor when I run my operator. Say for example, I have a custom datawarehouseOperator and I want to override get_openlineage_facets_on_start and get_openlineage_facets_on_complete using the redshift extractor; then how would I do that?
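One approach, sketched under the assumption that you're on the provider package: a custom operator can return lineage itself via the get_openlineage_facets_on_* hooks, without registering an extractor (class and method names below are from the provider's extractor interface; the dataset names are hypothetical):

from airflow.models.baseoperator import BaseOperator
from airflow.providers.openlineage.extractors import OperatorLineage
from openlineage.client.run import Dataset

class DataWarehouseOperator(BaseOperator):
    def execute(self, context):
        ...  # run the warehouse query

    def get_openlineage_facets_on_complete(self, task_instance):
        # Called by the provider's listener after execute(); on_start works the same way.
        return OperatorLineage(
            inputs=[Dataset(namespace="redshift://my-cluster:5439", name="schema.source_table")],
            outputs=[Dataset(namespace="redshift://my-cluster:5439", name="schema.target_table")],
        )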
Rodrigo MaiaFriday, October 27th, 2023 at 5:49:25 AM GMT-04:00
Spark Integration Logs
Hey There
Are these events skipped because it's not supported or it's configured somewhere?
23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
23/10/27 08:25:58 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd
HiteshFriday, October 27th, 2023 at 8:12:32 AM GMT-04:00
Hi People, actually I want to intercept the OpenLineage spark events right after the job ends and before they are emitted, so that I can add some extra information to the events or remove some information that I don't want.
Is there any way of doing this? Can someone please help me
Michael RobinsonMonday, October 30th, 2023 at 9:03:57 AM GMT-04:00
In general, I think this kind of use case is probably best served by facets, but what do you think @Paweł Leszczyński?
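If the extra information lives in environment variables, one low-effort option along those lines (using configs already shown in this channel, so no event interception is needed) is to enrich events with the custom environment variables facet and prune unwanted facets with the disabled list:

spark.openlineage.facets.custom_environment_variables [MY_TEAM;MY_PIPELINE_ID;]
spark.openlineage.facets.disabled [spark_unknown;spark.logicalPlan]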
KavithaFriday, October 27th, 2023 at 5:01:12 PM GMT-04:00
Hello, has anyone run into a similar error as posted in this github open issue [https://github.com/MarquezProject/marquez/issues/2468] while setting up marquez on an EC2 instance? Would appreciate any help to get past the errors.
Willy LulciucFriday, October 27th, 2023 at 5:04:30 PM GMT-04:00
Hmm, have you looked over our Running on AWS docs?
Willy LulciucFriday, October 27th, 2023 at 5:06:08 PM GMT-04:00
More specifically, the AWS RDS section. How are you deploying Marquez on Ec2?
KavithaFriday, October 27th, 2023 at 5:08:05 PM GMT-04:00
we were primarily referencing this document on git - https://github.com/MarquezProject/marquez
KavithaFriday, October 27th, 2023 at 5:09:05 PM GMT-04:00
leveraged docker and docker-compose
Willy LulciucFriday, October 27th, 2023 at 5:13:10 PM GMT-04:00
hmm so you’re running docker-compose up on an Ec2 instance you’ve ssh’d into? (just trying to understand your setup better)
KavithaFriday, October 27th, 2023 at 5:13:26 PM GMT-04:00
yes, thats correct
Willy LulciucFriday, October 27th, 2023 at 5:16:39 PM GMT-04:00
I’ve only used docker compose for local dev or integration tests. but, ok you’re probably in the PoC phase. Can you run the docker cmd on your local machine successfully? What OS is installed on the Ec2 instance?
KavithaFriday, October 27th, 2023 at 5:18:00 PM GMT-04:00
yes, i can run and the OS is Ubuntu 20.04.6 LTS
KavithaFriday, October 27th, 2023 at 5:19:27 PM GMT-04:00
we initially ran into a permission denied error related to the postgresql.conf file and we had to update file permissions to 777, after which we started to see the below errors
KavithaFriday, October 27th, 2023 at 5:19:36 PM GMT-04:00
marquez-db | 2023-10-27 20:35:52.512 GMT [35] FATAL: no pg_hba.conf entry for host "172.18.0.5", user "marquez", database "marquez", no encryption
marquez-db | 2023-10-27 20:35:52.529 GMT [36] FATAL: no pg_hba.conf entry for host "172.18.0.5", user "marquez", database "marquez", no encryption
KavithaFriday, October 27th, 2023 at 5:20:12 PM GMT-04:00
we then manually updated pg_hba.conf file to include host user and db details
Willy LulciucFriday, October 27th, 2023 at 5:20:42 PM GMT-04:00
Did you also update the marquez.yml with the db user / password?
KavithaFriday, October 27th, 2023 at 5:20:48 PM GMT-04:00
after which we started to see the errors posted in the github open issues page
Willy LulciucFriday, October 27th, 2023 at 5:21:33 PM GMT-04:00
hmm are you using an external database or are you spinning up the entire Marquez stack with docker compose?
KavithaFriday, October 27th, 2023 at 5:21:56 PM GMT-04:00
we are spinning up the entire Marquez stack with docker compose
KavithaFriday, October 27th, 2023 at 5:23:24 PM GMT-04:00
we did not change anything in the marquez.yml, i think we did not find that file in the github repo that we cloned into our local instance
Willy LulciucFriday, October 27th, 2023 at 5:26:31 PM GMT-04:00
It’s important that the init-db.sh script runs, but I don’t think it is
Willy LulciucFriday, October 27th, 2023 at 5:26:56 PM GMT-04:00
can you grab all the docker compose logs and share them? it’s hard to debug otherwise
KavithaFriday, October 27th, 2023 at 5:29:59 PM GMT-04:00
Willy LulciucFriday, October 27th, 2023 at 5:33:15 PM GMT-04:00
I would first suggest to remove the --build flag since you are specifying a version of Marquez to use via --tag
Willy LulciucFriday, October 27th, 2023 at 5:33:49 PM GMT-04:00
not the issue per se, but it will help clear up some of the logs
KavithaFriday, October 27th, 2023 at 5:35:06 PM GMT-04:00
for sure thanks. we could get the logs without the --build portion, we tried with that option just once
KavithaFriday, October 27th, 2023 at 5:35:40 PM GMT-04:00
the errors were the same with/without --build option
KavithaFriday, October 27th, 2023 at 5:36:02 PM GMT-04:00
marquez-api | ERROR [2023-10-27 21:34:58,019] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.
marquez-api | ! org.postgresql.util.PSQLException: FATAL: password authentication failed for user "marquez"
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
marquez-api | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
marquez-api | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:253)
marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:434)
marquez-api | ! at org.postgresql.Driver.connect(Driver.java:291)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)
marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)
marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)
marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)
marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)
marquez-api | ! at org.flywaydb.core.Flyway.info(Flyway.java:190)
marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:73)
marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:27)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:105)
marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:48)
marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)
marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)
marquez-api | ! at io.dropwizard.Application.run(Application.java:94)
marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:60)
marquez-api | INFO [2023-10-27 21:34:58,024] marquez.MarquezApp: Stopping app...
Willy LulciucFriday, October 27th, 2023 at 5:38:52 PM GMT-04:00
debugging docker issues like this is so difficult
Willy LulciucFriday, October 27th, 2023 at 5:40:44 PM GMT-04:00
it could be a number of things, but you are connected to the database; it’s just that the marquez user hasn’t been created
Willy LulciucFriday, October 27th, 2023 at 5:41:59 PM GMT-04:00
the /init-db.sh is what manages user creation
Willy LulciucFriday, October 27th, 2023 at 5:42:17 PM GMT-04:00
so it’s possible that the script isn’t running for whatever reason on your Ec2 instance
Willy LulciucFriday, October 27th, 2023 at 5:44:20 PM GMT-04:00
do you have other services running on that Ec2 instance? Like, other than Marquez
Willy LulciucFriday, October 27th, 2023 at 5:44:52 PM GMT-04:00
is there a postgres process running outside of docker?
KavithaFriday, October 27th, 2023 at 8:34:50 PM GMT-04:00
no other services except marquez on this EC2 instance
KavithaFriday, October 27th, 2023 at 8:35:49 PM GMT-04:00
this was a new Ec2 instance that was spun up to install and use marquez
KavithaFriday, October 27th, 2023 at 8:36:09 PM GMT-04:00
and we can confirm that no postgres process runs outside of docker
Jason YipSunday, October 29th, 2023 at 3:06:28 AM GMT-04:00
I realize in Spark 3.4+, some job ids don't have a start event. What part of the code is responsible for triggering the START and COMPLETE event
Paweł LeszczyńskiMonday, October 30th, 2023 at 9:59:53 AM GMT-04:00
hi @Jason Yip could you provide an example of such a job?
Jason YipMonday, October 30th, 2023 at 4:51:55 PM GMT-04:00
@Paweł Leszczyński same old:

# needed imports for the schema below (these types live in pyspark.sql.types)
from pyspark.sql.types import StructType, StructField, IntegerType, LongType, FloatType

# delete the old table if needed
_ = spark.sql('DROP TABLE IF EXISTS transactions')

# expected structure of the file
transactions_schema = StructType([
    StructField('household_id', IntegerType()),
    StructField('basket_id', LongType()),
    StructField('day', IntegerType()),
    StructField('product_id', IntegerType()),
    StructField('quantity', IntegerType()),
    StructField('sales_amount', FloatType()),
    StructField('store_id', IntegerType()),
    StructField('discount_amount', FloatType()),
    StructField('transaction_time', IntegerType()),
    StructField('week_no', IntegerType()),
    StructField('coupon_discount', FloatType()),
    StructField('coupon_discount_match', FloatType())
])

# read data to dataframe (adlsRootPath is defined elsewhere in the notebook)
df = (spark
    .read
    .csv(
        adlsRootPath + '/examples/data/csv/completejourney/transaction_data.csv',
        header=True,
        schema=transactions_schema))

df.write\
    .format('delta')\
    .mode('overwrite')\
    .option('overwriteSchema', 'true')\
    .option('path', adlsRootPath + '/examples/data/csv/completejourney/silver/transactions')\
    .saveAsTable('transactions')

df.count()

# # create table object to make delta lake queryable
# _ = spark.sql(f'''
# CREATE TABLE transactions
# USING DELTA
# LOCATION '{adlsRootPath}/examples/data/csv/completejourney/silver/transactions'
# ''')

# show data
display(
    spark.table('transactions')
)
John LukenoffMonday, October 30th, 2023 at 6:51:43 PM GMT-04:00
👋 Hi team, cross-posting from the Marquez Channel in case anyone here has a better idea of the spec

> For most of our lineage extractors in airflow, we are using the rust sql parser from openlineage-sql to extract table lineage via sql statements. When errors occur we are adding an extractionError run facet similar to what is being done here. I’m finding in the case that multiple statements were extracted but one failed to parse while many others were successful, the lineage for these runs doesn’t appear as expected in Marquez. Is there any logic around the extractionError run facet that could be causing this? It seems reasonable to assume that we might take this to mean the entire run event is invalid if we have any extraction errors.
>
> I would still expect to see the other lineage we sent for the run but am instead just seeing the extractionError in the marquez UI, in the database, runs with an extractionError facet don’t seem to make it to the job_versions_io_mapping table
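For context, a rough sketch of how an extractionError facet like this gets produced, assuming the openlineage-sql Python bindings (and a recent version that collects parse failures into the returned metadata rather than raising); the statements are illustrative:

# Sketch: parse a batch of SQL statements where one fails.
from openlineage_sql import parse

statements = [
    "ALTER SESSION UNSET QUERY_TAG;",               # fails to parse
    "INSERT INTO foo1.bar2 SELECT * FROM foo.bar",  # parses fine
]
meta = parse(statements, dialect="snowflake")
print(meta.in_tables, meta.out_tables)  # lineage from the statements that parsed
print(meta.errors)                      # one extraction error for the ALTER SESSION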
Maciej ObuchowskiTuesday, October 31st, 2023 at 6:34:05 AM GMT-04:00
Can you show the actual event? Should be in the events tab in Marquez
KavithaTuesday, October 31st, 2023 at 11:59:07 AM GMT-04:00
@John Lukenoff, would you mind posting the link to the Marquez team's Slack channel?
John LukenoffTuesday, October 31st, 2023 at 12:15:37 PM GMT-04:00
yep here is the link: https://marquezproject.slack.com/archives/C01E8MQGJP7/p1698702140709439

This is the full event, sanitized of internal info:
{
  "job": {
    "name": "some_dag.some_task",
    "facets": {},
    "namespace": "default"
  },
  "run": {
    "runId": "a9565df2-f1a1-3ee3-b202-7626f8c4b92d",
    "facets": {
      "extractionError": {
        "errors": [
          {
            "task": "ALTER SESSION UNSET QUERY_TAG;",
            "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python",
            "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet",
            "taskNumber": 0,
            "errorMessage": "Expected one of TABLE or INDEX, found: SESSION"
          }
        ],
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/0.24.0/client/python",
        "_schemaURL": "https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ExtractionErrorRunFacet",
        "totalTasks": 1,
        "failedTasks": 1
      }
    }
  },
  "inputs": [
    {
      "name": "foo.bar",
      "facets": {},
      "namespace": "snowflake"
    },
    {
      "name": "fizz.buzz",
      "facets": {},
      "namespace": "snowflake"
    }
  ],
  "outputs": [
    { "name": "foo1.bar2", "facets": {}, "namespace": "snowflake" },
    {
      "name": "fizz1.buzz2",
      "facets": {},
      "namespace": "snowflake"
    }
  ],
  "producer": "https://github.com/MyCompany/repo/blob/next-master/company/data/pipelines/airflow_utils/openlineage_utils/client.py",
  "eventTime": "2023-10-30T02:46:13.367274Z",
  "eventType": "COMPLETE"
}
KavithaTuesday, October 31st, 2023 at 12:43:07 PM GMT-04:00
thank you!
KavithaTuesday, October 31st, 2023 at 1:14:29 PM GMT-04:00
@John Lukenoff, sorry to trouble you again - is the Slack channel still active? For whatever reason I can't get to this workspace
John LukenoffTuesday, October 31st, 2023 at 1:15:26 PM GMT-04:00
yep it’s still active, maybe you need to join the workspace first? https://join.slack.com/t/marquezproject/shared_invite/zt-266fdhg9g-TE7e0p~EHK50GJMMqNH4tg
KavithaTuesday, October 31st, 2023 at 1:25:51 PM GMT-04:00
that was a good call. the link you just shared worked! thank you!
Maciej ObuchowskiTuesday, October 31st, 2023 at 1:27:55 PM GMT-04:00
yeah from OL perspective this looks good - the inputs and outputs are there, the extraction error facet looks like it should
Maciej ObuchowskiTuesday, October 31st, 2023 at 1:28:05 PM GMT-04:00
must be some Marquez hiccup 🙂
👍1
John LukenoffTuesday, October 31st, 2023 at 1:28:45 PM GMT-04:00
Makes sense, I’ll tail my marquez logs today to see if I can find anything
John LukenoffWednesday, November 1st, 2023 at 7:37:06 PM GMT-04:00
Somehow this started working after we switched from our beta to prod infrastructure. I suspect something was failing due to constraints on the size of our db and the load of poor quality data it was under after months of testing against it
harsh loombaTuesday, October 31st, 2023 at 3:00:38 PM GMT-04:00
one question if someone is around - when I'm keeping both openlineage-airflow and apache-airflow-providers-openlineage in my requirements file, I see the following error -
    from openlineage.airflow.extractors import Extractors
ModuleNotFoundError: No module named 'openlineage.airflow'

any thoughts?
Michael RobinsonWednesday, November 1st, 2023 at 11:34:43 AM GMT-04:00
@channel
I’m opening a vote to release OpenLineage 1.5.0, including:
• support for Cassandra Connectors lineage in the Flink integration
• support for Databricks Runtime 13.3 in the Spark integration
• support for rdd and toDF operations from the Spark Scala API in Spark
• lowered requirements for attrs and requests packages in the Airflow integration
• lazy rendering of yaml configs in the dbt integration
• bug fixes, tests, infra fixes, doc changes, and more.
Three +1s from committers will authorize an immediate release.
6
👍1
🚀2
Michael RobinsonThursday, November 2nd, 2023 at 5:11:58 AM GMT-04:00
Thanks, all. The release is authorized and will be initiated within 2 business days.
Michael RobinsonWednesday, November 1st, 2023 at 1:29:09 PM GMT-04:00
@channel
The October 2023 issue of OpenLineage News is available now! Sign up to get it directly in your inbox each month.
👍2
🎉1
John LukenoffWednesday, November 1st, 2023 at 7:40:39 PM GMT-04:00
Hi team 👋 , we’re finding that for our Spark jobs we are almost always getting some junk characters in our dataset names. We’ve pushed the regex filter to its limits and would like to extend the logic of deriving the dataset name in openlineage-spark (currently on 1.4.1). I seem to recall hearing we could do this by implementing our own LogicalPlanVisitor or something along those lines? Is that still the recommended approach and if so would this be possible to implement in Scala vs. Java (scala noob here :simple_smile:)
Paweł LeszczyńskiThursday, November 2nd, 2023 at 3:34:15 AM GMT-04:00
Hi John, we're always happy to help with the contribution.

One of the possible solutions to this would be to do that just in openlineage-java client:
• introduce config entry like normalizeDatasetNameToAscii : enabled/disabled
• modify DatasetIdentifier class to contain static member boolean normalizeDatasetNameToAscii and normalize dataset name according to this setting
• additionally, you would need to add config entry in io.openlineage.client.OpenLineageYaml and make sure both loadOpenLineageYaml methods set DatasetIdentifier.normalizeDatasetNameToAscii based on the config
• document this in the doc
So, no Scala nor custom logical plan visitors required.
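Illustratively, the normalization step itself is small; a minimal Python sketch of the idea (normalize_dataset_name is a hypothetical helper, and the real change proposed above would live in the Java client):

import unicodedata

def normalize_dataset_name(name: str) -> str:
    # Hypothetical helper: decompose accented characters, then drop anything
    # outside ASCII - the "junk characters" mentioned above.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(normalize_dataset_name("caf\u00e9_\u00f3rders\u200b"))  # -> cafe_orders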
Mike FangWednesday, November 1st, 2023 at 8:30:38 PM GMT-04:00
I am looking to send OpenLineage events to an AWS API Gateway endpoint from an AWS MWAA instance. The problem is that all requests to AWS services need to be signed with SigV4, and using API Gateway with IAM authentication would require requests to API Gateway be signed with SigV4. Would the best way to do so be to just modify the python client HTTP transport to include a new config option for signing emitted OpenLineage events with SigV4? Are there any alternatives?
Jakub DardzińskiThursday, November 2nd, 2023 at 2:41:50 AM GMT-04:00
there’s actually an issue for that:
https://github.com/OpenLineage/OpenLineage/issues/2189

but the way to do this is imho to create a new custom transport (it might inherit from the HTTP transport) and register it in the transport factory
Mike FangThursday, November 2nd, 2023 at 1:05:05 PM GMT-04:00
I am thinking of just modifying the HTTP transport and using requests.auth.AuthBase to create different auth methods instead of a TokenProvider class

Classes which subclass requests.auth.AuthBase can also just directly be given to the requests call in the auth parameter
👍1
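A rough sketch of that idea, assuming botocore for the signing; the class, endpoint, and region below are placeholders, not an existing OpenLineage API:

import json

import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.session import Session

class ApiGatewaySigV4(requests.auth.AuthBase):
    # Placeholder AuthBase subclass that SigV4-signs outgoing requests.
    def __init__(self, service: str, region: str):
        self._credentials = Session().get_credentials()
        self._service = service
        self._region = region

    def __call__(self, r: requests.PreparedRequest) -> requests.PreparedRequest:
        # Sign an equivalent AWSRequest, then copy the auth headers back.
        aws_request = AWSRequest(method=r.method, url=r.url, data=r.body,
                                 headers={"Content-Type": "application/json"})
        SigV4Auth(self._credentials, self._service, self._region).add_auth(aws_request)
        r.headers.update(dict(aws_request.headers))
        return r

# Usage: hand the auth object straight to requests, as described above.
event = {"eventType": "COMPLETE"}  # placeholder payload
requests.post("https://abc123.execute-api.us-east-1.amazonaws.com/lineage",
              data=json.dumps(event),
              auth=ApiGatewaySigV4("execute-api", "us-east-1"))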
Jakub DardzińskiThursday, November 2nd, 2023 at 2:40:24 PM GMT-04:00
would you like to contribute? 🙂
Mike FangThursday, November 2nd, 2023 at 2:43:05 PM GMT-04:00
I was about to contribute, but I actually just realized that there is an existing way to provide a custom transport that would solve for my use case. My only question is how do I register this custom transport in my MWAA environment? Can I provide the custom transport as an Airflow plugin and then specify the class in the openlineage.yml config? Will it automatically pick it up?
Jakub DardzińskiThursday, November 2nd, 2023 at 3:45:56 PM GMT-04:00
although I did not test this in MWAA, only locally: I've created an Airflow plugin that in __init__.py has defined (or imported) the following code:
from openlineage.client.transport import register_transport, Transport, Config


@register_transport
class FakeTransport(Transport):
    kind = "fake"
    config = Config

    def __init__(self, config: Config) -> None:
        print(config)

    def emit(self, event) -> None:
        print(event)

setting AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "fake"}' does take effect and I can see output in Airflow logs
Jakub DardzińskiThursday, November 2nd, 2023 at 3:47:45 PM GMT-04:00
in setup.py it’s:
    ...,
    entry_points={
        'airflow.plugins': [
            'custom_transport = custom_transport:CustomTransportPlugin',
        ],
    },
    install_requires=["openlineage-python"]
)
Mike FangFriday, November 3rd, 2023 at 12:52:55 PM GMT-04:00
ok great thanks for following up on this, super helpful
Michael RobinsonThursday, November 2nd, 2023 at 12:00:00 PM GMT-04:00
👍5
🚀1
Jason YipThursday, November 2nd, 2023 at 2:49:18 PM GMT-04:00
@Paweł Leszczyński I tested 1.5.0, it works great now, but the environment facet is gone in START... which I very much want.. any thoughts?
Jason YipFriday, November 3rd, 2023 at 4:18:11 AM GMT-04:00
actually, it shows up in one of the RUNNING events now... behavior is consistent between 11.3 and 13.3, thanks for fixing this issue
👍1
Jason YipSaturday, November 4th, 2023 at 3:44:22 PM GMT-04:00
@Paweł Leszczyński looks like I need to bring bad news.. 13.3 is fixed for specific scenarios, but 11.3 is still reading output as dbfs.. there are scenarios where it's not producing input and output, like:

CREATE TABLE table USING DELTA
LOCATION 'abfss://....'
AS SELECT * FROM parquet.`abfss://....`
Jason YipSaturday, November 4th, 2023 at 3:44:31 PM GMT-04:00
Will test more and open issues
Rodrigo MaiaMonday, November 6th, 2023 at 5:34:33 AM GMT-05:00
@Jason Yip how did you manage to get the environment attribute? It's not showing up for me at all. I've tried Databricks but also tried a local instance of Spark.
Jason YipTuesday, November 7th, 2023 at 6:32:02 PM GMT-05:00
@Rodrigo Maia it's showing up in one of the RUNNING events, not in the START event anymore
Rodrigo MaiaWednesday, November 8th, 2023 at 3:04:32 AM GMT-05:00
I never had a running event :melting_face: Am I filtering something?
Jason YipWednesday, November 8th, 2023 at 1:03:26 PM GMT-05:00
Umm.. ok show me your code, will try on my end
Jason YipWednesday, November 8th, 2023 at 2:26:06 PM GMT-05:00
@Paweł Leszczyński @Rodrigo Maia actually if you are using UC-enabled cluster, you won't get any RUNNING events
Michael RobinsonFriday, November 3rd, 2023 at 12:00:07 PM GMT-04:00
@channel
This month’s TSC meeting (open to all) is next Thursday the 9th at 10am PT. On the agenda:
• announcements
• recent releases
• recent additions to the Flink integration by @U05QA2D1XNV
• recent additions to the Spark integration by @Paweł Leszczyński
• updates on proposals by @Julien Le Dem
• discussion topics
• open discussion
More info and the meeting link can be found on the website. All are welcome! Do you have a discussion topic, use case or integration you’d like to demo? DM me to be added to the agenda.
👍1
priya narayanaSaturday, November 4th, 2023 at 7:08:10 AM GMT-04:00
Hi Team, we are trying to customize the events by writing a custom lineage listener extending OpenLineageSparkListener, but would need some direction on how to capture the events
Jakub DardzińskiSaturday, November 4th, 2023 at 7:11:46 AM GMT-04:00
priya narayanaSaturday, November 4th, 2023 at 7:13:47 AM GMT-04:00
yes
Jakub DardzińskiSaturday, November 4th, 2023 at 7:15:21 AM GMT-04:00
It seems pretty extensively described, what kind of help do you need?
priya narayanaSaturday, November 4th, 2023 at 7:16:13 AM GMT-04:00
If I use io.openlineage.spark.api.OpenLineageEventHandlerFactory, how will I pass the custom listener to my spark-submit?
priya narayanaSaturday, November 4th, 2023 at 7:17:25 AM GMT-04:00
I would like to know how I can customize my events using this. For example: in the "input" facet I want only the symlinks name; I am not interested in anything else
priya narayanaSaturday, November 4th, 2023 at 7:17:32 AM GMT-04:00
can you please provide some guidance
priya narayanaSaturday, November 4th, 2023 at 7:18:36 AM GMT-04:00
@Jakub Dardziński this is the doubt i have
priya narayanaSaturday, November 4th, 2023 at 8:17:25 AM GMT-04:00
Could someone who did the Spark integration throw some light on this?
Jakub DardzińskiSaturday, November 4th, 2023 at 8:21:22 AM GMT-04:00
it's the weekend for most of us, so you probably need to wait until Monday for precise answers
David GossMonday, November 6th, 2023 at 4:03:42 AM GMT-05:00
👋 I raised a PR https://github.com/OpenLineage/OpenLineage/pull/2223 off the back of some Marquez conversations a while back to try and clarify how names of Snowflake objects should be expressed in OL events. I used Snowflake’s OL view as a guide, but also I appreciate there are other OL producers that involve Snowflake too (Airflow? dbt?). Any feedback on this would be appreciated!
David GossWednesday, November 8th, 2023 at 10:42:35 AM GMT-05:00
Thanks for merging this @Maciej Obuchowski!
👍1
Athitya KumarMonday, November 6th, 2023 at 5:22:03 AM GMT-05:00
Hey team! 👋

We're trying to use openlineage-flink, and would like to provide openlineage.transport.type=http and configure other transport configs, but we're not able to find sufficient docs (tried this doc) on where/how these configs can be provided.

For example, in spark, the changes mostly were delegated to the spark-submit command like
spark-submit --conf "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener" \
    --packages "io.openlineage:openlineage-spark:<spark-openlineage-version>" \
    --conf "spark.openlineage.transport.url=http://{openlineage.client.host}/api/v1/namespaces/spark_integration/" \
    --class com.mycompany.MySparkApp my_application.jar

And the OpenLineageSparkListener has a method to retrieve the provided spark confs as an object in the ArgumentParser. Similarly, looking for some pointers on how the openlineage.transport configs can be provided to OpenLineageFlinkJobListener & how the flink listener parses/uses these configs

TIA! 😄
Maciej ObuchowskiTuesday, November 7th, 2023 at 5:56:09 AM GMT-05:00
similarly to spark config, you can use flink config
Athitya KumarTuesday, November 7th, 2023 at 10:36:53 PM GMT-05:00
@Maciej Obuchowski - Got it. Our use-case is that we're trying to build a wrapper on top of openlineage-flink for productionising our Flink jobs.

We're trying to have a wrapper class that extends OpenLineageFlinkJobListener class, and overwrites the HTTP transport endpoint/url to a constant value (say, example.com and /api/v1/flink). But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private. If it was just a default scope, can we contribute a PR to make it public, to make it friendly for teams trying to adopt & extend openlineage?

And also, we wanted to understand better on where we're reading the HTTP transport endpoint/url configs in OpenLineageFlinkJobListener and what'd be the best place to override it to the constant endpoint/url for our use-case
Maciej ObuchowskiWednesday, November 8th, 2023 at 5:55:43 AM GMT-05:00
We parse flink conf to get that information: https://github.com/OpenLineage/OpenLineage/blob/26494b596e9669d2ada164066a73c44e04[…]ink/src/main/java/io/openlineage/flink/client/EventEmitter.java

> But we see that the OpenLineageFlinkJobListener constructor is defined as a private constructor - just wanted to check with the team whether it was just a default scope, or intended to be private.
The way to construct it is a public builder in the same class

I think an easier way than a wrapper class would be to use the existing Flink configuration, or to set up the OPENLINEAGE_URL env variable, or have an openlineage.yml config file - not sure why this is the way you've chosen?
Athitya KumarThursday, November 9th, 2023 at 12:41:02 PM GMT-05:00
> I think an easier way than a wrapper class would be to use the existing Flink configuration, or to set up the OPENLINEAGE_URL env variable, or have an openlineage.yml config file - not sure why this is the way you've chosen?
@Maciej Obuchowski - The reasoning behind going with a wrapper class is that we can abstract out the nitty-gritty like how/where we're publishing openlineage events etc - especially for companies that have a lot of teams that may be adopting openlineage.

For example, if we wanna move away from http transport to kafka transport - we'd be changing only this wrapper class and ask folks to update their wrapper class dependency version. If we went without the wrapper class, then the exact config changes would need to be synced and done by many different teams, who may not have enough context.

Similarly, if we wanna enable some other default best-practise configs, or inject any company-specific configs etc, the wrapper would be useful in abstracting out the details and be the 1 place that handles all openlineage related integrations for any future changes.

That's why we wanna extend openlineage's listener class & leverage most of the OSS code as-is; and at the same time, have the ability to extend & inject customisations. I think that's where some things like having getters for the class object attributes, or having public constructors would be really helpful 😄
Maciej ObuchowskiThursday, November 9th, 2023 at 1:03:56 PM GMT-05:00
@Athitya Kumar that makes sense. Feel free to provide PR adding getters and stuff.
🎉1
Yannick LibertTuesday, November 7th, 2023 at 6:03:49 AM GMT-05:00
Hi all, we (I work with @U05VDHJJ9T7 and @Abdallah) have a quick question regarding the spark integration:
if a spark app contains several jobs, they will be named "my_spark_app_name.job1" and "my_spark_app_name.job2"
eg:
spark_job.collect_limit
spark_job.map_partitions_parallel_collection

If I understood correctly, the spark integration maps one Spark job to a single OpenLineage Job, and the application itself should be assigned a Run id at startup and each job that executes will report the application's Run id as its parent job run (taken from: https://openlineage.io/docs/integrations/spark/).

In our case, the app Run Id is never created, and the job runs don't contain any parent facets. We tested it with a recent integration version (1.4.1) and also an older one (0.26.0).
Did we miss something in the OL spark integration config?
Paweł LeszczyńskiTuesday, November 7th, 2023 at 6:07:51 AM GMT-05:00
hey, the name of the output dataset should be put at the end of the job name. This was introduced to help with jobs that call multiple spark actions
Yannick LibertTuesday, November 7th, 2023 at 7:05:52 AM GMT-05:00
Hi Paweł,
Thanks for your answer, yes indeed with the newer version of OL, we automatically have the name of the output dataset at the end of the job name, but no App run id, nor any parent run facet.
Paweł LeszczyńskiTuesday, November 7th, 2023 at 8:16:44 AM GMT-05:00
yes, you're right. I mean you can set spark.openlineage.parentJobName in the config, which will be shared through the whole app run, but this needs to be set manually
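For reference, a minimal sketch of setting that manually on the session (values are placeholders; spark.openlineage.parentRunId is the companion setting for pinning the parent run id):

from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
         .appName('my_spark_app_name')
         .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
         .config('spark.openlineage.transport.type', 'http')
         .config('spark.openlineage.transport.url', 'http://localhost:5000')
         .config('spark.openlineage.namespace', 'my_namespace')
         .config('spark.openlineage.parentJobName', 'my_spark_app_name')
         # placeholder UUID; share one value across the whole app run
         .config('spark.openlineage.parentRunId', '018c4f0e-aaaa-bbbb-cccc-000000000000')
         .getOrCreate())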
Yannick LibertTuesday, November 7th, 2023 at 8:36:58 AM GMT-05:00
I see, thanks a lot for your reply we'll try that
ldaceyTuesday, November 7th, 2023 at 10:49:25 AM GMT-05:00
if I have a dataset on ADLS gen2 which Synapse connects to as an external delta table, is that the use case for a symlink dataset? The delta table is connected to by PBI and by Synapse, but the underlying data is exactly the same
Maciej ObuchowskiWednesday, November 8th, 2023 at 10:49:04 AM GMT-05:00
Sounds like it, yes - if the logical dataset names are different but the physical one is the same
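As an illustration, a sketch of what such a symlinks facet could look like when emitted with the Python client, assuming the openlineage-python facet classes; all names are placeholders:

from openlineage.client.facet import (SymlinksDatasetFacet,
                                      SymlinksDatasetFacetIdentifiers)
from openlineage.client.run import Dataset

# One physical Delta table on ADLS gen2; the Synapse external table is
# attached as a symlink (an alternative logical name for the same data).
dataset = Dataset(
    namespace="abfss://container@account.dfs.core.windows.net",  # placeholder
    name="/examples/data/silver/sales",                          # placeholder
    facets={
        "symlinks": SymlinksDatasetFacet(
            identifiers=[
                SymlinksDatasetFacetIdentifiers(
                    namespace="synapse://myworkspace",  # placeholder
                    name="dbo.sales_external",
                    type="TABLE",
                )
            ]
        )
    },
)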
Rodrigo MaiaWednesday, November 8th, 2023 at 12:38:52 PM GMT-05:00
Has anyone here tried OpenLineage with Spark on Amazon EMR?
Jason YipWednesday, November 8th, 2023 at 1:01:16 PM GMT-05:00
No, but it should work the same. I tried on AWS, Google Colab and Azure
👍1
U053LCT71BQThursday, November 9th, 2023 at 3:10:54 AM GMT-05:00
Yes. @Abdallah could provide some details if needed.
👍1
🔥1
Rodrigo MaiaMonday, November 20th, 2023 at 11:29:26 AM GMT-05:00
Thanks @U053LCT71BQ
Hi @Abdallah, I was able to set up a Spark cluster on AWS EMR but I'm struggling to configure the OL listener. I've tried with steps and bootstrap actions for the jar and it didn't work out. How did you manage to include the jar? Besides, what about the Spark configuration? Could you send me a sample of these configs?
Michael RobinsonWednesday, November 8th, 2023 at 12:44:54 PM GMT-05:00
@channel
Friendly reminder: this month’s TSC meeting, open to all, is tomorrow at 10 am PT: https://openlineage.slack.com/archives/C01CK9T7HKR/p1699027207361229
👍1
Jason YipFriday, November 10th, 2023 at 3:25:45 PM GMT-05:00
@Paweł Leszczyński regarding https://github.com/OpenLineage/OpenLineage/issues/2124, OL is parsing out the table location in the Hive metastore; it is the location of the table in the catalog and not the physical location of the data. It is both right and wrong, because it is a table; it's just that it is an external table.

https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html
Jason YipFriday, November 10th, 2023 at 3:32:28 PM GMT-05:00
Jason YipSaturday, November 11th, 2023 at 3:29:33 AM GMT-05:00
@Paweł Leszczyński this is why, if you create a table with an ADLS location, it won't show input and output:

https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/spark35/src[…]k35/agent/lifecycle/plan/CreateReplaceOutputDatasetBuilder.java

Because the catalog object is not there.
Jason YipSaturday, November 11th, 2023 at 3:30:44 AM GMT-05:00
The Databricks support needs to be rewritten in a way that actually supports Databricks, it seems like
Jason YipMonday, November 13th, 2023 at 3:00:42 AM GMT-05:00
@Paweł Leszczyński I went back to 1.4.1, output does show adls location. But environment facet is gone in 1.4.1. It shows up in 1.5.0 but namespace is back to dbfs....
Jason YipMonday, November 13th, 2023 at 3:18:37 AM GMT-05:00
@Paweł Leszczyński I diffed CreateReplaceDatasetBuilder.java and CreateReplaceOutputDatasetBuilder.java and they are the same except for the class name, so I am not sure what is causing the change. I also realize you don't have a test case for ADLS
Paweł LeszczyńskiMonday, November 13th, 2023 at 4:52:07 AM GMT-05:00
Thanks @Jason Yip for your engagement in finding the cause and solution to this issue.

Beyond the technical problems, another issue here is that our Databricks integration tests run on AWS while the issue you describe occurs on Azure. I would consider this the primary issue, as it is difficult for me to verify the behaviour you describe and fix it with a failing integration test at the start.

Are you able to reproduce the issue in an AWS Databricks environment so that we could include it in our integration tests and make sure the behaviour will not change later on in the future?
Jason YipMonday, November 13th, 2023 at 6:06:44 PM GMT-05:00
I didn't know Azure and AWS Databricks are different. Let me try it on AWS as well
Naresh reddyWednesday, November 15th, 2023 at 7:17:24 AM GMT-05:00
Hi
Can anyone point me to the deck on how Airflow can be integrated using OpenLineage?
Naresh reddyWednesday, November 15th, 2023 at 7:27:55 AM GMT-05:00
thank you @Maciej Obuchowski
Naresh reddyWednesday, November 15th, 2023 at 11:09:24 AM GMT-05:00
Can anyone tell me why OL is better than its competitors? If you can provide an analysis, that would be great
Harel SheinThursday, November 16th, 2023 at 11:46:16 AM GMT-05:00
Hey @Naresh reddy can you help me understand what you mean by competitors?
OL is a specification that can be used to solve various problems, so if you have a clear problem statement, maybe I can help with pros/cons for that problem
Naresh reddyWednesday, November 15th, 2023 at 11:10:58 AM GMT-05:00
What are the pros and cons of OL? We often talk about positives to market it, but what are the pain points of using OL, and how is it addressing user issues?
Michael RobinsonThursday, November 16th, 2023 at 1:38:42 PM GMT-05:00
Hi @Naresh reddy, thanks for your question. We’ve heard that OpenLineage is attractive because of its desirable integrations, including a best-in-class Spark integration, its extensibility, the fact that it’s not destructive, and the fact that it’s open source. I’m not aware of pain points per se, but there are certainly features and integrations that we wish we could focus on but can’t at the moment, like the Dagster integration, which needs a new maintainer. OpenLineage is like any other open standard in that ecosystem coverage is an ongoing process rather than a destination, and it requires contributions in order to get close to 100%. Thankfully, we are gaining users and contributors all the time, and integrations are being added or improved upon daily. See the Ecosystem page on the website for a list of consumers and producers and links to more resources, and check out the GitHub repo for the codebase, commit history, contributors, governance procedures, and more. We’re quick to respond to messages here and issues on GitHub, usually within one day.
karthik nandagiriSunday, November 19th, 2023 at 11:57:38 PM GMT-05:00
Hi, so we can use OpenLineage to identify column-level lineage with Airflow and Spark? Will it also allow connecting to Power BI and deriving the downstream column lineage?
Maciej ObuchowskiMonday, November 20th, 2023 at 6:07:36 AM GMT-05:00
Yes, it works with Airflow and Spark - there is a caveat that the number of operators that support it on the Airflow side is fairly small and generally limited to the most popular SQL operators.
> will it also allow to connect to Power BI and derive the downstream column lineage ?
No, there is no such feature yet 🙂
However, there's nothing preventing this - if you wish to work on such implementation, we'd be happy to help.
karthik nandagiriTuesday, November 21st, 2023 at 12:20:11 AM GMT-05:00
Thank you Maciej Obuchowski for the update. Currently we are looking for a tool which can support connecting to Power BI and pulling column-level lineage information for reports and dashboards. How can this be achieved with OL? Can you give some idea?
Maciej ObuchowskiTuesday, November 21st, 2023 at 7:59:10 AM GMT-05:00
I don't think I can help you with that now, unless you want to work on your own integration with PowerBI 🙁
Rafał WójcikTuesday, November 21st, 2023 at 7:02:08 AM GMT-05:00
Hi Everyone, first of all - a big shout-out to all contributors - you do an amazing job here.
I want to use OpenLineage in our project - to do so I want to set up a POC and experiment with the possibilities the library provides - I started working on the sample from the conference talk: https://github.com/getindata/openlineage-bbuzz2023-column-lineage but when I go into the Spark transformation after starting the context with OpenLineage I have issues with SessionHiveMetaStoreClient in section 3 - does anyone have another plain sample to play with, so I don't have to set everything up from scratch?
Maciej ObuchowskiTuesday, November 21st, 2023 at 7:37:00 AM GMT-05:00
Can you provide details about those issues? Like exceptions, logs, details of the jobs and how you run them?
Rafał WójcikTuesday, November 21st, 2023 at 7:45:37 AM GMT-05:00
Hi @Maciej Obuchowski - I reran the docker container after deleting the metadata_db folder possibly created by another local test, which fixed that one, but I got a problem with the OpenLineageSparkListener during initialization of Spark:
while I execute:
spark = (SparkSession.builder.master('local')
         .appName('Food Delivery')
         .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
         .config('spark.jars', '<local-path>/openlineage-spark-0.27.2.jar,<local-path>/postgresql-42.6.0.jar')
         .config('spark.openlineage.transport.type', 'http')
         .config('spark.openlineage.transport.url', 'http://api:5000')
         .config('spark.openlineage.facets.disabled', '[spark_unknown;spark.logicalPlan]')
         .config('spark.openlineage.namespace', 'food-delivery')
         .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')
         .config("spark.sql.repl.eagerEval.enabled", True)
         .enableHiveSupport()
         .getOrCreate())

I got
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2563)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:643)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: io.openlineage.spark.agent.OpenLineageSparkListener
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:467)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:218)
    at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2921)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2919)
    at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1(SparkContext.scala:2552)
    at org.apache.spark.SparkContext.$anonfun$setupAndStartListenerBus$1$adapted(SparkContext.scala:2551)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2551)
    ... 15 more

looks like for some reason the jars are not loaded - need to look into it
Maciej ObuchowskiTuesday, November 21st, 2023 at 7:58:09 AM GMT-05:00
Maciej ObuchowskiTuesday, November 21st, 2023 at 7:58:28 AM GMT-05:00
are you sure <local-path> is right?
Rafał WójcikTuesday, November 21st, 2023 at 8:00:49 AM GMT-05:00
yes, it's the same as in the sample - wondering why it's not getting added:
from pyspark.sql import SparkSession

spark = (SparkSession.builder.master('local')
         .appName('Food Delivery')
         .config('spark.jars', '/home/jovyan/jars/openlineage-spark-0.27.2.jar,/home/jovyan/jars/postgresql-42.6.0.jar')
         .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')
         .config("spark.sql.repl.eagerEval.enabled", True)
         .enableHiveSupport()
         .getOrCreate())

print(spark.sparkContext._jsc.sc().listJars())

Vector()
Maciej ObuchowskiTuesday, November 21st, 2023 at 8:04:31 AM GMT-05:00
can you make sure jars are in this directory? just by docker run --entrypoint /usr/local/bin/bash IMAGE_NAME "ls /home/jovyan/jars"
Maciej ObuchowskiTuesday, November 21st, 2023 at 8:06:27 AM GMT-05:00
another option to try is to replace spark.jars with spark.jars.packages io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0
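i.e., a sketch of the suggested change against the builder from the sample above:

from pyspark.sql import SparkSession

# Let Spark resolve the jars from Maven instead of pointing at local paths.
spark = (SparkSession.builder.master('local')
         .appName('Food Delivery')
         .config('spark.jars.packages',
                 'io.openlineage:openlineage-spark:1.5.0,org.postgresql:postgresql:42.7.0')
         .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener')
         .config('spark.sql.warehouse.dir', '/tmp/spark-warehouse/')
         .enableHiveSupport()
         .getOrCreate())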
Paweł LeszczyńskiTuesday, November 21st, 2023 at 8:16:54 AM GMT-05:00
I think this was done for the purpose of the presentation, to make sure the demo would work without internet access. That can be the reason the jar was added manually to the docker image. openlineage-spark can be added to Spark via spark.jars.packages, like we do here https://openlineage.io/docs/integrations/spark/quickstart_local
Rafał WójcikTuesday, November 21st, 2023 at 9:21:59 AM GMT-05:00
got it guys - thanks a lot for the help - it turns out that the spark contexts from notebooks 2 and 3 had some kind of metadata conflict - when I combined those 2 and recreated the image to clean up the old metadata, it worked.
One more note: sometimes the kernels return weird results, but that may be caused by some local nuances - anyway, thanks!
\ No newline at end of file diff --git a/slack-archive/html/C01NAFMBVEY-0.html b/slack-archive/html/C01NAFMBVEY-0.html deleted file mode 100644 index ad7f3ba..0000000 --- a/slack-archive/html/C01NAFMBVEY-0.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

mark-grover

Created by U01HNSHB2H4 on Monday, February 15th, 2021

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C030F1J0264-0.html b/slack-archive/html/C030F1J0264-0.html deleted file mode 100644 index d371b61..0000000 --- a/slack-archive/html/C030F1J0264-0.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

dagster-integration

Created by U025D1JDTRB on Friday, January 21st, 2022

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C04E3Q18RR9-0.html b/slack-archive/html/C04E3Q18RR9-0.html deleted file mode 100644 index 39d634b..0000000 --- a/slack-archive/html/C04E3Q18RR9-0.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

open-lineage-plus-bacalhau

Created by U044VPCNMDX on Wednesday, December 7th, 2022

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C04JPTTC876-0.html b/slack-archive/html/C04JPTTC876-0.html deleted file mode 100644 index 52df9e5..0000000 --- a/slack-archive/html/C04JPTTC876-0.html +++ /dev/null @@ -1,11 +0,0 @@ -Slack

spec-compliance

Created by Julien Le Dem on Thursday, January 12th, 2023

Key points/decisions - https://docs.google.com/document/d/1ysZR13QwDvAiY_rQJedLHpnNn3deQeow7BmCNckd2uM/edit

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C04QSV0GG23-0.html b/slack-archive/html/C04QSV0GG23-0.html deleted file mode 100644 index f650ff7..0000000 --- a/slack-archive/html/C04QSV0GG23-0.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

providence-meetup

Created by Michael Robinson on Friday, February 24th, 2023

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C04THH1V90X-0.html b/slack-archive/html/C04THH1V90X-0.html deleted file mode 100644 index d088544..0000000 --- a/slack-archive/html/C04THH1V90X-0.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

data-council-meetup

Created by Michael Robinson on Tuesday, March 14th, 2023

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C051C93UZK9-0.html b/slack-archive/html/C051C93UZK9-0.html deleted file mode 100644 index 1154a35..0000000 --- a/slack-archive/html/C051C93UZK9-0.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

nyc-meetup

Created by Michael Robinson on Tuesday, April 4th, 2023

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C055GGUFMHQ-0.html b/slack-archive/html/C055GGUFMHQ-0.html deleted file mode 100644 index a301646..0000000 --- a/slack-archive/html/C055GGUFMHQ-0.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

boston-meetup

Created by Michael Robinson on Friday, April 28th, 2023

No messages were ever sent!
\ No newline at end of file diff --git a/slack-archive/html/C056YHEU680-0.html b/slack-archive/html/C056YHEU680-0.html deleted file mode 100644 index a278a5d..0000000 --- a/slack-archive/html/C056YHEU680-0.html +++ /dev/null @@ -1,6 +0,0 @@ -Slack

sf-meetup

Created by Michael Robinson on Thursday, May 4th, 2023

Michael RobinsonWednesday, August 30th, 2023 at 3:11:18 PM GMT-04:00
Adding the venue info in case it’s more convenient than the meetup page:
Michael RobinsonWednesday, August 30th, 2023 at 3:12:55 PM GMT-04:00
Time: 5:30-8:30 pm
Address: 8 California St., San Francisco, CA, seventh floor
Getting in: someone from Astronomer will be in the lobby to direct you
Kevin LanguascoThursday, August 31st, 2023 at 6:29:01 PM GMT-04:00
@Kevin Languasco has joined the channel
Julien Le DemThursday, August 31st, 2023 at 11:18:10 PM GMT-04:00
Some pictures from last night
Aaruna GodthiSaturday, September 23rd, 2023 at 4:47:37 PM GMT-04:00
@Aaruna Godthi has joined the channel
\ No newline at end of file diff --git a/slack-archive/html/C05N442RQUA-0.html b/slack-archive/html/C05N442RQUA-0.html deleted file mode 100644 index 89ef08d..0000000 --- a/slack-archive/html/C05N442RQUA-0.html +++ /dev/null @@ -1,6 +0,0 @@ -Slack

toronto-meetup

Created by Michael Robinson on Wednesday, August 16th, 2023

Michael RobinsonFriday, August 25th, 2023 at 1:30:07 PM GMT-04:00
Some belated updates on this in case you’re not aware:
• Date: 9/18
• Time: 5-8:00 PM ET
• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)
• Venue phone: 416-805-2286
• Meetup for more info and to sign up: https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link
🎉4
🙌4
Harel SheinFriday, August 25th, 2023 at 1:33:42 PM GMT-04:00
really looking forward to meeting all of you in Toronto!!
Julien Le DemFriday, September 1st, 2023 at 11:10:51 PM GMT-04:00
Most OpenLineage regular contributors will be there. It will be fun to be all in person. Everyone is encouraged to join
🙌6
Michael RobinsonMonday, September 11th, 2023 at 10:13:57 AM GMT-04:00
@channel
It’s hard to believe this is happening in just one week! Here’s the updated agenda:
1. Intros
2. Evolution of spec presentation/discussion (project background/history)
3. State of the community
4. Integrating OpenLineage with Metaphor (by special guests Ye & Ivan)
5. Spark/Column lineage update
6. Airflow Provider update
7. Roadmap Discussion
8. Action items review/next steps
Find the details and RSVP here.
🙌1
Greg KimFriday, September 15th, 2023 at 10:11:11 AM GMT-04:00
@Greg Kim has joined the channel
Michael RobinsonFriday, September 15th, 2023 at 12:17:29 PM GMT-04:00
Looking forward to seeing you on Monday! Here’s the time/place info again for your convenience:
• Date: 9/18
• Time: 5-8:00 PM ET
• Place: Canarts, 600 Bay St., #410 (around the corner from the Airflow Summit venue)
• Venue phone: 416-805-2286
• Meetup page with more info and signup: https://www.meetup.com/openlineage/events/295488014/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link
Please send a message if you find yourself stuck in the lobby, etc.
🙌1
Michael RobinsonMonday, September 18th, 2023 at 4:20:33 PM GMT-04:00
Hi, if you’re wondering if you’re in the right place: look for Uncle Tetsu’s Cheesecake next door and for the address (600 Bay St) above the door. The building is an older one (unlike the meeting space itself, which is modern and well-appointed)
\ No newline at end of file diff --git a/slack-archive/html/C05PD7VJ52S-0.html b/slack-archive/html/C05PD7VJ52S-0.html deleted file mode 100644 index 9178d1a..0000000 --- a/slack-archive/html/C05PD7VJ52S-0.html +++ /dev/null @@ -1,6 +0,0 @@ -Slack

london-meetup

Created by Michael Robinson on Friday, August 25th, 2023

Michael RobinsonFriday, August 25th, 2023 at 11:52:05 AM GMT-04:00
@Michael Robinson has joined the channel
George PolychronopoulosFriday, August 25th, 2023 at 11:52:15 AM GMT-04:00
@George Polychronopoulos has joined the channel
Harel SheinFriday, August 25th, 2023 at 11:52:15 AM GMT-04:00
@Harel Shein has joined the channel
Madhav KakumaniFriday, August 25th, 2023 at 11:52:40 AM GMT-04:00
@Madhav Kakumani has joined the channel
George PolychronopoulosFriday, August 25th, 2023 at 11:52:49 AM GMT-04:00
Hi Michael
George PolychronopoulosFriday, August 25th, 2023 at 11:53:36 AM GMT-04:00
thanks so much !
Michael RobinsonFriday, August 25th, 2023 at 11:53:54 AM GMT-04:00
Hi George, nice to meet you. Thanks for asking about future meetups. Would November be too soon, or what’s a good timeframe for you all?
George PolychronopoulosFriday, August 25th, 2023 at 11:54:12 AM GMT-04:00
thats perfect !
Michael RobinsonFriday, August 25th, 2023 at 11:54:31 AM GMT-04:00
Great! Do you happen to have space we could use?
George PolychronopoulosFriday, August 25th, 2023 at 11:55:58 AM GMT-04:00
I will have to confirm but 99% yes
George PolychronopoulosFriday, August 25th, 2023 at 11:57:47 AM GMT-04:00
I am pretty sure you can use our 6point6 offices or at least part of it
George PolychronopoulosFriday, August 25th, 2023 at 11:58:00 AM GMT-04:00
and if that not the case i can provide personal space
Michael RobinsonFriday, August 25th, 2023 at 11:58:46 AM GMT-04:00
OK! Would you please let me know when you know, and we’ll go from there?
George PolychronopoulosFriday, August 25th, 2023 at 11:59:29 AM GMT-04:00
yes absolutely will give you an answer by Monday
👍1
Madhav KakumaniFriday, August 25th, 2023 at 12:51:08 PM GMT-04:00
Thanks Michael for starting this channel
Madhav KakumaniFriday, August 25th, 2023 at 12:51:21 PM GMT-04:00
hopefully meet you soon in London
Michael RobinsonFriday, August 25th, 2023 at 1:19:58 PM GMT-04:00
Yes, hope so! Thank you for your interest in joining a meetup!
Maciej ObuchowskiFriday, August 25th, 2023 at 1:34:21 PM GMT-04:00
@Maciej Obuchowski has joined the channel
Mike O'ConnorThursday, August 31st, 2023 at 4:04:17 PM GMT-04:00
@Mike O'Connor has joined the channel
\ No newline at end of file diff --git a/slack-archive/html/C05U3UC85LM-0.html b/slack-archive/html/C05U3UC85LM-0.html deleted file mode 100644 index 410d409..0000000 --- a/slack-archive/html/C05U3UC85LM-0.html +++ /dev/null @@ -1,6 +0,0 @@ -Slack

gx-integration

Created by Michael Robinson on Wednesday, September 27th, 2023

Michael RobinsonWednesday, September 27th, 2023 at 1:38:03 PM GMT-04:00
@Michael Robinson has joined the channel
🎉1
Don HeppnerWednesday, September 27th, 2023 at 1:38:23 PM GMT-04:00
@Don Heppner has joined the channel
Harel SheinWednesday, September 27th, 2023 at 1:38:23 PM GMT-04:00
@Harel Shein has joined the channel
Jakub DardzińskiWednesday, September 27th, 2023 at 1:38:23 PM GMT-04:00
@Jakub Dardziński has joined the channel
Maciej ObuchowskiWednesday, September 27th, 2023 at 1:41:17 PM GMT-04:00
@Maciej Obuchowski has joined the channel
Harel SheinWednesday, September 27th, 2023 at 2:44:18 PM GMT-04:00
@Don Heppner it was great meeting earlier, looking forward to collaborating on this!
2
Bill DirksThursday, September 28th, 2023 at 11:59:31 AM GMT-04:00
@Bill Dirks has joined the channel
Jakub DardzińskiMonday, October 9th, 2023 at 7:46:52 AM GMT-04:00
Hello guys! I’ve been looking recently into changes in GX.
https://greatexpectations.io/blog/the-fluent-way-to-connect-to-data-sources-in-gx/
is this the major change you’d like to introduce in the OL <-> GX integration?
Harel SheinMonday, October 9th, 2023 at 10:14:39 AM GMT-04:00
@Don Heppner @Bill Dirks ^^
Bill DirksTuesday, October 10th, 2023 at 7:12:02 PM GMT-04:00
Just seeing this, we had a company holiday yesterday. Yes, fluent data sources are our new way of connecting to data and the older "block-style" is deprecated and will be removed when we cut 0.18.0. I'm not sure of the timing of that but likely in the next couple months.
👍1
Harel SheinFriday, October 13th, 2023 at 8:47:55 AM GMT-04:00
@Bill Dirks would be great if we could get your eyes on this PR: https://github.com/OpenLineage/OpenLineage/pull/2134
Bill DirksFriday, October 13th, 2023 at 3:00:37 PM GMT-04:00
I'm a bit slammed today but can look on Tuesday.
1
JasonThursday, October 19th, 2023 at 1:02:24 PM GMT-04:00
@Jason has joined the channel
\ No newline at end of file diff --git a/slack-archive/html/C065PQ4TL8K-0.html b/slack-archive/html/C065PQ4TL8K-0.html deleted file mode 100644 index 3d207b3..0000000 --- a/slack-archive/html/C065PQ4TL8K-0.html +++ /dev/null @@ -1,6 +0,0 @@ -Slack

dev-discuss

Created by Harel Shein on Tuesday, November 14th, 2023

Harel SheinTuesday, November 14th, 2023 at 12:13:06 PM GMT-05:00
@Harel Shein has joined the channel
Maciej ObuchowskiTuesday, November 14th, 2023 at 12:13:10 PM GMT-05:00
@Maciej Obuchowski has joined the channel
Julien Le DemTuesday, November 14th, 2023 at 12:13:46 PM GMT-05:00
@Julien Le Dem has joined the channel
Paweł LeszczyńskiTuesday, November 14th, 2023 at 12:13:46 PM GMT-05:00
@Paweł Leszczyński has joined the channel
Jakub DardzińskiTuesday, November 14th, 2023 at 12:13:46 PM GMT-05:00
@Jakub Dardziński has joined the channel
Michael RobinsonTuesday, November 14th, 2023 at 12:13:46 PM GMT-05:00
@Michael Robinson has joined the channel
Willy LulciucTuesday, November 14th, 2023 at 12:13:46 PM GMT-05:00
@Willy Lulciuc has joined the channel
Peter HicksTuesday, November 14th, 2023 at 12:13:46 PM GMT-05:00
@Peter Hicks has joined the channel
Jakub DardzińskiTuesday, November 14th, 2023 at 12:13:57 PM GMT-05:00
👋
Ross TurkTuesday, November 14th, 2023 at 12:14:02 PM GMT-05:00
@Ross Turk has joined the channel
Michael RobinsonTuesday, November 14th, 2023 at 12:16:19 PM GMT-05:00
👋
Julien Le DemTuesday, November 14th, 2023 at 12:18:42 PM GMT-05:00
👋
Willy LulciucTuesday, November 14th, 2023 at 12:18:53 PM GMT-05:00
👋
Ross TurkTuesday, November 14th, 2023 at 12:29:47 PM GMT-05:00
🌊
Maciej ObuchowskiTuesday, November 14th, 2023 at 1:53:08 PM GMT-05:00
👋
Jakub DardzińskiTuesday, November 14th, 2023 at 6:30:48 PM GMT-05:00
Jakub DardzińskiWednesday, November 15th, 2023 at 4:35:37 AM GMT-05:00
Maciej ObuchowskiWednesday, November 15th, 2023 at 5:03:58 AM GMT-05:00
nice to have fun with you Jakub
🙂3
Paweł LeszczyńskiWednesday, November 15th, 2023 at 5:42:34 AM GMT-05:00
Can't wait to see it on the 1st January.
Harel SheinWednesday, November 15th, 2023 at 6:56:03 AM GMT-05:00
Ain’t no party like a dev ex improvement party
Maciej ObuchowskiWednesday, November 15th, 2023 at 11:45:53 AM GMT-05:00
Gentoo installation party is in similar category of fun
Willy LulciucWednesday, November 15th, 2023 at 3:32:27 AM GMT-05:00
@Paweł Leszczyński approved PR #2661 with minor comments, I think the enum defined in the db layer is one comment we’ll need to address before merging; otherwise solid work dude 👌
🙌2
Paweł LeszczyńskiWednesday, November 15th, 2023 at 3:34:42 AM GMT-05:00
_Minor_: We can consider defining a _run_state column and eventually dropping the event_type. That is, we can consider columns prefixed with _ to be "remappings" of OL properties to Marquez. -> didn't get this one. Is it for now or some future plans?
Willy LulciucWednesday, November 15th, 2023 at 3:36:02 AM GMT-05:00
future 😉
Paweł LeszczyńskiWednesday, November 15th, 2023 at 3:36:10 AM GMT-05:00
ok
Paweł LeszczyńskiWednesday, November 15th, 2023 at 3:36:23 AM GMT-05:00
I will then replace enum with string
Willy LulciucWednesday, November 15th, 2023 at 3:36:10 AM GMT-05:00
Paweł LeszczyńskiWednesday, November 15th, 2023 at 3:36:33 AM GMT-05:00
this is the next to go
Paweł LeszczyńskiWednesday, November 15th, 2023 at 3:36:38 AM GMT-05:00
and i consider it ready
Paweł LeszczyńskiWednesday, November 15th, 2023 at 3:37:31 AM GMT-05:00
Then we have a draft one with streaming support https://github.com/MarquezProject/marquez/pull/2682/files -> which has an integration test of lineage endpoint working for streaming jobs
Paweł LeszczyńskiWednesday, November 15th, 2023 at 3:38:32 AM GMT-05:00
I still need to work on #2682 but you can review #2654. once you get some sleep, of course 😉
❤️1
Jakub DardzińskiWednesday, November 15th, 2023 at 4:35:37 AM GMT-05:00
Maciej ObuchowskiWednesday, November 15th, 2023 at 11:44:44 AM GMT-05:00
👀1
Jakub DardzińskiWednesday, November 15th, 2023 at 12:24:27 PM GMT-05:00
did you check if LineageCollector is instantiated once per process?
Maciej ObuchowskiWednesday, November 15th, 2023 at 12:26:37 PM GMT-05:00
Using it only via get_hook_lineage_collector
Jakub DardzińskiWednesday, November 15th, 2023 at 12:17:31 PM GMT-05:00
is it time to support hudi?
😂1
Michael RobinsonWednesday, November 15th, 2023 at 2:57:10 PM GMT-05:00
Anyone have thoughts about how to address the question about “pain points” here? https://openlineage.slack.com/archives/C01CK9T7HKR/p1700064564825909. (Listing pros is easy — it’s the cons we don’t have boilerplate for)
Michael RobinsonWednesday, November 15th, 2023 at 2:58:08 PM GMT-05:00
Maybe something like “OL has many desirable integrations, including a best-in-class Spark integration, but it’s like any other open standard in that it requires contributions in order to approach total coverage. Thankfully, we have many active contributors, and integrations are being added or improved upon all the time.”
Maciej ObuchowskiWednesday, November 15th, 2023 at 4:04:51 PM GMT-05:00
Maybe rephrase pain points to "something we're not actively focusing on"
Michael RobinsonWednesday, November 15th, 2023 at 2:59:19 PM GMT-05:00
Apparently an admin can view a Slack archive at any time at this URL: https://openlineage.slack.com/services/export. Only public channels are available, though.
Julien Le DemWednesday, November 15th, 2023 at 4:53:09 PM GMT-05:00
you are now admin
👍1
Willy LulciucWednesday, November 15th, 2023 at 5:32:26 PM GMT-05:00
Jakub DardzińskiWednesday, November 15th, 2023 at 5:33:19 PM GMT-05:00
we have it in SQL operators
Willy LulciucWednesday, November 15th, 2023 at 5:34:25 PM GMT-05:00
OOh any docs / code? or if you’d like to respond in the MQZ slack 🙏
Jakub DardzińskiWednesday, November 15th, 2023 at 5:35:19 PM GMT-05:00
I’ll reply there
❤️2
Michael RobinsonWednesday, November 15th, 2023 at 5:50:23 PM GMT-05:00
Any opinions about a free task management alternative to the free version of Notion (10-person limit)? Looking at Trello for keeping track of talks.
Harel SheinWednesday, November 15th, 2023 at 7:32:17 PM GMT-05:00
What about GitHub projects?
👍1
Michael RobinsonThursday, November 16th, 2023 at 9:27:46 AM GMT-05:00
Projects is the way to go, thanks
Michael RobinsonThursday, November 16th, 2023 at 10:23:34 AM GMT-05:00
Set up a Projects board. New projects are private by default. We could make it public. The one thing that’s missing that we could use is a built-in date field for alerting about upcoming deadlines…
🙌2
Michael RobinsonThursday, November 16th, 2023 at 9:31:24 AM GMT-05:00
worlds are colliding: 6point6 has been acquired by Accenture
Maciej ObuchowskiThursday, November 16th, 2023 at 10:03:27 AM GMT-05:00
We should sell OL to governments
🙃1
Harel SheinThursday, November 16th, 2023 at 10:20:36 AM GMT-05:00
we may have to rebrand to ClosedLineage
Maciej ObuchowskiThursday, November 16th, 2023 at 10:23:37 AM GMT-05:00
not in this way; just emit any event second time to secret NSA endpoint
Michael RobinsonThursday, November 16th, 2023 at 11:13:17 AM GMT-05:00
we would need to improve our stock photo game
Maciej ObuchowskiThursday, November 16th, 2023 at 12:17:22 PM GMT-05:00
CFP for Berlin Buzzwords went up: https://2024.berlinbuzzwords.de/call-for-papers/
Still over 3 months to submit 🙂
Michael RobinsonThursday, November 16th, 2023 at 12:42:56 PM GMT-05:00
thanks, updated the talks board
Michael RobinsonThursday, November 16th, 2023 at 12:43:10 PM GMT-05:00
Jakub DardzińskiThursday, November 16th, 2023 at 3:19:53 PM GMT-05:00
I'm in, will think what to talk about and appreciate any advice 🙂
Julien Le DemFriday, November 17th, 2023 at 1:42:19 PM GMT-05:00
Julien Le DemFriday, November 17th, 2023 at 1:47:21 PM GMT-05:00
It looks like the datahub airflow plugin uses OL, but turns it off:
https://github.com/datahub-project/datahub/blob/2b0811b9875d7d7ea11fb01d0157a21fdd67f020/docs/lineage/airflow.md
disable_openlineage_plugin    true    Disable the OpenLineage plugin to avoid duplicative processing.

They reuse the extractors but then “patch” the behavior.
Julien Le DemFriday, November 17th, 2023 at 1:48:52 PM GMT-05:00
Of course this approach will need changing again with AF 2.7
Julien Le DemFriday, November 17th, 2023 at 1:49:02 PM GMT-05:00
It’s their choice 🤷
Julien Le DemFriday, November 17th, 2023 at 1:51:23 PM GMT-05:00
It looks like we can possibly learn from their approach in SQL parsing: https://datahubproject.io/docs/lineage/airflow/#automatic-lineage-extraction
Jakub DardzińskiFriday, November 17th, 2023 at 4:42:51 PM GMT-05:00
what's that approach? I only know they have been claiming best SQL parsing capabilities
Julien Le DemFriday, November 17th, 2023 at 8:54:48 PM GMT-05:00
I haven’t looked at the details, but I’m assuming it is in this repo. (my comment is entirely based on the claim here)
Paweł LeszczyńskiMonday, November 20th, 2023 at 2:58:07 AM GMT-05:00
https://www.acryldata.io/blog/extracting-column-level-lineage-from-sql -> The interesting difference is that in order to find table schemas, they use their data catalog to evaluate column-level lineage instead of doing this on the client side.

My understanding by example is: If you do
create table x as select * from y

you need to resolve * to know column level lineage. Our approach is to do that on the client side, probably with an extra call to the database. Their approach is to do that based on the data catalog information.
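A toy sketch of that difference; fetch_schema is a hypothetical helper standing in for either an extra database call (client side) or a lookup in the catalog's metadata:

def fetch_schema(table: str) -> list:
    # Hypothetical helper: client-side this is an extra call to the database;
    # catalog-side it is a lookup against already-ingested metadata.
    return {"y": ["id", "amount", "created_at"]}[table]

def column_lineage_for_ctas(target: str, source: str) -> dict:
    # `create table x as select * from y`: once * is resolved, every column
    # of y maps 1:1 onto the same-named column of x.
    return {col: f"{source}.{col}" for col in fetch_schema(source)}

print(column_lineage_for_ctas("x", "y"))
# {'id': 'y.id', 'amount': 'y.amount', 'created_at': 'y.created_at'}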
Julien Le DemFriday, November 17th, 2023 at 8:56:54 PM GMT-05:00
I’m off on vacation. See you in a week
❤️5
Maciej ObuchowskiTuesday, November 21st, 2023 at 5:23:31 AM GMT-05:00
Maybe move today's meeting earlier, since no one from west coast is joining? @Harel Shein
\ No newline at end of file
561dbfe..0000000 Binary files a/slack-archive/html/avatars/U062WLFMRTP.png and /dev/null differ diff --git a/slack-archive/html/avatars/U06315TMT61.png b/slack-archive/html/avatars/U06315TMT61.png deleted file mode 100644 index 283aca6..0000000 Binary files a/slack-archive/html/avatars/U06315TMT61.png and /dev/null differ diff --git a/slack-archive/html/avatars/U0635GK8Y14.jpg b/slack-archive/html/avatars/U0635GK8Y14.jpg deleted file mode 100644 index 53189ef..0000000 Binary files a/slack-archive/html/avatars/U0635GK8Y14.jpg and /dev/null differ diff --git a/slack-archive/html/avatars/U063YP6UJJ0.png b/slack-archive/html/avatars/U063YP6UJJ0.png deleted file mode 100644 index d9ee059..0000000 Binary files a/slack-archive/html/avatars/U063YP6UJJ0.png and /dev/null differ diff --git a/slack-archive/html/avatars/U066CNW85D3.jpg b/slack-archive/html/avatars/U066CNW85D3.jpg deleted file mode 100644 index 0b29a39..0000000 Binary files a/slack-archive/html/avatars/U066CNW85D3.jpg and /dev/null differ diff --git a/slack-archive/html/avatars/U066HKFCHUG.jpg b/slack-archive/html/avatars/U066HKFCHUG.jpg deleted file mode 100644 index f4e7e9e..0000000 Binary files a/slack-archive/html/avatars/U066HKFCHUG.jpg and /dev/null differ diff --git a/slack-archive/html/avatars/U066S97A90C.png b/slack-archive/html/avatars/U066S97A90C.png deleted file mode 100644 index 5a28f64..0000000 Binary files a/slack-archive/html/avatars/U066S97A90C.png and /dev/null differ diff --git a/slack-archive/html/emojis/gratitude-thank-you.png b/slack-archive/html/emojis/gratitude-thank-you.png deleted file mode 100644 index b2baa04..0000000 Binary files a/slack-archive/html/emojis/gratitude-thank-you.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05PKBM0RRV.png b/slack-archive/html/files/C01CK9T7HKR/F05PKBM0RRV.png deleted file mode 100644 index 8cff498..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05PKBM0RRV.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05PSGC7D8E.png b/slack-archive/html/files/C01CK9T7HKR/F05PSGC7D8E.png deleted file mode 100644 index bc60f55..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05PSGC7D8E.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05PW99E3L5.png b/slack-archive/html/files/C01CK9T7HKR/F05PW99E3L5.png deleted file mode 100644 index eead18a..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05PW99E3L5.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05RM6EV6DV.png b/slack-archive/html/files/C01CK9T7HKR/F05RM6EV6DV.png deleted file mode 100644 index c7a5b48..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05RM6EV6DV.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05RR4YHH1V.png b/slack-archive/html/files/C01CK9T7HKR/F05RR4YHH1V.png deleted file mode 100644 index 6b0146a..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05RR4YHH1V.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05RZ520LH0.png b/slack-archive/html/files/C01CK9T7HKR/F05RZ520LH0.png deleted file mode 100644 index a231f1c..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05RZ520LH0.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S21H1LHK.txt b/slack-archive/html/files/C01CK9T7HKR/F05S21H1LHK.txt deleted file mode 100644 index 3d4ecef..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F05S21H1LHK.txt +++ /dev/null @@ -1,4 +0,0 @@ -{"eventTime": 
"2023-09-12T20:44:10.764Z", "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent", "eventType": "START", "run": {"runId": "9293fb3d-bbe9-4237-b518-719a7c0f149d", "facets": {"spark.logicalPlan": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "plan": [{"class": "org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand", "num-children": 0, "query": [{"class": "org.apache.spark.sql.catalyst.plans.logical.Project", "num-children": 1, "projectList": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_year", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 507, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Year", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_month", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 508, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Month", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_day", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 509, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.DayOfMonth", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, 
"name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_amount", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 510, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "float", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_key", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 511, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.ConcatWs", "num-children": 3, "children": [0, 1, 2]}, {"class": "org.apache.spark.sql.catalyst.expressions.Literal", "num-children": 0, "value": "-", "dataType": "string"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "child": 0}, {"class": "org.apache.spark.sql.execution.datasources.LogicalRelation", "num-children": 0, "relation": null, "output": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", 
"nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "isStreaming": false}], "dataSource": null, "options": null, "mode": null}]}, "spark_version": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "spark-version": "3.4.1", "openlineage-spark-version": "1.1.0"}, "environment-properties": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "environment-properties": {}}}}, "job": {"namespace": "staging", "name": "etl_test.execute_save_into_data_source_command.atlan_orders", "facets": {}}, "inputs": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "file", "uri": "file"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_id", "type": "integer"}, {"name": "order_date", "type": "string"}, {"name": "order_amount", "type": "string"}, {"name": "product_name", "type": "string"}]}}, "inputFacets": {}}], "outputs": [{"namespace": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "name": "atlan.orders", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "uri": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_year", "type": "integer"}, {"name": "order_month", "type": "integer"}, {"name": "order_day", 
"type": "integer"}, {"name": "order_amount", "type": "integer"}, {"name": "order_key", "type": "string"}, {"name": "product_name", "type": "string"}]}, "columnLineage": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet", "fields": {"customer_id": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}]}, "order_year": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_month": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_day": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_amount": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_amount"}]}, "order_key": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}, {"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_id"}]}, "product_name": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "product_name"}]}}}, "lifecycleStateChange": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", "lifecycleStateChange": "OVERWRITE"}}, "outputFacets": {}}]} -{"eventTime": "2023-09-12T20:44:11.462Z", "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent", "eventType": "START", "run": {"runId": "9293fb3d-bbe9-4237-b518-719a7c0f149d", "facets": {"spark.logicalPlan": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "plan": [{"class": "org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand", "num-children": 0, "query": [{"class": "org.apache.spark.sql.catalyst.plans.logical.Project", "num-children": 1, "projectList": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_year", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 507, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Year", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": 
"org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_month", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 508, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Month", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_day", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 509, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.DayOfMonth", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_amount", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 510, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "float", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_key", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 511, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.ConcatWs", "num-children": 3, "children": [0, 1, 2]}, {"class": "org.apache.spark.sql.catalyst.expressions.Literal", "num-children": 0, "value": "-", "dataType": "string"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": 
"org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "child": 0}, {"class": "org.apache.spark.sql.execution.datasources.LogicalRelation", "num-children": 0, "relation": null, "output": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "isStreaming": false}], "dataSource": null, "options": null, "mode": null}]}, "spark_version": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "spark-version": "3.4.1", "openlineage-spark-version": "1.1.0"}, "spark_properties": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "properties": {"spark.master": "local", "spark.app.name": "etl-test"}}, 
"environment-properties": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "environment-properties": {}}}}, "job": {"namespace": "staging", "name": "etl_test.execute_save_into_data_source_command.atlan_orders", "facets": {}}, "inputs": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "file", "uri": "file"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_id", "type": "integer"}, {"name": "order_date", "type": "string"}, {"name": "order_amount", "type": "string"}, {"name": "product_name", "type": "string"}]}}, "inputFacets": {}}], "outputs": [{"namespace": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "name": "atlan.orders", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "uri": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_year", "type": "integer"}, {"name": "order_month", "type": "integer"}, {"name": "order_day", "type": "integer"}, {"name": "order_amount", "type": "integer"}, {"name": "order_key", "type": "string"}, {"name": "product_name", "type": "string"}]}, "columnLineage": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet", "fields": {"customer_id": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}]}, "order_year": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_month": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_day": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_amount": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_amount"}]}, "order_key": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}, {"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_id"}]}, "product_name": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "product_name"}]}}}, "lifecycleStateChange": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": 
"https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", "lifecycleStateChange": "OVERWRITE"}}, "outputFacets": {}}]} -{"eventTime": "2023-09-12T20:44:12.106Z", "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent", "eventType": "COMPLETE", "run": {"runId": "9293fb3d-bbe9-4237-b518-719a7c0f149d", "facets": {"spark.logicalPlan": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "plan": [{"class": "org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand", "num-children": 0, "query": [{"class": "org.apache.spark.sql.catalyst.plans.logical.Project", "num-children": 1, "projectList": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_year", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 507, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Year", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_month", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 508, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Month", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_day", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 509, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.DayOfMonth", "num-children": 1, "child": 0}, {"class": 
"org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_amount", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 510, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "float", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_key", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 511, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.ConcatWs", "num-children": 3, "children": [0, 1, 2]}, {"class": "org.apache.spark.sql.catalyst.expressions.Literal", "num-children": 0, "value": "-", "dataType": "string"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "child": 0}, {"class": "org.apache.spark.sql.execution.datasources.LogicalRelation", "num-children": 0, "relation": null, "output": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": 
"org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "isStreaming": false}], "dataSource": null, "options": null, "mode": null}]}, "spark_version": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "spark-version": "3.4.1", "openlineage-spark-version": "1.1.0"}, "environment-properties": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "environment-properties": {}}}}, "job": {"namespace": "staging", "name": "etl_test.execute_save_into_data_source_command.atlan_orders", "facets": {}}, "inputs": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "file", "uri": "file"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_id", "type": "integer"}, {"name": "order_date", "type": "string"}, {"name": "order_amount", "type": "string"}, {"name": "product_name", "type": "string"}]}}, "inputFacets": {}}], "outputs": [{"namespace": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "name": "atlan.orders", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "uri": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", 
"_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_year", "type": "integer"}, {"name": "order_month", "type": "integer"}, {"name": "order_day", "type": "integer"}, {"name": "order_amount", "type": "integer"}, {"name": "order_key", "type": "string"}, {"name": "product_name", "type": "string"}]}, "columnLineage": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet", "fields": {"customer_id": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}]}, "order_year": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_month": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_day": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_amount": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_amount"}]}, "order_key": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}, {"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_id"}]}, "product_name": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "product_name"}]}}}, "lifecycleStateChange": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", "lifecycleStateChange": "OVERWRITE"}}, "outputFacets": {"outputStatistics": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet", "rowCount": 1, "size": 0}}}]} -{"eventTime": "2023-09-12T20:44:12.518Z", "producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent", "eventType": "COMPLETE", "run": {"runId": "9293fb3d-bbe9-4237-b518-719a7c0f149d", "facets": {"spark.logicalPlan": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "plan": [{"class": "org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand", "num-children": 0, "query": [{"class": "org.apache.spark.sql.catalyst.plans.logical.Project", "num-children": 1, "projectList": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_year", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 507, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], 
"nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Year", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_month", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 508, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Month", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_day", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 509, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.DayOfMonth", "num-children": 1, "child": 0}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "date", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_amount", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 510, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "float", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.Alias", "num-children": 1, "child": 0, "name": "order_key", "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 511, "jvmId": 
"bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": [], "nonInheritableMetadataKeys": "[__dataset_id, __col_position]"}, {"class": "org.apache.spark.sql.catalyst.expressions.ConcatWs", "num-children": 3, "children": [0, 1, 2]}, {"class": "org.apache.spark.sql.catalyst.expressions.Literal", "num-children": 0, "value": "-", "dataType": "string"}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}, {"class": "org.apache.spark.sql.catalyst.expressions.Cast", "num-children": 1, "child": 0, "dataType": "string", "timeZoneId": "Etc/UTC", "evalMode": null}, {"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "child": 0}, {"class": "org.apache.spark.sql.execution.datasources.LogicalRelation", "num-children": 0, "relation": null, "output": [[{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "customer_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 497, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_id", "dataType": "integer", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 498, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_date", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 499, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "order_amount", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 500, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}], [{"class": "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children": 0, "name": "product_name", "dataType": "string", "nullable": true, "metadata": {}, "exprId": {"product-class": "org.apache.spark.sql.catalyst.expressions.ExprId", "id": 501, "jvmId": "bca387c7-9171-4d47-8061-7031cec5e75f"}, "qualifier": []}]], "isStreaming": false}], "dataSource": null, "options": null, "mode": null}]}, "spark_version": 
{"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "spark-version": "3.4.1", "openlineage-spark-version": "1.1.0"}, "environment-properties": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", "environment-properties": {}}}}, "job": {"namespace": "staging", "name": "etl_test.execute_save_into_data_source_command.atlan_orders", "facets": {}}, "inputs": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "file", "uri": "file"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_id", "type": "integer"}, {"name": "order_date", "type": "string"}, {"name": "order_amount", "type": "string"}, {"name": "product_name", "type": "string"}]}}, "inputFacets": {}}], "outputs": [{"namespace": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "name": "atlan.orders", "facets": {"dataSource": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", "name": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306", "uri": "mysql://atlan-mysql.crmgvlgwn1cx.ap-south-1.rds.amazonaws.com:3306"}, "schema": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", "fields": [{"name": "customer_id", "type": "integer"}, {"name": "order_year", "type": "integer"}, {"name": "order_month", "type": "integer"}, {"name": "order_day", "type": "integer"}, {"name": "order_amount", "type": "integer"}, {"name": "order_key", "type": "string"}, {"name": "product_name", "type": "string"}]}, "columnLineage": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet", "fields": {"customer_id": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}]}, "order_year": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_month": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_day": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_date"}]}, "order_amount": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_amount"}]}, "order_key": {"inputFields": [{"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "customer_id"}, {"namespace": "file", "name": "/home/jovyan/notebooks/input.json", "field": "order_id"}]}, "product_name": {"inputFields": [{"namespace": "file", "name": 
"/home/jovyan/notebooks/input.json", "field": "product_name"}]}}}, "lifecycleStateChange": {"_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/spark", "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", "lifecycleStateChange": "OVERWRITE"}}, "outputFacets": {}}]} diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S2NCLM9T.png b/slack-archive/html/files/C01CK9T7HKR/F05S2NCLM9T.png deleted file mode 100644 index c65eba7..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05S2NCLM9T.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S2NNQ1UM.png b/slack-archive/html/files/C01CK9T7HKR/F05S2NNQ1UM.png deleted file mode 100644 index 7104339..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05S2NNQ1UM.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S39T55QD.png b/slack-archive/html/files/C01CK9T7HKR/F05S39T55QD.png deleted file mode 100644 index 3103ff1..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05S39T55QD.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S4S20MDZ.py b/slack-archive/html/files/C01CK9T7HKR/F05S4S20MDZ.py deleted file mode 100644 index a5debef..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F05S4S20MDZ.py +++ /dev/null @@ -1,57 +0,0 @@ -from pyspark.sql import SparkSession -from pyspark.sql.functions import col, concat_ws, year, month, dayofmonth -from pyspark.sql.types import StructType, StructField, StringType, IntegerType - - -spark = (SparkSession.builder.master('local') - .appName('etl-test') - .config('spark.jars.packages', "io.openlineage:openlineage-spark:1.1.0,mysql:mysql-connector-java:8.0.33") - .config('spark.extraListeners', 'io.openlineage.spark.agent.OpenLineageSparkListener') - .config('spark.openlineage.transport.type', 'http') - .config('spark.openlineage.transport.url', 'http://host.docker.internal:5009/events/spark/') - .config('spark.openlineage.namespace', 'staging') - .config('spark.openlineage.transport.auth.type', 'api_key') - .config('spark.openlineage.transport.auth.apiKey', 'abcdefghijk') - .getOrCreate()) - - -spark.sparkContext.setLogLevel("INFO") - - -# Define schema for JSON data -schema = StructType([ - StructField("customer_id", IntegerType(), True), - StructField("order_id", IntegerType(), True), - StructField("order_date", StringType(), True), - StructField("order_amount", StringType(), True), - StructField("product_name", StringType(), True) -]) - - -# Read JSON data into PySpark DataFrame -df = spark.read.option('multiline', True).json("input.json", schema=schema) - -# Apply data transformations -df_transformed = df.select( - col("customer_id"), - year(col("order_date")).alias("order_year"), - month(col("order_date")).alias("order_month"), - dayofmonth(col("order_date")).alias("order_day"), - col("order_amount").cast("float").alias("order_amount"), - concat_ws("-", col("customer_id"), col("order_id")).alias("order_key"), - col("product_name") -) - - -# Write transformed data to MySQL database -url = "" -table_name = "orders" -mode = "append" -properties = { - "driver": "com.mysql.cj.jdbc.Driver", - "user": "", - "password": "" -} - - -df_transformed.write.jdbc(url=url, table=table_name, mode=mode, properties=properties) \ No newline at end of file diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S5J5AY2E.json b/slack-archive/html/files/C01CK9T7HKR/F05S5J5AY2E.json deleted file mode 100644 index 
fcb6cf2..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F05S5J5AY2E.json +++ /dev/null @@ -1,81197 +0,0 @@ -[ - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "o5h328f64kgrn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:17:17.341082567Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:17:23.108567573Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b179c836-534a-4cbc-988b-b062a2689cd4] received", - "insertId": "pt2eu6fl9z5vp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:43.338356815Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "[b179c836-534a-4cbc-988b-b062a2689cd4] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T06:10:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "pt2eu6fl9z5vq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:43.347299223Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "pt2eu6fl9z5vr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:43.836767286Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "pt2eu6fl9z5vs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:43.838992505Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - 
"textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "pt2eu6fl9z5vt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:44.020621661Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-xttt8", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "No module named 'airflow.gcs'", - "insertId": "pt2eu6fl9z5vu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:44.122037728Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Failed to import plugin OpenLineagePlugin", - "insertId": "pt2eu6fl9z5vv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:44.131512340Z", - "severity": "ERROR", - "labels": { - "process": "plugins_manager.py:237", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 427, in import_from_string module = importlib.import_module(module_path) File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File \"\", line 1014, in _gcd_import File \"\", line 991, in _find_and_load File \"\", line 961, in _find_and_load_unlocked File \"\", line 219, in _call_with_frames_removed File \"\", line 1014, in _gcd_import File \"\", line 991, in _find_and_load File \"\", line 961, in _find_and_load_unlocked File \"\", line 219, in _call_with_frames_removed File \"\", line 1014, in _gcd_import File \"\", line 991, in _find_and_load File \"\", line 973, in _find_and_load_unlockedModuleNotFoundError: No module named 'airflow.gcs'", - "insertId": "pt2eu6fl9z5vw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:44.131577764Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "", - "insertId": "pt2eu6fl9z5vx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:44.131698239Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "The above exception was the direct cause of the following exception:", - 
"insertId": "pt2eu6fl9z5vy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:44.131704399Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "", - "insertId": "pt2eu6fl9z5vz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:44.131712503Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/plugins_manager.py\", line 229, in load_entrypoint_plugins plugin_class = entry_point.load() File \"/opt/python3.8/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/__init__.py\", line 194, in load module = import_module(match.group('module')) File \"/opt/python3.8/lib/python3.8/importlib/__init__.py\", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File \"\", line 1014, in _gcd_import File \"\", line 991, in _find_and_load File \"\", line 975, in _find_and_load_unlocked File \"\", line 671, in _load_unlocked File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/plugin.py\", line 32, in from openlineage.airflow import listener File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/listener.py\", line 75, in extractor_manager = ExtractorManager() File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/manager.py\", line 16, in __init__ self.task_to_extractor = Extractors() File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/extractors/extractors.py\", line 122, in __init__ extractor = import_from_string(extractor.strip()) File \"/opt/python3.8/lib/python3.8/site-packages/openlineage/airflow/utils.py\", line 431, in import_from_string raise ImportError(f\"Failed to import {path}\") from eImportError: Failed to import airflow.gcs.dags.big_query_insert_job_extractor.BigQueryInsertJobExtractor", - "insertId": "pt2eu6fl9z5w0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:44.131723039Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "pt2eu6fl9z5w1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:44.855276216Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Running on host airflow-worker-xttt8", - "insertId": "pt2eu6fl9z5w2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:45.429806563Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "pt2eu6fl9z5w3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:45.558770406Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-xttt8", - "execution-date": "2023-09-13T06:10:00+00:00", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "pt2eu6fl9z5w4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:45.602639080Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-xttt8", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T06:10:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "pt2eu6fl9z5w5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:45.603495843Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1289", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "pt2eu6fl9z5w6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:45.604780565Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-xttt8", - "task-id": "echo", - "execution-date": "2023-09-13T06:10:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "pt2eu6fl9z5w7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:45.604808705Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "task-id": "echo", - "execution-date": "2023-09-13T06:10:00+00:00", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Executing on 2023-09-13 06:10:00+00:00", - "insertId": "pt2eu6fl9z5w8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:45.637291340Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T06:10:00+00:00", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Started process 13235 to run task", - "insertId": "pt2eu6fl9z5w9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:45.648107344Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-xttt8", - "execution-date": "2023-09-13T06:10:00+00:00", - "process": "standard_task_runner.py:55", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T06:10:00+00:00', '--job-id', '894', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp_n224aa7']", - "insertId": "pt2eu6fl9z5wa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:45.705841248Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T06:10:00+00:00", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-xttt8", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Job 894: Subtask echo", - "insertId": "pt2eu6fl9z5wb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:45.706449263Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-13T06:10:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-xttt8", - "task-id": "echo", 
- "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Running on host airflow-worker-xttt8", - "insertId": "pt2eu6fl9z5wc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:46.090072160Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T06:10:00+00:00", - "workflow": "airflow_monitoring", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T06:10:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T06:10:00+00:00", - "insertId": "pt2eu6fl9z5wd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:46.299038062Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T06:10:00+00:00", - "worker_id": "airflow-worker-xttt8", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "pt2eu6fl9z5we", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:46.300988663Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "subprocess.py:63", - "execution-date": "2023-09-13T06:10:00+00:00", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "pt2eu6fl9z5wf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:46.303239453Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "process": "subprocess.py:75", - "worker_id": "airflow-worker-xttt8", - "execution-date": "2023-09-13T06:10:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Output:", - "insertId": "pt2eu6fl9z5wg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:20:46.463815092Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "task-id": "echo", 
- "map-index": "-1", - "execution-date": "2023-09-13T06:10:00+00:00", - "process": "subprocess.py:86", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "test", - "insertId": "pt2eu6fl9z5wh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:46.469961225Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:93", - "worker_id": "airflow-worker-xttt8", - "execution-date": "2023-09-13T06:10:00+00:00", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "pt2eu6fl9z5wi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:46.470980894Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:10:00+00:00", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-xttt8", - "process": "subprocess.py:97", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T061000, start_date=20230913T062045, end_date=20230913T062046", - "insertId": "pt2eu6fl9z5wj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:46.520652439Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "execution-date": "2023-09-13T06:10:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1328", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "pt2eu6fl9z5wk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:46.749143500Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "process": "local_task_job.py:212", - "task-id": "echo", - "execution-date": "2023-09-13T06:10:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "pt2eu6fl9z5wl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:20:46.799640725Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": 
"airflow_monitoring", - "worker_id": "airflow-worker-xttt8", - "process": "taskinstance.py:2599", - "task-id": "echo", - "execution-date": "2023-09-13T06:10:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b179c836-534a-4cbc-988b-b062a2689cd4] succeeded in 3.657855946017662s: None", - "insertId": "pt2eu6fl9z5wm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:20:46.999036148Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:20:48.847319607Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1q634m9foffwd5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:22:19.630355980Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:22:24.923847676Z" - }, - { - "textPayload": "I0913 06:23:24.979289 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ci0rqeflculbq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:24.979688101Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:30.760880457Z" - }, - { - "textPayload": "I0913 06:23:24.982578 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. 
Recreating.\" worker name=\"airflow-worker-xttt8\"", - "insertId": "ci0rqeflculbr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:24.982789363Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:30.760880457Z" - }, - { - "textPayload": "I0913 06:23:25.001946 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "ci0rqeflculbs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:25.002206409Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:30.760880457Z" - }, - { - "textPayload": "I0913 06:23:25.007660 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ci0rqeflculbt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:25.007908746Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:30.760880457Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "1dogrosfiejd96", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:25.069522479Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:23:30.820822010Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "1dogrosfiejd97", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:25.069575680Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:23:30.820822010Z" - }, - { - "textPayload": "", - "insertId": "1dogrosfiejd98", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:25.073569310Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:23:30.820822010Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "1dogrosfiejd99", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:25.073658948Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:23:30.820822010Z" - }, - { - 
"textPayload": "I0913 06:23:25.082440 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ci0rqeflculbu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:25.082718313Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:30.760880457Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "1jrvr6tfhpf11u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:29.810289597Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-xttt8" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:23:35.888896400Z" - }, - { - "textPayload": "I0913 06:23:30.584958 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "y55jdnfhm8gvz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:30.585280543Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:36.856860030Z" - }, - { - "textPayload": "I0913 06:23:30.586358 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "y55jdnfhm8gw0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:30.586466023Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:36.856860030Z" - }, - { - "textPayload": "I0913 06:23:30.912234 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "y55jdnfhm8gw1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:30.912487602Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:36.856860030Z" - }, - { - "textPayload": "I0913 06:23:30.971358 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "y55jdnfhm8gw2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:30.971544870Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:36.856860030Z" - }, - { - "textPayload": "I0913 06:23:31.030738 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "y55jdnfhm8gw3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T06:23:31.030923361Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:36.856860030Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "sl44fbfaq7wcs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:32.392869382Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "sl44fbfaq7wct", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:32.398918478Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "sl44fbfaq7wcu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:32.419699715Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "I0913 06:23:33.788962 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "y55jdnfhm8gw4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:33.789191529Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:36.856860030Z" - }, - { - "textPayload": "I0913 06:23:33.829697 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "y55jdnfhm8gw5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:33.829877090Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:23:36.856860030Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "sl44fbfaq7wcv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:39.736992822Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "sl44fbfaq7wcw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T06:23:39.737050648Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "sl44fbfaq7wcx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:39.748607770Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "sl44fbfaq7wcy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:47.057332038Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "sl44fbfaq7wcz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:47.278148257Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "sl44fbfaq7wd0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:52.392607973Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "sl44fbfaq7wd1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:52.393823570Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "sl44fbfaq7wd2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:23:52.394475666Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "sl44fbfaq7wd3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - 
"project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:52.441567237Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "sl44fbfaq7wd4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:23:57.508178620Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "sl44fbfaq7wd5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:23:59.652722004Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "sl44fbfaq7wd6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:24:02.527378821Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:04.950374461Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "chuu7tf17rfrl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:24:07.535521318Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:09.924037788Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "166p863figjlxg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:24:12.543766682Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:15.008340397Z" - }, - { - "textPayload": "I0913 06:24:16.206428 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1yhp8opfie0tsr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:24:16.206672300Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T06:24:22.105391933Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "10mylthfli26gs", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:24:17.555686921Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:22.058366259Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1kvzo1hfhw38qc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:24:22.566621960Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:27.832520562Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1pw3kolfomiw8u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:24:27.567861778Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:32.834784424Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "15wh8bofhy63d0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:24:32.574771270Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:37.832701857Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "fjs92bfomsst5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:24:37.581974917Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:42.832133344Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1oi2t08fht12a0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:24:42.588949088Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:47.830981808Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "3mfw3lf64yvcy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:24:47.595937070Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:52.836163543Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": 
"1w9qjmufokl544", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:24:52.604615756Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:24:57.830667542Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "uhdj2f7zjpa5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:24:57.620941413Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:02.844830275Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1pw3cpmflfs3v9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:25:02.628876352Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:07.835632696Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "8vnkihf8dfpxz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:25:07.634096218Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:12.833136747Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "289vddflfv9hi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:25:12.643454189Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:17.829943181Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "dbxn1rfig64y9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:25:17.652130927Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:22.841839373Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "1trsr1qf666yp0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:25:22.658094278Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:27.922941805Z" - }, - { - "textPayload": "Starting Airflow Celery 
Flower API.", - "insertId": "1trsr1qf666yp1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:25:22.659092164Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:27.922941805Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1sxdoryfhrykzm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:25:45.311638999Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:51.046875331Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1sxdoryfhrykzn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:25:45.327455687Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:25:51.046875331Z" - }, - { - "textPayload": " ", - "insertId": "1aw5z45fiim844", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:00.658094718Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-bwrv5 v5.2.7 (dawn-chorus)", - "insertId": "1aw5z45fiim845", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658154634Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1aw5z45fiim846", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658163092Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 06:26:00", - "insertId": "1aw5z45fiim847", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:00.658169909Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "1aw5z45fiim848", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:26:00.658175456Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "1aw5z45fiim849", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658182010Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7b007b043370", - "insertId": "1aw5z45fiim84a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:00.658187578Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1aw5z45fiim84b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:26:00.658264722Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1aw5z45fiim84c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:00.658275482Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "1aw5z45fiim84d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:00.658281855Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "1aw5z45fiim84e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658287382Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1aw5z45fiim84f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658293412Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "1aw5z45fiim84g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": 
"us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658299159Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "1aw5z45fiim84h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658304846Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": " ", - "insertId": "1aw5z45fiim84i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658310905Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "", - "insertId": "1aw5z45fiim84j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:00.658316379Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "[tasks]", - "insertId": "1aw5z45fiim84k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:26:00.658322592Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "1aw5z45fiim84l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:00.658328763Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "", - "insertId": "1aw5z45fiim84m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:26:00.658333880Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:05.088908243Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "6nzk7tfonw84j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:09.308396582Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:14.832075846Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "6nzk7tfonw84k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:09.324639429Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:14.832075846Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "6nzk7tfonw84l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:10.427049546Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:49", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:14.832075846Z" - }, - { - "textPayload": "celery@airflow-worker-bwrv5 ready.", - "insertId": "6nzk7tfonw84m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:26:10.471400331Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "worker.py:176" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:14.832075846Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "1hk9mvrfihp277", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:26:14.977024801Z", - "severity": "INFO", - "labels": { - "process": 
"control.py:277", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:26:19.834742905Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[8d530ab2-e4be-4ce5-bdec-1d2518ad6c8e] received", - "insertId": "1p1qyl7fhrk4i6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:01.583065354Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:04.706181938Z" - }, - { - "textPayload": "[8d530ab2-e4be-4ce5-bdec-1d2518ad6c8e] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T06:20:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1p1qyl7fhrk4i7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:01.652815954Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:04.706181938Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1p1qyl7fhrk4i8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:02.048551494Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:04.706181938Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1p1qyl7fhrk4i9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:02.050814775Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:04.706181938Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1p1qyl7fhrk4ia", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:02.204486431Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:04.706181938Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1p1qyl7fhrk4ib", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:03.141609103Z", - "severity": "INFO", - "labels": { - "worker_id": 
"airflow-worker-bwrv5", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:04.706181938Z" - }, - { - "textPayload": "Running on host airflow-worker-bwrv5", - "insertId": "1p1qyl7fhrk4ic", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:03.679473214Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:04.706181938Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1thoa8of6778pt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:03.823033917Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - "try-number": "1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T06:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1thoa8of6778pu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:03.850564140Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "map-index": "-1", - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:20:00+00:00", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1thoa8of6778pv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:03.851626993Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T06:20:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1thoa8of6778pw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:03.852451918Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1290", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "execution-date": "2023-09-13T06:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1thoa8of6778px", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:03.853203288Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1291", - "task-id": "echo", - "execution-date": "2023-09-13T06:20:00+00:00", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1thoa8of6778py", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:04.160511615Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1thoa8of6778pz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:04.160555065Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1thoa8of6778q0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:04.179151428Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1thoa8of6778q1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:04.179194758Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Executing on 2023-09-13 06:20:00+00:00", - "insertId": "1thoa8of6778q2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:05.186561467Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-13T06:20:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - 
"map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Started process 302 to run task", - "insertId": "1thoa8of6778q3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:05.265056655Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo", - "execution-date": "2023-09-13T06:20:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T06:20:00+00:00', '--job-id', '896', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpaiawzt5q']", - "insertId": "1thoa8of6778q4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:05.265496678Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "standard_task_runner.py:82", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:20:00+00:00", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Job 896: Subtask echo", - "insertId": "1thoa8of6778q5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:05.266051717Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:20:00+00:00", - "process": "standard_task_runner.py:83", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Running on host airflow-worker-bwrv5", - "insertId": "1thoa8of6778q6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:05.661502419Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "process": "task_command.py:393", - "try-number": "1", - "execution-date": "2023-09-13T06:20:00+00:00", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T06:20:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T06:20:00+00:00", - "insertId": "1thoa8of6778q7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": 
"acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:05.838640607Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "taskinstance.py:1518", - "try-number": "1", - "execution-date": "2023-09-13T06:20:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1thoa8of6778q8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:05.840516426Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "subprocess.py:63", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T06:20:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1thoa8of6778q9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:05.842547258Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:75", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo", - "execution-date": "2023-09-13T06:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Output:", - "insertId": "1thoa8of6778qa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:05.989622917Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T06:20:00+00:00", - "worker_id": "airflow-worker-bwrv5", - "process": "subprocess.py:86", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "test", - "insertId": "1thoa8of6778qb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:05.997131045Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T06:20:00+00:00", - "process": "subprocess.py:93", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1thoa8of6778qc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T06:30:05.998314604Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - "process": "subprocess.py:97", - "execution-date": "2023-09-13T06:20:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T062000, start_date=20230913T063003, end_date=20230913T063006", - "insertId": "1thoa8of6778qd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:06.045895913Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "execution-date": "2023-09-13T06:20:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1328", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1thoa8of6778qe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:30:06.859145691Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T06:20:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "local_task_job.py:212", - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1thoa8of6778qf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:06.928781454Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-bwrv5", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T06:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[8d530ab2-e4be-4ce5-bdec-1d2518ad6c8e] succeeded in 5.4964942400110886s: None", - "insertId": "1thoa8of6778qg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:30:07.084160854Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:09.798776737Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. 
To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "kdbmofhr34oq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:30:31.346614912Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:30:36.129633977Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "jf9638f66nhdm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:35:38.415490779Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:35:44.259469583Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[a8133082-189a-4bb7-8a45-3c67dd779e10] received", - "insertId": "qd24ixfcqtnkt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:01.094902587Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:05.885365022Z" - }, - { - "textPayload": "[a8133082-189a-4bb7-8a45-3c67dd779e10] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T06:30:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "qd24ixfcqtnku", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:40:01.099742681Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:05.885365022Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "qd24ixfcqtnkv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:02.054189781Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": 
"airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:05.885365022Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "qd24ixfcqtnkw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:02.066581626Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:05.885365022Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "qd24ixfcqtnkx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:02.343932399Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:05.885365022Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "lnavu5f4ajo0v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:04.933559807Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Running on host airflow-worker-bwrv5", - "insertId": "lnavu5f4ajo0w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:40:05.827678529Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "lnavu5f4ajo0x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:06.136778571Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-bwrv5", - "process": "taskinstance.py:1091", - "task-id": "echo", - "execution-date": "2023-09-13T06:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "lnavu5f4ajo0y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:40:06.223036661Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T06:30:00+00:00", - "process": 
"taskinstance.py:1091", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "lnavu5f4ajo0z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:06.224012382Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T06:30:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1289", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "lnavu5f4ajo10", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:06.225145020Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:1290", - "try-number": "1", - "execution-date": "2023-09-13T06:30:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "lnavu5f4ajo11", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:06.226396974Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T06:30:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "process": "taskinstance.py:1291", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "lnavu5f4ajo12", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:06.742665164Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "lnavu5f4ajo13", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:06.742731821Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - 
"textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "lnavu5f4ajo14", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:06.840205558Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "lnavu5f4ajo15", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:40:06.840268872Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Executing on 2023-09-13 06:30:00+00:00", - "insertId": "lnavu5f4ajo16", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:08.019038258Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-13T06:30:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Started process 534 to run task", - "insertId": "lnavu5f4ajo17", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:08.104781273Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:30:00+00:00", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - "process": "standard_task_runner.py:55", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T06:30:00+00:00', '--job-id', '897', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpxcu4l00x']", - "insertId": "lnavu5f4ajo18", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:08.109078312Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T06:30:00+00:00", - "task-id": "echo", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-bwrv5", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Job 897: Subtask echo", - "insertId": "lnavu5f4ajo19", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": 
"openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:08.110194318Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo", - "process": "standard_task_runner.py:83", - "map-index": "-1", - "execution-date": "2023-09-13T06:30:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Running on host airflow-worker-bwrv5", - "insertId": "lnavu5f4ajo1a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T06:40:08.653285912Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "execution-date": "2023-09-13T06:30:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T06:30:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T06:30:00+00:00", - "insertId": "lnavu5f4ajo1b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:09.059280825Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:30:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "lnavu5f4ajo1c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:09.060628409Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:30:00+00:00", - "map-index": "-1", - "process": "subprocess.py:63", - "try-number": "1", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "lnavu5f4ajo1d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:09.062959906Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:75", - "execution-date": "2023-09-13T06:30:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - 
"textPayload": "Output:", - "insertId": "lnavu5f4ajo1e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:09.316760621Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T06:30:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo", - "process": "subprocess.py:86", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "test", - "insertId": "lnavu5f4ajo1f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:09.326754449Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "subprocess.py:93", - "execution-date": "2023-09-13T06:30:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "lnavu5f4ajo1g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:09.329353337Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:30:00+00:00", - "map-index": "-1", - "process": "subprocess.py:97" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T063000, start_date=20230913T064006, end_date=20230913T064009", - "insertId": "lnavu5f4ajo1h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:09.503616520Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T06:30:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:1328", - "try-number": "1", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:10.850497601Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1inz1aqfhubn81", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T06:40:10.648367583Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T06:30:00+00:00", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:15.953472013Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1inz1aqfhubn82", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:10.844720611Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T06:30:00+00:00", - "worker_id": "airflow-worker-bwrv5", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:15.953472013Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[a8133082-189a-4bb7-8a45-3c67dd779e10] succeeded in 10.04122254397953s: None", - "insertId": "1inz1aqfhubn83", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T06:40:11.139367284Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T06:40:15.953472013Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
- {
-   "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)",
-   "insertId": "1pc0eszfonnpod",
-   "resource": {
-     "type": "cloud_composer_environment",
-     "labels": {
-       "location": "us-west1",
-       "project_id": "acceldata-acm",
-       "environment_name": "openlineage"
-     }
-   },
-   "timestamp": "2023-09-13T06:40:35.876328795Z",
-   "severity": "WARNING",
-   "labels": {
-     "worker_id": "airflow-worker-bwrv5"
-   },
-   "logName": "projects/acceldata-acm/logs/airflow-worker",
-   "receiveTimestamp": "2023-09-13T06:40:42.144686302Z"
- },
- [... entries from 2023-09-13T06:45:38Z through 07:10:10Z elided: they are near-verbatim repeats of the pattern above for the runs scheduled at 06:40, 06:50 and 07:00 UTC (Celery task ids 929a3201-40d3-4f1f-aee2-b1849727834b, 2e44d28e-a3b8-4202-a386-3e26f0091b0c and 7ae52f8e-fa01-4c9e-b6ea-07d7a9cfb85e; jobs 898, 899 and 900; task processes 767, 1002 and 1235).
- Each cycle on worker airflow-worker-bwrv5 logs: the Celery command ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__...', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']; WARNINGs "No module named 'boto3'", "No module named 'botocore'" and "No module named 'airflow.providers.sftp'"; the SQLAlchemy MovedIn20Warning above (again at 06:45, 06:50, 06:55, 07:00 and 07:05); paired ERRORs "fatal: not a git repository (or any parent up to mount point /home/airflow)" and "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)"; "Starting attempt 1 of 2"; the AIRFLOW_CTX_* environment-variable export; and the bash command ['/usr/bin/bash', '-c', 'echo test'] printing "test" and exiting with return code 0, after which the task is marked SUCCESS (Celery durations 6.115362359007122s for job 898 and 5.81403498997679s for job 899). ...]
- {
-   "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T070000, start_date=20230913T071005, end_date=20230913T071010", - "insertId": "1o3eswfeto025", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:10:10.316841233Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T07:00:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:10:15.772819162Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1o3eswfeto026", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:10:12.043587302Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "local_task_job.py:212", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "execution-date": "2023-09-13T07:00:00+00:00", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:10:15.772819162Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1o3eswfeto027", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:10:12.147754395Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T07:00:00+00:00", - "process": "taskinstance.py:2599", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:10:15.772819162Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7ae52f8e-fa01-4c9e-b6ea-07d7a9cfb85e] succeeded in 11.6675271220156s: None", - "insertId": "1o3eswfeto028", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:10:12.425172102Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:10:15.772819162Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "pj7n8ff69mt61", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:10:52.484381359Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:10:56.065375241Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "d1xhuyflkmmr1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:15:54.537934315Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:15:59.687797308Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[e04d57cd-83ae-4426-87dc-b7d007f31499] received", - "insertId": "o54nzmfbm0ojj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:01.413892072Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "[e04d57cd-83ae-4426-87dc-b7d007f31499] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:10:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "o54nzmfbm0ojk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:01.431030476Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "o54nzmfbm0ojl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:01.843556962Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "o54nzmfbm0ojm", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:01.845792632Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "o54nzmfbm0ojn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:02.010854094Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "o54nzmfbm0ojo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:03.039279495Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Running on host airflow-worker-bwrv5", - "insertId": "o54nzmfbm0ojp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:03.629772703Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "o54nzmfbm0ojq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:03.848342086Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:10:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "o54nzmfbm0ojr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:03.876866323Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "execution-date": "2023-09-13T07:10:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:1091", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "o54nzmfbm0ojs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:03.877323703Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "execution-date": "2023-09-13T07:10:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "o54nzmfbm0ojt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:03.877998361Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "process": "taskinstance.py:1290", - "execution-date": "2023-09-13T07:10:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "o54nzmfbm0oju", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:03.878481784Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-13T07:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "o54nzmfbm0ojv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:04.374641493Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "o54nzmfbm0ojw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:04.374719059Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "o54nzmfbm0ojx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - 
} - }, - "timestamp": "2023-09-13T07:20:04.394313627Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "o54nzmfbm0ojy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:04.394394703Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Executing on 2023-09-13 07:10:00+00:00", - "insertId": "o54nzmfbm0ojz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:05.576760025Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "execution-date": "2023-09-13T07:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Started process 1474 to run task", - "insertId": "o54nzmfbm0ok0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:05.615082880Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T07:10:00+00:00", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:10:00+00:00', '--job-id', '903', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp6zwk8bge']", - "insertId": "o54nzmfbm0ok1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:05.615915764Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "map-index": "-1", - "execution-date": "2023-09-13T07:10:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Job 903: Subtask echo", - "insertId": "o54nzmfbm0ok2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:05.616769093Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-13T07:10:00+00:00", - "task-id": 
"echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:06.859373415Z" - }, - { - "textPayload": "Running on host airflow-worker-bwrv5", - "insertId": "mhj7p8flepidi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:05.959148241Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-bwrv5", - "process": "task_command.py:393", - "execution-date": "2023-09-13T07:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T07:10:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T07:10:00+00:00", - "insertId": "mhj7p8flepidj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:06.142107716Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-bwrv5", - "process": "taskinstance.py:1518", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T07:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "mhj7p8flepidk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:06.144270255Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:10:00+00:00", - "task-id": "echo", - "process": "subprocess.py:63", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "mhj7p8flepidl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:06.145853532Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:75", - "task-id": "echo", - "execution-date": "2023-09-13T07:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Output:", - "insertId": "mhj7p8flepidm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:06.375412145Z", - "severity": 
"INFO", - "labels": { - "process": "subprocess.py:86", - "map-index": "-1", - "worker_id": "airflow-worker-bwrv5", - "execution-date": "2023-09-13T07:10:00+00:00", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "test", - "insertId": "mhj7p8flepidn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:06.419427744Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:10:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "process": "subprocess.py:93" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "mhj7p8flepido", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:06.420485463Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:97", - "worker_id": "airflow-worker-bwrv5", - "execution-date": "2023-09-13T07:10:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T071000, start_date=20230913T072003, end_date=20230913T072006", - "insertId": "mhj7p8flepidp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:06.471821988Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-bwrv5", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T07:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "mhj7p8flepidq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:07.454300726Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "local_task_job.py:212", - "try-number": "1", - "execution-date": "2023-09-13T07:10:00+00:00", - "worker_id": "airflow-worker-bwrv5", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "mhj7p8flepidr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T07:20:07.531903181Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "taskinstance.py:2599", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T07:10:00+00:00", - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[e04d57cd-83ae-4426-87dc-b7d007f31499] succeeded in 6.266873739979928s: None", - "insertId": "mhj7p8flepids", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:07.687660714Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:11.986472593Z" - }, - { - "textPayload": "I0913 07:20:55.838223 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ukhks0fi10czo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:55.838475333Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:02.180612377Z" - }, - { - "textPayload": "I0913 07:20:55.839889 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. Recreating.\" worker name=\"airflow-worker-bwrv5\"", - "insertId": "ukhks0fi10czp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:55.840042062Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:02.180612377Z" - }, - { - "textPayload": "I0913 07:20:55.858671 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "ukhks0fi10czq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:55.858948900Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:02.180612377Z" - }, - { - "textPayload": "I0913 07:20:55.859590 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ukhks0fi10czr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:20:55.859703809Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:02.180612377Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "17kddp0fonr136", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:55.899308278Z", - 
"severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:59.926781014Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "17kddp0fonr137", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:55.899781986Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:59.926781014Z" - }, - { - "textPayload": "", - "insertId": "17kddp0fonr138", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:55.907483102Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:59.926781014Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "17kddp0fonr139", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:55.907493679Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:59.926781014Z" - }, - { - "textPayload": "I0913 07:20:55.919604 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ukhks0fi10czs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:20:55.919829590Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:02.180612377Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "17kddp0fonr13a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:20:56.530844812Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:20:59.926781014Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "1gq7m9wflfsizi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:03.468432823Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bwrv5" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:21:04.848567761Z" - }, - { - "textPayload": "I0913 07:21:04.558131 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixl8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:04.558375248Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:04.560111 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "ut3abqflkixl9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:04.560274048Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:04.809137 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixla", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:04.809379285Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:04.933879 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixlb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:04.934064771Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:05.068745 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixlc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T07:21:05.068976413Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:05.212538 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixld", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:05.212751378Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:05.251256 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixle", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:05.251476299Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:05.273247 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixlf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:05.273432522Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "ibf40mflfk73m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:06.469258197Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "ibf40mflfk73n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:06.472392094Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "ibf40mflfk73o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:06.488902474Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "I0913 07:21:06.978798 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixlg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", 
- "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:06.979054189Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "I0913 07:21:07.014193 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ut3abqflkixlh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:07.014372267Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:11.344695763Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "ibf40mflfk73p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:13.827054053Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "ibf40mflfk73q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:13.827841654Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "ibf40mflfk73r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:13.845119256Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "ibf40mflfk73s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:20.903054738Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "ibf40mflfk73t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:21.168907130Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "I0913 07:21:24.377663 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "12ku3vkfhxjgy3", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:24.378859764Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:21:31.454951831Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "ibf40mflfk73u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:26.468429175Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "ibf40mflfk73v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:26.469012305Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk73w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:26.469183654Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "ibf40mflfk73x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:26.480315466Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk73y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:31.503372480Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "ibf40mflfk73z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:33.504674559Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk740", - 
"resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:21:36.516083600Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk741", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:41.522606509Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk742", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:46.528922736Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk743", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:21:51.535569741Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk744", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:21:56.543257259Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ibf40mflfk745", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:22:01.549554178Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:03.844455698Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ezr363f3d7acg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:22:06.556313440Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:08.947390013Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - 
"insertId": "sb49dzf692nvi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:22:11.562800549Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:16.044384775Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1xnm6w0fhub92b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:22:16.569882947Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:21.822538934Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "o53etrfhyutsn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:22:21.576955685Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:26.827115621Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1v5s60kfhwuode", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:22:26.583308258Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:31.821708814Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1hacxjwflijrf9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:22:31.589459994Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:36.825537841Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1wjh87sf6dgjgo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:22:36.597445387Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:41.820826063Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1y5899fiouaa4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:22:41.609470387Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:46.821332380Z" - }, - { - "textPayload": "Dags 
and plugins are not synced yet", - "insertId": "1h0aqx3filpavm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:22:46.621636844Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:51.824403428Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1am6letfhtryv5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:22:51.638977676Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:22:56.838428948Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "1x3qk5qf8c51m7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:22:56.645076630Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:23:01.937863967Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "1x3qk5qf8c51m8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:22:56.646365237Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:23:01.937863967Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "sb8y4qfigmlox", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:23:19.118303229Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:23:25.158752598Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
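For anyone who wants to pull this worker log directly instead of scrolling a paste, here is a minimal sketch using the google-cloud-logging Python client; the filter fields simply mirror the resource.type, environment_name, and logName values in the entries above, and it assumes credentials for the acceldata-acm project are already configured.

# Sketch: fetch Cloud Composer airflow-worker log entries with the
# google-cloud-logging client. Assumes `pip install google-cloud-logging`
# and application-default credentials for the project.
from google.cloud import logging

client = logging.Client(project="acceldata-acm")

# Filter mirrors the fields visible in the entries above.
log_filter = (
    'resource.type="cloud_composer_environment" '
    'AND resource.labels.environment_name="openlineage" '
    'AND logName="projects/acceldata-acm/logs/airflow-worker" '
    'AND timestamp>="2023-09-13T07:20:00Z"'
)

for entry in client.list_entries(filter_=log_filter):
    # entry.payload holds the textPayload for text entries.
    print(entry.timestamp, entry.severity, entry.payload)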
07:23:34           INFO     Celery startup banner:
                            celery@airflow-worker-n79fs v5.2.7 (dawn-chorus)
                            Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 07:23:33
                            [config]
                            app:         airflow.executors.celery_executor:0x7eb1c35c93d0
                            transport:   redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0
                            results:     redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0
                            concurrency: 6 (prefork)
                            task events: OFF (enable -E to monitor tasks in this worker)
                            [queues]     default exchange=default(direct) key=default
                            [tasks]      . airflow.executors.celery_executor.execute_command
07:23:40           INFO     Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0 (connection.py:22)
07:23:40           INFO     mingle: searching for neighbors (mingle.py:40)
07:23:41           INFO     mingle: all alone (mingle.py:49)
07:23:41           INFO     celery@airflow-worker-n79fs ready. (worker.py:176)
07:23:43           INFO     Events of group {task} enabled by remote. (control.py:277)
"airflow-worker-n79fs", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:23:44.832615809Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1rjtgdefi5672e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:28:06.041523719Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:28:09.961915292Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[0a97ecf9-d21f-4a75-92d1-71ddcef4fe99] received", - "insertId": "bnxvyhf9xgxo0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:01.883128901Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "[0a97ecf9-d21f-4a75-92d1-71ddcef4fe99] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:20:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "bnxvyhf9xgxo1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:01.926791738Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "bnxvyhf9xgxo2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:02.331263989Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "bnxvyhf9xgxo3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:02.336715973Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "bnxvyhf9xgxo4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:02.532111577Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "bnxvyhf9xgxo5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:03.452623262Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "bnxvyhf9xgxo6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:04.516182559Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "bnxvyhf9xgxo7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:04.664674487Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T07:20:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "bnxvyhf9xgxo8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:04.702202046Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1091", - "task-id": "echo", - "execution-date": "2023-09-13T07:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "bnxvyhf9xgxo9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:04.702955595Z", - "severity": 
"INFO", - "labels": { - "execution-date": "2023-09-13T07:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "process": "taskinstance.py:1289", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "bnxvyhf9xgxoa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:04.703651768Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "taskinstance.py:1290", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-13T07:20:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "bnxvyhf9xgxob", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:04.704935403Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:20:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1291", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "bnxvyhf9xgxoc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:05.028419975Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "bnxvyhf9xgxod", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:05.028465833Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "bnxvyhf9xgxoe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:05.059333036Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", 
- "insertId": "bnxvyhf9xgxof", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:05.059377199Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Executing on 2023-09-13 07:20:00+00:00", - "insertId": "bnxvyhf9xgxog", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:06.045241823Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T07:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Started process 318 to run task", - "insertId": "bnxvyhf9xgxoh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:06.145255852Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T07:20:00+00:00", - "process": "standard_task_runner.py:55", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:20:00+00:00', '--job-id', '905', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmptsvkdpeq']", - "insertId": "bnxvyhf9xgxoi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:06.205746271Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-13T07:20:00+00:00", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Job 905: Subtask echo", - "insertId": "bnxvyhf9xgxoj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:06.206609771Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-13T07:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:07.874117873Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1xn8s00fov4i49", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:06.965269934Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T07:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T07:20:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T07:20:00+00:00", - "insertId": "1xn8s00fov4i4a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:07.604111827Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T07:20:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1xn8s00fov4i4b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:07.610170839Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T07:20:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "subprocess.py:63", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1xn8s00fov4i4c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:07.614120074Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:75", - "execution-date": "2023-09-13T07:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Output:", - "insertId": "1xn8s00fov4i4d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:07.826701681Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T07:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "subprocess.py:86" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "test", - "insertId": "1xn8s00fov4i4e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:30:07.830820113Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T07:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:93" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1xn8s00fov4i4f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:07.831942605Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T07:20:00+00:00", - "task-id": "echo", - "map-index": "-1", - "process": "subprocess.py:97", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T072000, start_date=20230913T073004, end_date=20230913T073007", - "insertId": "1xn8s00fov4i4g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:07.944007391Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:20:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1xn8s00fov4i4h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:09.047262326Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "execution-date": "2023-09-13T07:20:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "local_task_job.py:212", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1xn8s00fov4i4i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:30:09.220617325Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T07:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[0a97ecf9-d21f-4a75-92d1-71ddcef4fe99] succeeded in 7.620338036998874s: None", - "insertId": "1xn8s00fov4i4j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:30:09.507685515Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:30:12.957596295Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1ie3n4efhqsten", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:33:13.521273777Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:33:15.951346439Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
07:40:00-07:40:06           A second, essentially identical run for scheduled__2023-09-13T07:30:00+00:00: task execute_command[65d2c6a1-88d1-4014-a179-d76aa302304c] received and executed (job 906, process 546), with the same boto3/botocore/airflow.providers.sftp warnings and the same pair of git-repository errors, ending in SUCCESS (execution_date=20230913T073000, start_date=20230913T074002, end_date=20230913T074005). Final entry in full:
{
    "textPayload": "Task airflow.executors.celery_executor.execute_command[65d2c6a1-88d1-4014-a179-d76aa302304c] succeeded in 5.573148157011019s: None",
    "insertId": "1p1njdvfllh5wg",
    "resource": {
        "type": "cloud_composer_environment",
        "labels": {
            "environment_name": "openlineage",
            "location": "us-west1",
            "project_id": "acceldata-acm"
        }
    },
    "timestamp": "2023-09-13T07:40:06.332530297Z",
    "severity": "INFO",
    "labels": {
        "process": "trace.py:131",
        "worker_id": "airflow-worker-n79fs"
    },
    "logName": "projects/acceldata-acm/logs/airflow-worker",
- "receiveTimestamp": "2023-09-13T07:40:10.942319978Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "10mylthflojxvk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:43:12.121909243Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:43:15.963103309Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[a70bdbc9-86f1-4042-8323-603cf6d3b85f] received", - "insertId": "e5k16ffhuemhr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:23.813069858Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:30.240705921Z" - }, - { - "textPayload": "[a70bdbc9-86f1-4042-8323-603cf6d3b85f] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "e5k16ffhuemhs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:23.819688606Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:30.240705921Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "e5k16ffhuemht", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:24.320878179Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:30.240705921Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "e5k16ffhuemhu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:24.323264785Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:30.240705921Z" - }, - { - "textPayload": "No module named 
'airflow.providers.sftp'", - "insertId": "e5k16ffhuemhv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:24.446131483Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:30.240705921Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "e5k16ffhuemhw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:25.354027491Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:30.240705921Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1yhpff8fi05zyf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:32.170466541Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1yhpff8fi05zyg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:32.286076899Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1yhpff8fi05zyh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:32.305607980Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1yhpff8fi05zyi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:32.306224943Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - 
"try-number": "1", - "process": "taskinstance.py:1289", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1yhpff8fi05zyj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:32.306870201Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1yhpff8fi05zyk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:32.307444370Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1yhpff8fi05zyl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:32.717032964Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1yhpff8fi05zym", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:32.717108090Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1yhpff8fi05zyn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:32.737745753Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Stopping at filesystem boundary 
(GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1yhpff8fi05zyo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:32.737788714Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1yhpff8fi05zyp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:34.460582089Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:1310", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Started process 700 to run task", - "insertId": "1yhpff8fi05zyq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:34.495965461Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '907', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpwtz_d2tn']", - "insertId": "1yhpff8fi05zyr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:34.498760640Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Job 907: Subtask run_bq_external_ingestion", - "insertId": "1yhpff8fi05zys", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:34.499455810Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1yhpff8fi05zyt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:34.947311974Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=run_bq_external_ingestion\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "1yhpff8fi05zyu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:35.226171462Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "1yhpff8fi05zyv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:35.265582580Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "base.py:73", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Using existing BigQuery table for storing data...", - "insertId": "1yhpff8fi05zyw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:35.268168041Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "gcs_to_bigquery.py:375" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "1yhpff8fi05zyx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - 
"timestamp": "2023-09-13T07:46:35.268884094Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "process": "credentials_provider.py:353", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Project is not included in destination_project_dataset_table: holiday_weather.holidays; using project \"acceldata-acm\"", - "insertId": "1yhpff8fi05zyy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:35.315750996Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:2314", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Executing: {'load': {'autodetect': True, 'createDisposition': 'CREATE_IF_NEEDED', 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays'}, 'sourceFormat': 'CSV', 'sourceUris': ['gs://openlineagedemo/holidays.csv'], 'writeDisposition': 'WRITE_TRUNCATE', 'ignoreUnknownValues': False, 'schema': {'fields': [{'name': 'Date', 'type': 'DATE'}, {'name': 'Holiday', 'type': 'STRING'}]}, 'skipLeadingRows': 1, 'fieldDelimiter': ',', 'quote': None, 'allowQuotedNewlines': False, 'encoding': 'UTF-8'}}", - "insertId": "1yhpff8fi05zyz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:35.317200567Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "gcs_to_bigquery.py:379", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_run_bq_external_ingestion_2023_09_12T00_00_00_00_00_f8877769c02bde46a9d84ad1ae6296f0", - "insertId": "1yhpff8fi05zz0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:35.320172085Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "bigquery.py:1596", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:38.372314339Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=run_bq_external_ingestion, execution_date=20230912T000000, start_date=20230913T074632, end_date=20230913T074638", - "insertId": "xkvz1hfil30mf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:38.333628407Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[8d1590a1-4567-4299-8339-d716ec2335ff] received", - "insertId": "xkvz1hfil30mg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:39.608976081Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[55b8a496-178f-4dd3-816b-0bfbfe1d327b] received", - "insertId": "xkvz1hfil30mh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:39.617705460Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7c7f6973-93f3-4487-9adc-3e9182326a2f] received", - "insertId": "xkvz1hfil30mi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:39.707699641Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[854535b4-e8ef-4613-877f-b8fd68c43d72] received", - "insertId": "xkvz1hfil30mj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:39.719506192Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "[8d1590a1-4567-4299-8339-d716ec2335ff] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_1997', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - 
"insertId": "xkvz1hfil30mk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:39.724938847Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "[55b8a496-178f-4dd3-816b-0bfbfe1d327b] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_1998', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "xkvz1hfil30ml", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:39.726409514Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "[7c7f6973-93f3-4487-9adc-3e9182326a2f] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_1999', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "xkvz1hfil30mm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:39.821659966Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "[854535b4-e8ef-4613-877f-b8fd68c43d72] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2000', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "xkvz1hfil30mn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:39.822946761Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[1e6d53e8-8877-476f-9df9-204ca308de47] received", - "insertId": "xkvz1hfil30mo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:39.912026528Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "[1e6d53e8-8877-476f-9df9-204ca308de47] Executing command in Celery: 
['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2018', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "xkvz1hfil30mp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:40.016091268Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "xkvz1hfil30mq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:40.210125602Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "local_task_job.py:212", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "xkvz1hfil30mr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:41.038620678Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[a70bdbc9-86f1-4042-8323-603cf6d3b85f] succeeded in 18.109488398011308s: None", - "insertId": "xkvz1hfil30ms", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:41.926319529Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[4d57712f-369a-43cc-9df8-67312555c0b5] received", - "insertId": "xkvz1hfil30mt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:42.415008922Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "[4d57712f-369a-43cc-9df8-67312555c0b5] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 
'join_bq_datasets.bq_join_holidays_weather_data_2019', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "xkvz1hfil30mu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:42.427351563Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "xkvz1hfil30mv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:43.635433177Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "xkvz1hfil30mw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:43.638115818Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "xkvz1hfil30mx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:43.713085249Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "xkvz1hfil30my", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:43.715021060Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:44.477576060Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1vzutvifilpx6c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:44.008362982Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1vzutvifilpx6d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": 
"us-west1" - } - }, - "timestamp": "2023-09-13T07:46:44.013929068Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1vzutvifilpx6e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:44.113558504Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1vzutvifilpx6f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:44.119296315Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1vzutvifilpx6g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:44.222922905Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1vzutvifilpx6h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:44.310514676Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1vzutvifilpx6i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:44.829435707Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1vzutvifilpx6j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:44.920058535Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - 
"textPayload": "No module named 'boto3'", - "insertId": "1vzutvifilpx6k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:45.315879111Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1vzutvifilpx6l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:45.328174329Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1vzutvifilpx6m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:45.416284629Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1vzutvifilpx6n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:45.429737374Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1vzutvifilpx6o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:45.833954732Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1vzutvifilpx6p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:46:46.813112044Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:49.942561407Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "w70twffhv3ucj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" 
- } - }, - "timestamp": "2023-09-13T07:46:53.324329592Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:55.008377243Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "w70twffhv3uck", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:53.622963122Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:55.008377243Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "w70twffhv3ucl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:54.210452083Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:55.008377243Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "w70twffhv3ucm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:54.432613027Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:46:55.008377243Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1j7x8mbfhzbxuu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:46:55.005168885Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:47:00.933107790Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1j7x8mbfhzbxuv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:46:56.220788102Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:47:00.933107790Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ozhw4pfhuenj3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:47:23.106731915Z", - "severity": "INFO", - "labels": { - "process": 
"task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:47:25.929575711Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "ozhw4pfhuenj4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:47:24.216615666Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:47:25.929575711Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "ozhw4pfhuenj5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:47:24.425319657Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:47:25.929575711Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ozhw4pfhuenj6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:47:24.509315720Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:47:25.929575711Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "ozhw4pfhuenj7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:47:24.512137988Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:47:25.929575711Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ozhw4pfhuenj8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - 
[Condensed Cloud Logging excerpt (1 of 4) — Cloud Composer environment "openlineage", project "acceldata-acm", location us-west1, worker airflow-worker-n79fs, logName projects/acceldata-acm/logs/airflow-worker; DAG data_analytics_dag, run scheduled__2023-09-12T00:00:00+00:00. Repeated entries are deduplicated and per-entry resource/label metadata is omitted; timestamps are 2023-09-13 UTC.]
07:47:24 INFO   taskinstance.py:1291  (end of task-start banner for join_bq_datasets.bq_join_holidays_weather_data_1997, try 1 of 3)
07:47:26 ERROR  fatal: not a git repository (or any parent up to mount point /home/airflow)  [x2]
07:47:26 ERROR  Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).  [x2]
07:47:28 INFO   task_command.py:393  Running on host airflow-worker-n79fs
07:47:28 INFO   taskinstance.py:1310  Executing on 2023-09-12 00:00:00+00:00  [task repr lost in the export]
07:47:29 INFO   standard_task_runner.py:55  Started process 739 to run task
07:47:29 INFO   standard_task_runner.py:82  Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_1997', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '908', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpmi8h6ghs']
07:47:29 INFO   standard_task_runner.py:83  Job 908: Subtask join_bq_datasets.bq_join_holidays_weather_data_1997
07:47:29-33 INFO  taskinstance.py:1091  Dependencies all met for dep_context=non-requeueable deps ti= / dep_context=requeueable deps ti=  [task-instance repr lost in the export; for tasks ..._2018, ..._2019, ..._1998, ..._1999, ..._2000]
07:47:29-33 INFO  taskinstance.py:1289-1291  task-start banner, "Starting attempt 1 of 3"  [same five tasks]
07:47:30-32 INFO  task_command.py:393  Running on host airflow-worker-n79fs  [x5]
07:47:31-32 ERROR fatal: not a git repository (or any parent up to mount point /home/airflow)  [x2]
07:47:31-32 ERROR Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).  [x2]
07:47:33 INFO   taskinstance.py:1518  Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=airflow, AIRFLOW_CTX_DAG_ID=data_analytics_dag, AIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_1997, AIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00, AIRFLOW_CTX_TRY_NUMBER=1, AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00
[Condensed excerpt (2 of 4) — worker pod startup and autoscaling, same environment.]
07:47:36 INFO  airflowworkerset_controller.go:61  "controllers/AirflowWorkerSet: Reconcile"  [logName .../airflow-worker-set; repeats throughout this window]
07:47:36 INFO  Starting the process, got command: worker
07:47:36 INFO  Initializing airflow.cfg.
07:47:36 INFO  airflow.cfg initialization is done.
07:47:43 INFO  Setupping GCS Fuse.  [sic, verbatim log message]
07:47:43 INFO  gcsfuse mount seems ready, proceeding.
07:47:43 INFO  Initializing kube_config.
07:47:50 INFO  Fetching cluster endpoint and auth data.
07:47:50 INFO  kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.
07:47:55 INFO  /home/airflow/composer_kube_config is initialized
07:47:55 INFO  Waiting for dags and plugins synchronization.
07:47:55 INFO  Dags and plugins are synced
07:47:55 INFO  Starting Airflow Celery Flower API.
07:47:55 INFO  Searching for recent worker pod evictions
07:48:23 INFO  Finished searching for recent worker pod evictions
07:48:25 INFO  airflowworkerset_controller.go:97  "controllers/AirflowWorkerSet: Workers scale up needed." current number of workers=1 desired=3 scaling up by=2
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:26.832090432Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:48:32.758177135Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "atkwylfhw8u7g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:27.116270163Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:29.697196600Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "atkwylfhw8u7h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:27.127904022Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:29.697196600Z" - }, - { - "textPayload": "I0913 07:48:27.149109 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ck748kfhudlv7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:27.149347777Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:48:32.758177135Z" - }, - { - "textPayload": " ", - "insertId": "s1dpk1floz5f8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:41.923191167Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-n79fs v5.2.7 (dawn-chorus)", - "insertId": "s1dpk1floz5f9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:41.923270539Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "s1dpk1floz5fa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:41.923280123Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 07:48:41", - "insertId": "s1dpk1floz5fb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:41.923287107Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "s1dpk1floz5fc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:41.923292222Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "s1dpk1floz5fd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:41.923298193Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7c936f799370", - "insertId": "s1dpk1floz5fe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:41.923303855Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "s1dpk1floz5ff", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:41.923310303Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "s1dpk1floz5fg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:41.923318086Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "s1dpk1floz5fh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:41.923324326Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "s1dpk1floz5fi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:41.923329930Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "s1dpk1floz5fj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - 
"environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:41.923335980Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "s1dpk1floz5fk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:41.923341423Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "s1dpk1floz5fl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:41.923347022Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": " ", - "insertId": "s1dpk1floz5fm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:41.923352982Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "", - "insertId": "s1dpk1floz5fn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:41.923358278Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": "[tasks]", - "insertId": "s1dpk1floz5fo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:41.923363904Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:46.828893370Z" - }, - { - "textPayload": " . 
[Condensed excerpt (4 of 4) — worker ready, tasks dispatched, optional-dependency warnings.]
07:48:47 INFO  connection.py:22  Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0
07:48:47 INFO  mingle.py:40  mingle: searching for neighbors
07:48:48 INFO  mingle.py:49  mingle: all alone
07:48:48 INFO  worker.py:176  celery@airflow-worker-n79fs ready.
07:48:48 INFO  strategy.py:161  Task airflow.executors.celery_executor.execute_command[<uuid>] received  [x6: eaf6432f-e7e4-4218-ac6e-756bdef9bd15, 8605e031-78fe-462f-8488-7f105853b37a, fb1e0f38-5c53-49c8-9608-b8287b663b90, 9c050754-fc38-48c6-866a-196c5aedb09a, 3a85d47c-4e58-456f-a66c-65e19979265f, f9c30b5e-e8a7-4171-8b16-8b510a808927]
07:48:48 INFO  celery_executor.py:90  [<uuid>] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_<year>', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']  [one per received task, for years 2001-2006]
07:48:51 WARNING  utils.py:430  No module named 'boto3'  [repeated]
07:48:51 WARNING  utils.py:430  No module named 'botocore'  [repeated]
07:48:51 INFO  control.py:277  Events of group {task} enabled by remote.
[excerpt truncated mid-entry in the export]
"us-west1" - } - }, - "timestamp": "2023-09-13T07:48:51.831258237Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:52.958691014Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1cu2uvqf7xjhq5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:51.837400704Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:52.958691014Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1cu2uvqf7xjhq6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:52.321475165Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:52.958691014Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1cu2uvqf7xjhq7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:52.557754709Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:52.958691014Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1e5e7vfii2xaj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:52.916682266Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:59.055167282Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1e5e7vfii2xak", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:52.930135735Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:59.055167282Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1e5e7vfii2xal", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:48:53.128098Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:48:59.055167282Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1e5e7vfii2xam", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:48:53.129550993Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:48:59.055167282Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1awewy0fi06lj0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:48:59.306517247Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:05.185576002Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1awewy0fi06lj1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:00.216109284Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:05.185576002Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1awewy0fi06lj2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:00.328571386Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:05.185576002Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1awewy0fi06lj3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:00.922731187Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:05.185576002Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1awewy0fi06lj4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:01.124317814Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:05.185576002Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1awewy0fi06lj5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:01.213854215Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:05.185576002Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1rjtgdefi6zb4e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:26.907103513Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1rjtgdefi6zb4f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:27.307871437Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1rjtgdefi6zb4g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:28.028244188Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1rjtgdefi6zb4h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:28.207893061Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1rjtgdefi6zb4i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:28.310168456Z", - "severity": "INFO", - "labels": { - 
"process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb4j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:28.310249885Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1rjtgdefi6zb4k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:28.330645526Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb4l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:28.330696830Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1291" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1rjtgdefi6zb4m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:28.418229169Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb4n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:28.420099170Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1289", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1rjtgdefi6zb4o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:28.431170026Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "try-number": "1", - "process": "taskinstance.py:1290", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb4p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:28.436127623Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "taskinstance.py:1291", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1rjtgdefi6zb4q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:29.054391497Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1rjtgdefi6zb4r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:29.222405570Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1rjtgdefi6zb4s", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:29.227093874Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1rjtgdefi6zb4t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:29.445626102Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1rjtgdefi6zb4u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:29.445716387Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1rjtgdefi6zb4v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:29.621071503Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1rjtgdefi6zb4w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:29.621128920Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1rjtgdefi6zb4x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:29.730161102Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1rjtgdefi6zb4y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:29.811911426Z", - "severity": "INFO", - "labels": { - 
"try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2001", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1rjtgdefi6zb4z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:29.921113548Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2006", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1091", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1rjtgdefi6zb50", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:29.932619760Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2005", - "process": "taskinstance.py:1091", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1rjtgdefi6zb51", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:29.936794808Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2001", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb52", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:29.939741591Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1289", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2001", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": 
"1rjtgdefi6zb53", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:29.940655116Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2001", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1290", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb54", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:29.941369075Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2001", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1291", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1rjtgdefi6zb55", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:30.020868806Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2006", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb56", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.021882155Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2006" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1rjtgdefi6zb57", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.022394550Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2006", - "process": "taskinstance.py:1290", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - 
"worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb58", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:30.023285785Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2006", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1rjtgdefi6zb59", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.129981099Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2005", - "process": "taskinstance.py:1091", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb5a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.130602644Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2005", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1rjtgdefi6zb5b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:30.131156053Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2005", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1290", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb5c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": 
"us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:30.131530161Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2005", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1rjtgdefi6zb5d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.235352137Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "taskinstance.py:1091", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1rjtgdefi6zb5e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:30.254862155Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb5f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:30.255237588Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1289", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1rjtgdefi6zb5g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:30.255606074Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1290", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1rjtgdefi6zb5h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.255939835Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "taskinstance.py:1291", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1rjtgdefi6zb5i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:30.605805923Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1rjtgdefi6zb5j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:30.605865653Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1rjtgdefi6zb5k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.640911534Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1rjtgdefi6zb5l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:30.640975644Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:31.686528143Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1qpq14ufopi9fz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:31.087313978Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1310", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Started process 194 to run task", - "insertId": "1qpq14ufopi9g0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:31.102146884Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2004', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '914', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp1g93s23b']", - "insertId": "1qpq14ufopi9g1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:31.113913662Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Job 914: Subtask join_bq_datasets.bq_join_holidays_weather_data_2004", - "insertId": "1qpq14ufopi9g2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:31.114951572Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1qpq14ufopi9g3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:31.436101664Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": 
"Started process 196 to run task", - "insertId": "1qpq14ufopi9g4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:31.448517016Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:55", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2002', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '919', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp7vbybsot']", - "insertId": "1qpq14ufopi9g5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:31.505946404Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Job 919: Subtask join_bq_datasets.bq_join_holidays_weather_data_2002", - "insertId": "1qpq14ufopi9g6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:49:31.506982005Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "standard_task_runner.py:83", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1qpq14ufopi9g7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:31.612240204Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1qpq14ufopi9g8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:31.612332164Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1qpq14ufopi9g9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:31.719426295Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1qpq14ufopi9ga", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:31.719466373Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1qpq14ufopi9gb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:31.738835536Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2004\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "1qpq14ufopi9gc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:32.261760067Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1qpq14ufopi9gd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:32.277007206Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "task_command.py:393", - "map-index": "-1", - "worker_id": 
"airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "1qpq14ufopi9ge", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:32.317803009Z", - "severity": "INFO", - "labels": { - "process": "base.py:73", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2004 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "1qpq14ufopi9gf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:32.323223802Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "bigquery.py:2710", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "1qpq14ufopi9gg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:32.324741017Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "process": "credentials_provider.py:353", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2004_2023_09_12T00_00_00_00_00_2d9374469e1ebb7b1b041a23d8bd981a", - "insertId": "1qpq14ufopi9gh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:32.405894371Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", 
- "workflow": "data_analytics_dag", - "try-number": "1", - "process": "bigquery.py:1596", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1qpq14ufopi9gi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:32.661992425Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Started process 200 to run task", - "insertId": "1qpq14ufopi9gj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:32.810852078Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2003', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '915', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpurj7yd32']", - "insertId": "1qpq14ufopi9gk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:49:32.912746770Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Job 915: Subtask join_bq_datasets.bq_join_holidays_weather_data_2003", - "insertId": "1qpq14ufopi9gl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:49:32.913767408Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:83", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:49:37.836022289Z" - }, - { - "textPayload": "Exporting 
07:49:33.217 INFO  [2002] Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=airflow AIRFLOW_CTX_DAG_ID=data_analytics_dag AIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2002 AIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00 AIRFLOW_CTX_TRY_NUMBER=1 AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00 (taskinstance.py:1518)
07:49:36.668-07:49:49.702 INFO  airflow-worker-set: "controllers/AirflowWorkerSet: Reconcile" (airflowworkerset_controller.go:61; 7 entries)
07:49:48.974 INFO  Starting the process, got command: worker
07:49:48.976 INFO  Initializing airflow.cfg.
07:49:48.985 INFO  airflow.cfg initialization is done.
07:49:56.409 INFO  Setupping GCS Fuse.
07:49:56.410 INFO  gcsfuse mount seems ready, proceeding.
07:49:56.411 INFO  Initializing kube_config.
07:50:04.023 INFO  Fetching cluster endpoint and auth data.
07:50:04.313 INFO  kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.
07:50:09.833 INFO  /home/airflow/composer_kube_config is initialized
07:50:09.834 INFO  Waiting for dags and plugins synchronization.
07:50:09.834 INFO  Dags and plugins are synced
07:50:09.835 INFO  Starting Airflow Celery Flower API.
"timestamp": "2023-09-13T07:50:09.835375342Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:50:13.335629235Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "1o7zi4yf6c1xt6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:50:09.855868311Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:50:13.335629235Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "16gfkftf6evkek", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:37.120643315Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:50:43.831239893Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "16gfkftf6evkel", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:42.137838209Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:50:43.831239893Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "16gfkftf6evkem", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:50:42.138965271Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:50:43.831239893Z" - }, - { - "textPayload": " ", - "insertId": "1c0dabhf6cbttb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620278597Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-n79fs v5.2.7 (dawn-chorus)", - "insertId": "1c0dabhf6cbttc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:50:57.620311574Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1c0dabhf6cbttd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620363378Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 07:50:57", - "insertId": "1c0dabhf6cbtte", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:50:57.620372265Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "1c0dabhf6cbttf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:50:57.620378493Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "1c0dabhf6cbttg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:50:57.620384505Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7ab429e7d370", - "insertId": "1c0dabhf6cbtth", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:50:57.620395348Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1c0dabhf6cbtti", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620402657Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1c0dabhf6cbttj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620409866Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "1c0dabhf6cbttk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620416292Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "1c0dabhf6cbttl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:50:57.620421714Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1c0dabhf6cbttm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620427387Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "1c0dabhf6cbttn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": 
"openlineage" - } - }, - "timestamp": "2023-09-13T07:50:57.620433077Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "1c0dabhf6cbtto", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:50:57.620439700Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": " ", - "insertId": "1c0dabhf6cbttp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:50:57.620445972Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "", - "insertId": "1c0dabhf6cbttq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620451249Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "[tasks]", - "insertId": "1c0dabhf6cbttr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:50:57.620457298Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "1c0dabhf6cbtts", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:50:57.620463573Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "", - "insertId": "1c0dabhf6cbttt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:50:57.620489985Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:01.209959632Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "hrnihgfoqc03z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:01.411109970Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:35.871759002Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "hrnihgfoqc040", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:01.413021126Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:35.871759002Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "hrnihgfoqc04i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:01.426961991Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:35.871759002Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "hrnihgfoqc041", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:01.451977967Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:35.871759002Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "hrnihgfoqc04j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:01.464581341Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:35.871759002Z" - }, - { - 
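The banner fixes the worker's effective Celery settings: Redis as both broker and result backend, a prefork pool of six, and task events off. A standalone sketch of the same configuration follows, purely for orientation; Composer assembles this internally rather than from user code, so every name here is illustrative.

# Sketch of the Celery settings visible in the startup banner above.
from celery import Celery

redis_url = "redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0"

app = Celery(
    "airflow.executors.celery_executor",
    broker=redis_url,   # ".> transport: redis://..."
    backend=redis_url,  # ".> results: redis://..."
)
app.conf.worker_concurrency = 6           # ".> concurrency: 6 (prefork)"
app.conf.worker_send_task_events = False  # ".> task events: OFF (enable -E ...)"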
"textPayload": "airflow.cfg initialization is done.", - "insertId": "hrnihgfoqc04k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:01.579480746Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:35.871759002Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "jpbb1qfoq60fg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:03.331442347Z", - "severity": "INFO", - "labels": { - "process": "connection.py:22", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "jpbb1qfoq60fh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:03.409984625Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "jpbb1qfoq60fi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:04.439609819Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:49", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "celery@airflow-worker-n79fs ready.", - "insertId": "jpbb1qfoq60fj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:04.477390027Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "worker.py:176" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[eef448e2-cd5c-442b-9686-3c71f2567cea] received", - "insertId": "jpbb1qfoq60fk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:04.482590846Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[83ca6064-4e72-4a03-8257-8c6e2959b9a7] received", - "insertId": "jpbb1qfoq60fl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": 
"acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:04.491881174Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "[eef448e2-cd5c-442b-9686-3c71f2567cea] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2007', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpbb1qfoq60fm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:04.493081465Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b5864fe6-616b-4542-8682-fb5442ec4585] received", - "insertId": "jpbb1qfoq60fn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:04.509104538Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "[83ca6064-4e72-4a03-8257-8c6e2959b9a7] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2008', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpbb1qfoq60fo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:04.521367311Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "[b5864fe6-616b-4542-8682-fb5442ec4585] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2009', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpbb1qfoq60fp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:04.528125933Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[dfc64cdc-41e3-4cdf-8ef9-641c21e14a4b] received", - "insertId": "jpbb1qfoq60fq", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:04.531587527Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[ecacd5a0-fc4e-4879-bc98-6ba49b829bf4] received", - "insertId": "jpbb1qfoq60fr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:04.620127299Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "[dfc64cdc-41e3-4cdf-8ef9-641c21e14a4b] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2010', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpbb1qfoq60fs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:04.624942828Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[e3f52823-5cf4-4509-b9f3-434eb5a93a9c] received", - "insertId": "jpbb1qfoq60ft", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:04.627709698Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "[e3f52823-5cf4-4509-b9f3-434eb5a93a9c] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2012', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpbb1qfoq60fu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:04.723776797Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "[ecacd5a0-fc4e-4879-bc98-6ba49b829bf4] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2011', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpbb1qfoq60fv", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:04.726226592Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:07.299502040Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "kjkcdnfp1zr8k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:06.820794205Z", - "severity": "INFO", - "labels": { - "process": "control.py:277", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "I0913 07:51:07.172681 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "n18v79finzvfn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:07.172925246Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:51:13.571980784Z" - }, - { - "textPayload": "I0913 07:51:07.231209 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "n18v79finzvfo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:07.231433955Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:51:13.571980784Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "kjkcdnfp1zr8m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:07.237744844Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "I0913 07:51:07.307032 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "n18v79finzvfp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:07.307315756Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:51:13.571980784Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "kjkcdnfp1zr8o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:07.320060641Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "kjkcdnfp1zr8p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:07.411764523Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "kjkcdnfp1zr8q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:07.430778942Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "kjkcdnfp1zr8t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:07.611847392Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "kjkcdnfp1zr8v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:07.622022322Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "kjkcdnfp1zr8y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:07.640431841Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "kjkcdnfp1zr90", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:07.718439854Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "kjkcdnfp1zr94", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } 
- }, - "timestamp": "2023-09-13T07:51:07.746917565Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "kjkcdnfp1zr96", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:07.814909214Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "kjkcdnfp1zr98", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:07.822124002Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "kjkcdnfp1zr9b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:07.835169888Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "kjkcdnfp1zr9d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:08.728693063Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "kjkcdnfp1zr9g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:08.729741400Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "kjkcdnfp1zr9h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:08.737269030Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": 
"No module named 'airflow.providers.sftp'", - "insertId": "kjkcdnfp1zr9i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:08.811641178Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "kjkcdnfp1zr9j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:08.816642411Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "kjkcdnfp1zr9k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:08.839073610Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:13.467818134Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ut3z4sfhxw7fc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:16.405857401Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:19.629052847Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ut3z4sfhxw7fd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:16.524450459Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:19.629052847Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ut3z4sfhxw7fe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:16.525353661Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:19.629052847Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ut3z4sfhxw7ff", - "resource": { - "type": "cloud_composer_environment", - "labels": 
{ - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:16.802868959Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:19.629052847Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ut3z4sfhxw7fg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:16.824244310Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:19.629052847Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ut3z4sfhxw7fh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:17.111369924Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:19.629052847Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "xkvz1hfilhn2o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:43.741451229Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "xkvz1hfilhn2p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:43.909272977Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "xkvz1hfilhn2q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:44.632126468Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "xkvz1hfilhn2r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:44.704304860Z", - "severity": "INFO", - 
"labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "xkvz1hfilhn2s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:44.722336739Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "xkvz1hfilhn2t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:44.906756709Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn2u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:44.909310872Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "xkvz1hfilhn2v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:44.912113082Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1290", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn2w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:44.917140774Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1291", - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "xkvz1hfilhn2x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:45.026677558Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn2y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:45.032143991Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "xkvz1hfilhn2z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:45.033171333Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1290", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn30", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:45.033995179Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - 
"execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1291", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "xkvz1hfilhn31", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:45.312323513Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2008" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "xkvz1hfilhn32", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:45.341261514Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2008", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn33", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:45.341809890Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2008", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "xkvz1hfilhn34", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:45.342389024Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1290", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2008", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn35", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:45.342874477Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1291", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2008", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "xkvz1hfilhn36", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:45.371526053Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "xkvz1hfilhn37", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:45.374652037Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "xkvz1hfilhn38", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:45.414997660Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "xkvz1hfilhn39", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:45.623713139Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "xkvz1hfilhn3a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:45.623773529Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "xkvz1hfilhn3b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:45.814392761Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "xkvz1hfilhn3c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:45.814425944Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "xkvz1hfilhn3d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:45.832560006Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2010", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "xkvz1hfilhn3e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.022404051Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2010", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn3f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.022763034Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2010", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "xkvz1hfilhn3g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.023189233Z", - "severity": "INFO", - 
"labels": { - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2010" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "xkvz1hfilhn3h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.023937493Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "process": "taskinstance.py:1091", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn3i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.026347265Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2010", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1291", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "xkvz1hfilhn3j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:46.032357285Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2011", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "xkvz1hfilhn3k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.135861138Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn3l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:46.137048424Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "process": "taskinstance.py:1289", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "xkvz1hfilhn3m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:46.137596970Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "xkvz1hfilhn3n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:46.138035987Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2011", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn3o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.138261901Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1291", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn3p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.139233653Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": 
"airflow-worker-n79fs", - "try-number": "1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2011", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "xkvz1hfilhn3q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:46.140190571Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2011", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "xkvz1hfilhn3r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:46.143037407Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2011", - "process": "taskinstance.py:1291", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "xkvz1hfilhn3s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:47.542577050Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1310", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Started process 186 to run task", - "insertId": "xkvz1hfilhn3t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:47.556490422Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:55", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2007', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '920', '--raw', '--subdir', 
'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpdlyz221o']", - "insertId": "xkvz1hfilhn3u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:47.612244460Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Job 920: Subtask join_bq_datasets.bq_join_holidays_weather_data_2007", - "insertId": "xkvz1hfilhn3v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:47.614026808Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "xkvz1hfilhn3w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:48.017572696Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "xkvz1hfilhn3x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:48.017614578Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "xkvz1hfilhn3y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:48.118102720Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "xkvz1hfilhn3z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T07:51:48.118139467Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "xkvz1hfilhn40", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:48.217168648Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2007\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "xkvz1hfilhn41", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:48.588237506Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "xkvz1hfilhn42", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:48.635108880Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "base.py:73", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2007 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "xkvz1hfilhn43", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T07:51:48.638451102Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:2710", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "xkvz1hfilhn44", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:48.639247314Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "credentials_provider.py:353", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2007_2023_09_12T00_00_00_00_00_ad335356efae408efbe7b9e132042a8f", - "insertId": "xkvz1hfilhn45", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:48.699334959Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "process": "bigquery.py:1596" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:49.863656102Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "y4obl8f6e2kqc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:48.974807329Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Started process 191 to run task", - "insertId": "y4obl8f6e2kqd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:48.987582154Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:55", - "workflow": "data_analytics_dag" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2009', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '921', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmprkrjs_ut']", - "insertId": "y4obl8f6e2kqe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:49.014431200Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:82", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Job 921: Subtask join_bq_datasets.bq_join_holidays_weather_data_2009", - "insertId": "y4obl8f6e2kqf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:49.015983346Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:83", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "y4obl8f6e2kqg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:49.449188877Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "y4obl8f6e2kqh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:49.449242585Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "y4obl8f6e2kqi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:49.624490504Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": 
"Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "y4obl8f6e2kqj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:49.624519365Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "y4obl8f6e2kqk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:49.898455577Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2009\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "y4obl8f6e2kql", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:50.227373342Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1518", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "y4obl8f6e2kqm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:50.270768685Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "base.py:73", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2009 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 
'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "y4obl8f6e2kqn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:50.273852491Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "bigquery.py:2710", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "y4obl8f6e2kqo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:50.275205493Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "credentials_provider.py:353", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2009_2023_09_12T00_00_00_00_00_ef22f768a03ac4d52dbdd05c1aa98e42", - "insertId": "y4obl8f6e2kqp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:50.305325793Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "process": "bigquery.py:1596" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "y4obl8f6e2kqq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:50.654947645Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Started process 197 to run task", - "insertId": "y4obl8f6e2kqr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:50.702754664Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - 
"execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2012', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '924', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp64lt4o6b']", - "insertId": "y4obl8f6e2kqs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:50.740015174Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:82", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Job 924: Subtask join_bq_datasets.bq_join_holidays_weather_data_2012", - "insertId": "y4obl8f6e2kqt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:50.741775559Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:54.060299767Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "1qpx54ffioqop0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:52.562050225Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:59.028305711Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "1qpx54ffioqop1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:51:52.571945641Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:59.028305711Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "1qpx54ffioqop3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:52.618653170Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:59.028305711Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "1qpx54ffioqop4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:52.618670857Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:59.028305711Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "1qpx54ffioqop2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:52.706627141Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:59.028305711Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "1qpx54ffioqop5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:51:52.722579149Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:51:59.028305711Z" - }, - { - "textPayload": "I0913 07:51:54.306001 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1yhp8opfilk662", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:51:54.306216493Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:52:01.851052765Z" - }, - { - "textPayload": "I0913 07:52:05.727578 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1l5vuxbfejvp4p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:05.727840286Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:52:11.982345586Z" - }, - { - "textPayload": "I0913 07:52:07.474617 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1l5vuxbfejvp4q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:07.474871386Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:52:11.982345586Z" - }, - { - "textPayload": "I0913 07:52:07.566192 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1l5vuxbfejvp4r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - 
"timestamp": "2023-09-13T07:52:07.566350057Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:52:11.982345586Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "1fm24p7fi5n13c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:10.514337192Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:17.228448596Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "1fm24p7fi5n13e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:10.529323846Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:17.228448596Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "1fm24p7fi5n13d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:12.020295472Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:17.228448596Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "1fm24p7fi5n13f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:12.027649536Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:17.228448596Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "toxcrgfoqh6ce", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:17.955396607Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "toxcrgfoqh6cf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:17.957571171Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "toxcrgfoqh6cg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": 
"us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:17.967255605Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "I0913 07:52:18.302886 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "tz7k5rf7x1c23", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:18.303099715Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:52:25.070776461Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "17kqxg6fitmlnv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:21.133999226Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "17kqxg6fitmlnw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:21.134395311Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17kqxg6fitmlnx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:21.141904072Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "17kqxg6fitmlny", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:21.172914693Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "17kqxg6fitmlo0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:21.956616492Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "17kqxg6fitmlo1", - "resource": { - 
"type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:21.956651651Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17kqxg6fitmlo2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:21.956665724Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "17kqxg6fitmlo3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:22.030086307Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "toxcrgfoqh6ch", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:25.604190373Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "toxcrgfoqh6ci", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:25.608526921Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "toxcrgfoqh6cj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:25.609090672Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17kqxg6fitmlnz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:26.183974037Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:26.458798273Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "uj1nqnfi08gp7", - 
"resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:27.100297322Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:32.613140407Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "uj1nqnfi08gp9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:31.254971954Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:32.613140407Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "uj1nqnfi08gp8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:32.178102877Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:32.613140407Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "toxcrgfoqh6ck", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:33.061989769Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "toxcrgfoqh6cl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:33.238183542Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "1mjt34cf49pgcz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:34.044053851Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:39.703097780Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "1mjt34cf49pgcx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:34.321350759Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:39.703097780Z" - 
}, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1mjt34cf49pgcy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:36.300081697Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:39.703097780Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1mjt34cf49pgd0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:37.245998900Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:39.703097780Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "toxcrgfoqh6cm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:38.850549455Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "toxcrgfoqh6cn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:38.851081244Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "toxcrgfoqh6co", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:38.851368637Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "toxcrgfoqh6cp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:38.852464685Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "toxcrgfoqh6cq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:38.923243079Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:53:05.283255223Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1v5rezxfhz8plm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:41.306219342Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:46.813818589Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1v5rezxfhz8plo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:42.324159085Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:46.813818589Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1v5rezxfhz8pln", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:46.311211445Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:46.813818589Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1iybm61fi1ynio", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:52:47.331625489Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:53.018871039Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1iybm61fi1yniq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:52:51.325874051Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:53.018871039Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1iybm61fi1ynip", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:52.364479188Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:52:53.018871039Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "mr5tjaf6d9il4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:56.333243580Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:02.124273116Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "mr5tjaf6d9il6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:52:57.378282643Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:02.124273116Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "mr5tjaf6d9il5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:01.350454966Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:02.124273116Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "11gorr5fovh0yk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:02.386000698Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:09.294317733Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "vd39njf3zxari", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:05.410323903Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:10.355915420Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "11gorr5fovh0ym", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:06.357601520Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:09.294317733Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "11gorr5fovh0yl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:07.392980067Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:09.294317733Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". 
Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "uhdj2f874pfu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:11.022760975Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:15.446963498Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "uhdj2f874pfv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:11.111683309Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:15.446963498Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "yyo99efisvyg1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:11.374349366Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:18.381620491Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "yyo99efisvyg3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:12.401922669Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:18.381620491Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "yyo99efisvyg2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:16.381389379Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:18.381620491Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "epq13ofdtkft3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:17.410439058Z", - "severity": "INFO", - 
"labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:23.679381649Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "epq13ofdtkft4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:17.412095867Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:23.679381649Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "epq13ofdtkft5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:21.395966069Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:23.679381649Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "epq13ofdtkft6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:21.396154045Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:23.679381649Z" - }, - { - "textPayload": " ", - "insertId": "hhpgkgechodo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434634425Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-n79fs v5.2.7 (dawn-chorus)", - "insertId": "hhpgkgechodp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434711772Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "hhpgkgechodq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:26.434723796Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 07:53:26", - "insertId": "hhpgkgechodr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T07:53:26.434731217Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "hhpgkgechods", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:26.434737383Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "hhpgkgechodt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:26.434743909Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7af7eedd6370", - "insertId": "hhpgkgechodu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:26.434750407Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "hhpgkgechodv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:26.434757152Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "hhpgkgechodw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:26.434765386Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "hhpgkgechodx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434771993Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - 
"insertId": "hhpgkgechody", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434777738Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "hhpgkgechodz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434783109Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "hhpgkgechoe0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:26.434788378Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "hhpgkgechoe1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:26.434793544Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": " ", - "insertId": "hhpgkgechoe2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:26.434799144Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "", - "insertId": "hhpgkgechoe3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:26.434803824Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "[tasks]", - "insertId": "hhpgkgechoe4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434809567Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "hhpgkgechoe5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434814733Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "", - "insertId": "hhpgkgechoe6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:26.434977977Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:27.656032750Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1thilvdf6gjymm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:32.130845299Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:33.758748953Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "1thilvdf6gjymn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:32.217133919Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:33.758748953Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "ozhw4pfhuvlnh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:33.239787167Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "mingle.py:49" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "celery@airflow-worker-n79fs ready.", - "insertId": "ozhw4pfhuvlni", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:33.282876968Z", - "severity": "INFO", - "labels": { - "process": "worker.py:176", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[0f493eeb-3d57-4cb5-b47d-d866fd9af38a] received", - "insertId": "ozhw4pfhuvlnj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:33.287683004Z", 
- "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[430d2ad6-73b1-4228-a37e-5282814dbe8f] received", - "insertId": "ozhw4pfhuvlnk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:33.292467284Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f0a2cca8-ba75-4a53-8948-17c7c4049212] received", - "insertId": "ozhw4pfhuvlnl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:33.300506327Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f3a05d68-8343-4c35-a2c8-a96bea125c06] received", - "insertId": "ozhw4pfhuvlnm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:33.306151240Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "[430d2ad6-73b1-4228-a37e-5282814dbe8f] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2014', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ozhw4pfhuvlnn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:33.309141501Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "[0f493eeb-3d57-4cb5-b47d-d866fd9af38a] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2013', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ozhw4pfhuvlno", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:33.314096451Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "[f0a2cca8-ba75-4a53-8948-17c7c4049212] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2015', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ozhw4pfhuvlnp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:33.321801620Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[31cede7d-7419-45e4-aa47-24a11668b889] received", - "insertId": "ozhw4pfhuvlnq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:33.329657688Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "[f3a05d68-8343-4c35-a2c8-a96bea125c06] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ozhw4pfhuvlnr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:33.333916658Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[51bc0e0f-19fb-4f8d-a26b-5b6b7b83aa78] received", - "insertId": "ozhw4pfhuvlns", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:33.406758544Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "[31cede7d-7419-45e4-aa47-24a11668b889] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2021', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ozhw4pfhuvlnt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:33.422889884Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": 
"celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "[51bc0e0f-19fb-4f8d-a26b-5b6b7b83aa78] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2016', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ozhw4pfhuvlnu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:33.426906086Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "ozhw4pfhuvlnv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:35.905388696Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "ozhw4pfhuvlnw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:36.211829557Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "ozhw4pfhuvlnx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:36.214140Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "ozhw4pfhuvlny", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:36.223375517Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "ozhw4pfhuvlnz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:36.225306874Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", 
- "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "ozhw4pfhuvlo0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:36.232811119Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "ozhw4pfhuvlo1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:36.306554454Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "ozhw4pfhuvlo2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:36.309792156Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "ozhw4pfhuvlo3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:36.320271352Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "ozhw4pfhuvlo4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:36.339134836Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "ozhw4pfhuvlo5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:36.409387981Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "ozhw4pfhuvlo6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T07:53:36.414966499Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "ozhw4pfhuvlo7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:36.425466511Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ozhw4pfhuvlo8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:37.409066245Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ozhw4pfhuvlo9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:37.512431292Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ozhw4pfhuvloa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:37.513159717Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ozhw4pfhuvlob", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:37.516392403Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ozhw4pfhuvloc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:37.521648399Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - 
"textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ozhw4pfhuvlod", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:37.616632172Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:38.910214198Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1io2stafihqvh8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:45.424753475Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:50.901162753Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1io2stafihqvh9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:45.514034749Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:50.901162753Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1io2stafihqvha", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:45.533061618Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:50.901162753Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1io2stafihqvhb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:53:45.534970580Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:50.901162753Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1io2stafihqvhc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:45.731740977Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:50.901162753Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": 
"1io2stafihqvhd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:46.109724225Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:53:50.901162753Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1n46tx3fi67f36", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:58.493670882Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:05.115451468Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1n46tx3fi67f34", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:58.509229399Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:05.115451468Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1n46tx3fi67f35", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:53:58.510685945Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:05.115451468Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1n46tx3fi67f37", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:53:58.516449162Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:05.115451468Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ezoezkfi139bq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:13.535508002Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ezoezkfi139br", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:13.545311414Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ezoezkfi139bs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:13.979115212Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ezoezkfi139bt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:14.047454533Z", - "severity": "INFO", - "labels": { - "process": 
"task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ezoezkfi139bu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:14.064854276Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ezoezkfi139bv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:14.220167705Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "ezoezkfi139bw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:14.321270773Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "ezoezkfi139bx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:14.323169552Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "ezoezkfi139by", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:14.431173915Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139bz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:14.432852924Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "ezoezkfi139c0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:14.433489304Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139c1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:14.504508869Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "ezoezkfi139c2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:14.511572507Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "ezoezkfi139c3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:14.518300828Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": 
"2023-09-12T00:00:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139c4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:14.525628311Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "ezoezkfi139c5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:14.526355272Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1290", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139c6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:14.528524319Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "process": "taskinstance.py:1291", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "ezoezkfi139c7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:14.729293459Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": 
"ezoezkfi139c8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:14.804872119Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "process": "taskinstance.py:1289", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "ezoezkfi139c9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:14.807443609Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1290", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "ezoezkfi139ca", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:14.816605403Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2016", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139cb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:14.817438981Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "process": "taskinstance.py:1291" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "ezoezkfi139cc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:14.837200370Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": 
"airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "ezoezkfi139cd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:15.019066959Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2014", - "try-number": "1", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "ezoezkfi139ce", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:15.021636878Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2016", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139cf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:15.023488232Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1289", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2016", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "ezoezkfi139cg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:15.024390794Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "ezoezkfi139ch", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T07:54:15.024771334Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2016", - "try-number": "1", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139ci", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:15.025582276Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "taskinstance.py:1289", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139cj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:15.026368922Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2016", - "map-index": "-1", - "process": "taskinstance.py:1291" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "ezoezkfi139ck", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:15.029259947Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1290", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139cl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:15.030076831Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1291", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "ezoezkfi139cm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:15.202982230Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2014", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139cn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:15.206035626Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2014", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1289", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "ezoezkfi139co", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:15.207656737Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1290", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2014", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "ezoezkfi139cp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:15.208807844Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2014", - "try-number": "1", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "ezoezkfi139cq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:15.435034736Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "ezoezkfi139cr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:15.435101677Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "ezoezkfi139cs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:15.504724815Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "ezoezkfi139ct", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:15.504783644Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "ezoezkfi139cu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:16.697122097Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Started process 188 to run task", - "insertId": "ezoezkfi139cv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:16.739867223Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:55", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2015', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '926', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', 
'/tmp/tmpi6ycph4b']", - "insertId": "ezoezkfi139cw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:16.745786428Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:82", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Job 926: Subtask join_bq_datasets.bq_join_holidays_weather_data_2015", - "insertId": "ezoezkfi139cx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:16.746926157Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "ezoezkfi139cy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:17.166701438Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "ezoezkfi139cz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:17.166774604Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "ezoezkfi139d0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:17.235876418Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "ezoezkfi139d1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:17.235971890Z", - "severity": "ERROR", - "labels": { - 
"worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ezoezkfi139d2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:17.337952209Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "process": "task_command.py:393", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2015\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "ezoezkfi139d3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:17.668718489Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "ezoezkfi139d4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:17.716568023Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "base.py:73", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2015 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "ezoezkfi139d5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:17.721207628Z", - "severity": "INFO", - "labels": { - 
"try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "process": "bigquery.py:2710", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "ezoezkfi139d6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:17.723296135Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "try-number": "1", - "process": "credentials_provider.py:353" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2015_2023_09_12T00_00_00_00_00_75e56798825313cc8a2df29f3fb7d96d", - "insertId": "ezoezkfi139d7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:17.768694680Z", - "severity": "INFO", - "labels": { - "process": "bigquery.py:1596", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:18.191205583Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1yhpff8fi0v9sj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:18.061720612Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Started process 193 to run task", - "insertId": "1yhpff8fi0v9sk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:18.075518985Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - 
"textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2013', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '927', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpzdcra4d5']", - "insertId": "1yhpff8fi0v9sl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:18.113642699Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Job 927: Subtask join_bq_datasets.bq_join_holidays_weather_data_2013", - "insertId": "1yhpff8fi0v9sm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:18.114325839Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1yhpff8fi0v9sn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:18.624625296Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1yhpff8fi0v9so", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:18.624658047Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1yhpff8fi0v9sp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:18.726987649Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1yhpff8fi0v9sq", - 
"resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:18.727030709Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1yhpff8fi0v9sr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:18.960338755Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2013\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "1yhpff8fi0v9ss", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:19.316963281Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "process": "taskinstance.py:1518", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "1yhpff8fi0v9st", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:19.356282705Z", - "severity": "INFO", - "labels": { - "process": "base.py:73", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2013 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "1yhpff8fi0v9su", - 
"resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:19.359952747Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "bigquery.py:2710", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "1yhpff8fi0v9sv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:19.360931737Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "credentials_provider.py:353", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2013_2023_09_12T00_00_00_00_00_07ac9f5aa0a2c1ac58604b7733667db2", - "insertId": "1yhpff8fi0v9sw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:19.387313793Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "bigquery.py:1596", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1yhpff8fi0v9sx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:19.734928982Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1310", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Started process 198 to run task", - "insertId": "1yhpff8fi0v9sy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:19.817099271Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - 
"workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '928', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp0sd0d1ym']", - "insertId": "1yhpff8fi0v9sz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:19.837618456Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "Job 928: Subtask join_bq_datasets.bq_join_holidays_weather_data_2020", - "insertId": "1yhpff8fi0v9t0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:19.838146255Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:23.102125029Z" - }, - { - "textPayload": "I0913 07:54:24.057965 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ivqnvbflotjme", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:24.058220715Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:54:30.650470450Z" - }, - { - "textPayload": "I0913 07:54:36.735819 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "vcwa06forsdcx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:36.736027190Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:54:43.766642296Z" - }, - { - "textPayload": " ", - "insertId": "1plm3bsfedfoo4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.367362335Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, 
- { - "textPayload": " -------------- celery@airflow-worker-j2x68 v5.2.7 (dawn-chorus)", - "insertId": "1plm3bsfedfoo5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.367398498Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1plm3bsfedfoo6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.367405088Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 07:54:39", - "insertId": "1plm3bsfedfoo7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.367448528Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "1plm3bsfedfoo8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.367454951Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "1plm3bsfedfoo9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.367460607Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7c821ae513d0", - "insertId": "1plm3bsfedfooa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.367465187Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1plm3bsfedfoob", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.367471280Z", - "severity": "INFO", - "labels": { - 
"worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1plm3bsfedfooc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.367478142Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "1plm3bsfedfood", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.367509175Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "1plm3bsfedfooe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.367515081Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1plm3bsfedfoof", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.368277670Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "1plm3bsfedfoog", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.368294358Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "1plm3bsfedfooh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.368318973Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " ", - "insertId": "1plm3bsfedfooi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": 
"us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.368324992Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "", - "insertId": "1plm3bsfedfooj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.368329783Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "[tasks]", - "insertId": "1plm3bsfedfook", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.368338438Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " . airflow.executors.celery_executor.execute_command", - "insertId": "1plm3bsfedfool", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.368343437Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "", - "insertId": "1plm3bsfedfoom", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.368347338Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " ", - "insertId": "1plm3bsfedfoon", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.378294843Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-bbfqt v5.2.7 (dawn-chorus)", - "insertId": "1plm3bsfedfooo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.378374316Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1plm3bsfedfoop", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.378383062Z", 
- "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 07:54:39", - "insertId": "1plm3bsfedfooq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.378388641Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "1plm3bsfedfoor", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.378393110Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "1plm3bsfedfoos", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.378398960Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7c1b8ffc93d0", - "insertId": "1plm3bsfedfoot", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.378403374Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1plm3bsfedfoou", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.378408918Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1plm3bsfedfoov", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.378415900Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "1plm3bsfedfoow", - 
"resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.378421140Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "1plm3bsfedfoox", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.378425844Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1plm3bsfedfooy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.378447583Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "1plm3bsfedfooz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.378452521Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "1plm3bsfedfop0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.378457462Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " ", - "insertId": "1plm3bsfedfop1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.378461762Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "", - "insertId": "1plm3bsfedfop2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.378465913Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "[tasks]", - "insertId": "1plm3bsfedfop3", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:41.378470841Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": " . airflow.executors.celery_executor.execute_command", - "insertId": "1plm3bsfedfop4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:41.378475595Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "", - "insertId": "1plm3bsfedfop5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:41.379913236Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:47.522724478Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "l3ar7kfhy3y1h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:50.783264833Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "l3ar7kfhy3y1i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:50.857750408Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "l3ar7kfhy3y1u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:51.488772823Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "l3ar7kfhy3y1v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:51.549612330Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:40", - "worker_id": "airflow-worker-j2x68" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "l3ar7kfhy3y1j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:52.005558026Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "mingle.py:49" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "celery@airflow-worker-bbfqt ready.", - "insertId": "l3ar7kfhy3y1k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:52.534264429Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "worker.py:176" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f4e8b731-a51e-4e26-a267-614a4481ac2b] received", - "insertId": "l3ar7kfhy3y1l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:52.599129068Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[cec168ed-573e-4979-add0-69b17415b3a2] received", - "insertId": "l3ar7kfhy3y1m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:52.611545271Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "l3ar7kfhy3y1w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:52.712153656Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:49", - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "[f4e8b731-a51e-4e26-a267-614a4481ac2b] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2017', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "l3ar7kfhy3y1n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:52.731258290Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - 
"process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "[cec168ed-573e-4979-add0-69b17415b3a2] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:40:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "l3ar7kfhy3y1o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:52.760677911Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "celery@airflow-worker-j2x68 ready.", - "insertId": "l3ar7kfhy3y1x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:53.131280545Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68", - "process": "worker.py:176" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "l3ar7kfhy3y1y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:53.368076351Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "l3ar7kfhy3y1p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:53.371737351Z", - "severity": "INFO", - "labels": { - "process": "control.py:277", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "l3ar7kfhy3y1q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:55.160079109Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "l3ar7kfhy3y1r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:55.168026628Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "l3ar7kfhy3y1s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:54:55.213319475Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "l3ar7kfhy3y1t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:55.214425562Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:54:57.448362244Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1mjrh8kfljfmo5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:54:56.834016598Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:03.572397709Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1mjrh8kfljfmo6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:54:56.835687799Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:03.572397709Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1amoeagfimkhej", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:05.325643779Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1amoeagfimkhek", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:05.327705856Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Running on host airflow-worker-bbfqt", - "insertId": "1amoeagfimkhel", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:07.010244964Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1amoeagfimkhem", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:07.531801741Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-13T07:40:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1amoeagfimkhen", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:07.563113888Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-13T07:40:00+00:00", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1091", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1amoeagfimkheo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:07.563541914Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T07:40:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1289", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1amoeagfimkhep", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:07.563560436Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T07:40:00+00:00", - "worker_id": "airflow-worker-bbfqt", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1amoeagfimkheq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": 
"openlineage" - } - }, - "timestamp": "2023-09-13T07:55:07.563566069Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1291", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-bbfqt", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T07:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1amoeagfimkher", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:08.955483245Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1amoeagfimkhes", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:08.955528310Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1amoeagfimkhet", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:09.110386218Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1amoeagfimkheu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:09.110458167Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Executing on 2023-09-13 07:40:00+00:00", - "insertId": "1amoeagfimkhev", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:10.352047665Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:40:00+00:00", - "process": "taskinstance.py:1310", - "try-number": "1", - "worker_id": "airflow-worker-bbfqt", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:11.429922530Z" - }, - { - "textPayload": "Started process 208 to run task", - "insertId": "uj4u1jfin0s1b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - 
"project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:10.480520216Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-13T07:40:00+00:00", - "process": "standard_task_runner.py:55", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:40:00+00:00', '--job-id', '932', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmphf24mcrc']", - "insertId": "uj4u1jfin0s1c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:10.510857568Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T07:40:00+00:00", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Job 932: Subtask echo", - "insertId": "uj4u1jfin0s1d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:10.510914958Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T07:40:00+00:00", - "task-id": "echo", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-bbfqt", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Running on host airflow-worker-bbfqt", - "insertId": "uj4u1jfin0s1e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:11.883368053Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "execution-date": "2023-09-13T07:40:00+00:00", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T07:40:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T07:40:00+00:00", - "insertId": "uj4u1jfin0s1f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:12.793965953Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-13T07:40:00+00:00", - "worker_id": "airflow-worker-bbfqt", - "workflow": "airflow_monitoring", - 
"map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "uj4u1jfin0s1g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:12.797199640Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:63", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-13T07:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "uj4u1jfin0s1h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:12.812360782Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:75", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T07:40:00+00:00", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Output:", - "insertId": "uj4u1jfin0s1i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:13.285012997Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:40:00+00:00", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:86" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "test", - "insertId": "uj4u1jfin0s1j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:13.526199682Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:40:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-bbfqt", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "process": "subprocess.py:93" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "uj4u1jfin0s1k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:13.526230556Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:97", - "try-number": "1", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-13T07:40:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T074000, start_date=20230913T075507, end_date=20230913T075513", - "insertId": "uj4u1jfin0s1l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:13.720772091Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1328", - "task-id": "echo", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-13T07:40:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "zt5vdmff9ethb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:13.944726396Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "zt5vdmff9ethc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:13.947569806Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "zt5vdmff9ethd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:14.001252955Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "I0913 07:55:14.060406 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "2sglxqfchmqhk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:14.060718967Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:20.884251659Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "uj4u1jfin0s1m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:14.935077347Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "local_task_job.py:212", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T07:40:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-bbfqt", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "uj4u1jfin0s1n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:15.058806651Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-13T07:40:00+00:00", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[cec168ed-573e-4979-add0-69b17415b3a2] succeeded in 22.761442101s: None", - "insertId": "uj4u1jfin0s1o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:15.378314879Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:17.520876424Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "zt5vdmff9ethe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:21.025339346Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "zt5vdmff9ethf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:21.025605419Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "zt5vdmff9ethg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:21.025883394Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Running on host airflow-worker-bbfqt", - "insertId": "hhkopgfop6yuh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:27.768608373Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": 
"zt5vdmff9ethh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:27.839258594Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "hhkopgfop6yui", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:27.897026164Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1091", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "hhkopgfop6yuj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:27.919731441Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-bbfqt", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "hhkopgfop6yuk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:27.920438960Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-bbfqt", - "try-number": "1", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "hhkopgfop6yul", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:27.920800262Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1290", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "hhkopgfop6yum", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:27.921132594Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1291", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "zt5vdmff9ethi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:28.007581242Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "hhkopgfop6yun", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:28.281296705Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "hhkopgfop6yuo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:28.281361478Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "hhkopgfop6yup", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:28.344540544Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "hhkopgfop6yuq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:28.344573833Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Executing 
on 2023-09-12 00:00:00+00:00", - "insertId": "hhkopgfop6yur", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:29.201614858Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-bbfqt", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Started process 224 to run task", - "insertId": "hhkopgfop6yus", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:29.211856070Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2017', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '933', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpy0xysj9i']", - "insertId": "hhkopgfop6yut", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:29.220760447Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "try-number": "1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Job 933: Subtask join_bq_datasets.bq_join_holidays_weather_data_2017", - "insertId": "hhkopgfop6yuu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:29.222531976Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Running on host airflow-worker-bbfqt", - "insertId": "hhkopgfop6yuv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:29.678427705Z", - "severity": 
"INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "task_command.py:393", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2017\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "hhkopgfop6yuw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:29.949558095Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1518", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-bbfqt", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "hhkopgfop6yux", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:29.991029528Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "process": "base.py:73" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2017 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "hhkopgfop6yuy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:29.993308291Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "process": "bigquery.py:2710", - "worker_id": "airflow-worker-bbfqt", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are 
provided.", - "insertId": "hhkopgfop6yuz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:29.994095664Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt", - "process": "credentials_provider.py:353" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2017_2023_09_12T00_00_00_00_00_77c2adebd04a98f7d8f866cd88c82a1d", - "insertId": "hhkopgfop6yv0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:30.042533033Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "process": "bigquery.py:1596", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2017, execution_date=20230912T000000, start_date=20230913T075527, end_date=20230913T075533", - "insertId": "hhkopgfop6yv1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:33.161291549Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:34.664135989Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "zt5vdmff9ethj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:33.331509672Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "zt5vdmff9ethk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:33.332249309Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Dags and plugins 
are synced", - "insertId": "zt5vdmff9ethl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:33.332438140Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "zt5vdmff9ethm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:33.334017166Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "zt5vdmff9ethn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:33.419467006Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "8m2vcafoshvie", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:35.392041043Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-bbfqt", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:41.768201724Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "8m2vcafoshvif", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:35.472559530Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "worker_id": "airflow-worker-bbfqt", - "process": "taskinstance.py:2599", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:41.768201724Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f4e8b731-a51e-4e26-a267-614a4481ac2b] succeeded in 43.070133529999964s: None", - "insertId": "8m2vcafoshvig", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:35.671454502Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "trace.py:131" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:41.768201724Z" - }, - { - "textPayload": "I0913 07:55:46.384408 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ohtpqvf6c3gwi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:46.384633085Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "I0913 07:55:46.386051 1 airflowworkerset_controller.go:101] \"controllers/AirflowWorkerSet: Workers scale down needed.\" current number of workers=3 desired=2 scaling down by=1", - "insertId": "1ohtpqvf6c3gwj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:46.386225938Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "I0913 07:55:46.426203 1 pod_tasks_checker.go:55] \"controllers/PodTaskChecker: Worker seems not be running any task. Doing double check.\" worker name=\"airflow-worker-j2x68\"", - "insertId": "1ohtpqvf6c3gwk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:46.426404238Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "I0913 07:55:47.016971 1 airflowworkerset_controller.go:195] \"controllers/AirflowWorkerSet: Workers deleted.\" number of workers=1", - "insertId": "1ohtpqvf6c3gwl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:47.017193666Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "I0913 07:55:47.017902 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ohtpqvf6c3gwm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:47.018049129Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "uj1nqnfi0hgo7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:47.048855076Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:47.888442912Z" - }, - { - "textPayload": "", - "insertId": "uj1nqnfi0hgo9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:47.049076817Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:47.888442912Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "uj1nqnfi0hgo8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:55:47.049285467Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:47.888442912Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "uj1nqnfi0hgoa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:47.049311127Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:47.888442912Z" - }, - { - "textPayload": "I0913 07:55:47.072575 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ohtpqvf6c3gwn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:47.072774837Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "r701dhfikh20u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:50.639850413Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-j2x68" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:55:56.963272724Z" - }, - { - "textPayload": "I0913 07:55:51.674706 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ohtpqvf6c3gwo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:51.674944213Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "I0913 07:55:51.806385 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ohtpqvf6c3gwp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:55:51.806543943Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:53.162899272Z" - }, - { - "textPayload": "I0913 07:55:52.396393 1 airflowworkerset_controller.go:61] 
\"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "2ib20lfos5bg4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:52.396657986Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:59.270621542Z" - }, - { - "textPayload": "I0913 07:55:52.418225 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "2ib20lfos5bg5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:52.418474626Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:59.270621542Z" - }, - { - "textPayload": "I0913 07:55:52.426912 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "2ib20lfos5bg6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:52.427160800Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:55:59.270621542Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "zt5vdmff9etho", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:55:59.516691708Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "zt5vdmff9ethp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:04.509644733Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "zt5vdmff9ethq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:04.530855844Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:05.887782504Z" - }, - { - "textPayload": " ", - "insertId": "lng589flnt7md", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.614790261Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-n79fs v5.2.7 (dawn-chorus)", - "insertId": "lng589flnt7me", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:19.614822461Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "lng589flnt7mf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:19.614827512Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 07:56:19", - "insertId": "lng589flnt7mg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.614832146Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "lng589flnt7mh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:19.614835820Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "lng589flnt7mi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:19.614861984Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7811f5e373d0", - "insertId": "lng589flnt7mj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.614868588Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "lng589flnt7mk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.614875147Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "lng589flnt7ml", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:56:19.614882617Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "lng589flnt7mm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.614889203Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "lng589flnt7mn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.614895236Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "lng589flnt7mo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:19.614901167Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "lng589flnt7mp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - 
} - }, - "timestamp": "2023-09-13T07:56:19.614908783Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "lng589flnt7mq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:56:19.614919432Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": " ", - "insertId": "lng589flnt7mr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.614946549Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "", - "insertId": "lng589flnt7ms", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:56:19.614952480Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "[tasks]", - "insertId": "lng589flnt7mt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:56:19.614958005Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "lng589flnt7mu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:19.614963940Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "", - "insertId": "lng589flnt7mv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:19.615071443Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:24.166566594Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "utcc45fhu4zqd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:25.227263364Z", - "severity": "INFO", - "labels": { - "process": "connection.py:22", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:30.225015667Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "utcc45fhu4zqe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:56:25.320615619Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:30.225015667Z" - }, - { - "textPayload": "sync with celery@airflow-worker-n79fs", - "insertId": "b3qirpflngv9c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:56:25.341079496Z", - "severity": "INFO", - "labels": { - "process": "control.py:310", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:31.180248999Z" - }, - { - "textPayload": "mingle: sync with 1 nodes", - "insertId": "utcc45fhu4zqf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:26.349033625Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:43", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:30.225015667Z" - }, - { - "textPayload": "mingle: sync complete", - "insertId": "utcc45fhu4zqg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:56:26.349525815Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:47", - 
"worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:30.225015667Z" - }, - { - "textPayload": "celery@airflow-worker-n79fs ready.", - "insertId": "utcc45fhu4zqh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:56:26.380813304Z", - "severity": "INFO", - "labels": { - "process": "worker.py:176", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:30.225015667Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "utcc45fhu4zqi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:56:28.355242172Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:56:30.225015667Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[ad1fda90-e1f6-46b3-9b08-42e208c662e2] received", - "insertId": "1bg3igbflg1mbk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:32.338866491Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:36.425975469Z" - }, - { - "textPayload": "[ad1fda90-e1f6-46b3-9b08-42e208c662e2] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_1997', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "1bg3igbflg1mbl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:32.345562743Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:36.425975469Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1bg3igbflg1mbm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:33.555375655Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:36.425975469Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1bg3igbflg1mbn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:33.559397437Z", - "severity": "WARNING", - "labels": { - "process": 
"utils.py:430", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:36.425975469Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1bg3igbflg1mbo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:33.945802613Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:36.425975469Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ahi4tfow5phw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:35.949282276Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:42.441818645Z" - }, - { - "textPayload": "Running on host airflow-worker-bbfqt", - "insertId": "11qs004four0u9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:45.405654620Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "11qs004four0ua", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:45.537951148Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-bbfqt", - "try-number": "2", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "11qs004four0ub", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:45.561043574Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "taskinstance.py:1091", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "11qs004four0uc", - "resource": { - 
"type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:45.561839403Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "taskinstance.py:1289", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag", - "try-number": "2", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "11qs004four0ud", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:45.562420311Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1290", - "try-number": "2", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "11qs004four0ue", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:45.562944383Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "try-number": "2", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "11qs004four0uf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:45.869768327Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "11qs004four0ug", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:45.869806381Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "11qs004four0uh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": 
"us-west1" - } - }, - "timestamp": "2023-09-13T07:57:45.941585494Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "11qs004four0ui", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:45.941650938Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "11qs004four0uj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:47.263217473Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Started process 273 to run task", - "insertId": "11qs004four0uk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:47.272932900Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt", - "try-number": "2", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_1997', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '934', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpb5ci1dby']", - "insertId": "11qs004four0ul", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:47.278015988Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Job 934: Subtask join_bq_datasets.bq_join_holidays_weather_data_1997", - "insertId": "11qs004four0um", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": 
"acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:47.278529538Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Running on host airflow-worker-bbfqt", - "insertId": "11qs004four0un", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:47.717865963Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag", - "process": "task_command.py:393", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-bbfqt", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_1997\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "11qs004four0uo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:48.017711207Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-bbfqt", - "try-number": "2", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "11qs004four0up", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:48.063205702Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "base.py:73", - "try-number": "2", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_1997 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 
'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "11qs004four0uq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:48.066356212Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "bigquery.py:2710", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "try-number": "2", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "11qs004four0ur", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:48.067215714Z", - "severity": "INFO", - "labels": { - "process": "credentials_provider.py:353", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_1997_2023_09_12T00_00_00_00_00_91458f1213d15e6d291c458153854b6e", - "insertId": "11qs004four0us", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:48.107590368Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "process": "bigquery.py:1596", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "worker_id": "airflow-worker-bbfqt", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:52.383363661Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_1997, execution_date=20230912T000000, start_date=20230913T075745, end_date=20230913T075751", - "insertId": "1pbw1vjfhyzv1t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:57:51.391285299Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "worker_id": "airflow-worker-bbfqt", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:58.490745839Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1pbw1vjfhyzv1u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:52.150914335Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "local_task_job.py:212", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:58.490745839Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1pbw1vjfhyzv1v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:57:52.228727332Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_1997", - "try-number": "2", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-bbfqt", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:58.490745839Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[ad1fda90-e1f6-46b3-9b08-42e208c662e2] succeeded in 20.09724721500004s: None", - "insertId": "1pbw1vjfhyzv1w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:57:52.443087434Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:57:58.490745839Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "wqro3mfi0e5cd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:58:30.081538988Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:58:36.739205677Z" - }, - { - "textPayload": "I0913 07:59:03.013312 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1b6jxnrfim170f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:03.013587372Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:08.239247958Z" - }, - { - "textPayload": "I0913 07:59:03.016301 1 airflowworkerset_controller.go:101] \"controllers/AirflowWorkerSet: Workers scale down needed.\" current number of workers=2 desired=1 scaling down by=1", - "insertId": "1b6jxnrfim170g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:03.016577658Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:08.239247958Z" - }, - { - "textPayload": "I0913 07:59:03.016361 1 pod_tasks_checker.go:55] \"controllers/PodTaskChecker: Worker seems not be running any task. 
Doing double check.\" worker name=\"airflow-worker-bbfqt\"", - "insertId": "1b6jxnrfim170h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:03.016643441Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:08.239247958Z" - }, - { - "textPayload": "I0913 07:59:03.078539 1 airflowworkerset_controller.go:195] \"controllers/AirflowWorkerSet: Workers deleted.\" number of workers=1", - "insertId": "1b6jxnrfim170i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:03.078784112Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:08.239247958Z" - }, - { - "textPayload": "I0913 07:59:03.079310 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1b6jxnrfim170j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:03.079478742Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:08.239247958Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "1amoeagfimv2qu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:03.111994386Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:05.883757050Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "1amoeagfimv2qv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:03.112053641Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:05.883757050Z" - }, - { - "textPayload": "I0913 07:59:03.134584 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1b6jxnrfim170k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:03.134834929Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:08.239247958Z" - }, - { - "textPayload": "", - "insertId": "1amoeagfimv2qw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:03.139828684Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T07:59:05.883757050Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "1amoeagfimv2qx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:03.139908935Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:05.883757050Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "hheh34flk9m9t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:09.853469149Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-bbfqt" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:16.010593995Z" - }, - { - "textPayload": "I0913 07:59:10.969589 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1o84dflfosjddt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:10.969807254Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:16.303264145Z" - }, - { - "textPayload": "I0913 07:59:11.012285 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1o84dflfosjddu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:11.012518825Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:16.303264145Z" - }, - { - "textPayload": "I0913 07:59:11.160855 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1o84dflfosjddv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:11.161152866Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:16.303264145Z" - }, - { - "textPayload": "I0913 07:59:11.234597 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1o84dflfosjddw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:11.234899355Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:16.303264145Z" - }, - { - "textPayload": "I0913 07:59:11.288389 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1o84dflfosjddx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:11.288688833Z", - 
"severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T07:59:16.303264145Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[94ca0a19-dc0a-4e3c-a568-aa23591640e2] received", - "insertId": "ibin8cflrxx0p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:34.569045207Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:35.825580301Z" - }, - { - "textPayload": "[94ca0a19-dc0a-4e3c-a568-aa23591640e2] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2003', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ibin8cflrxx0q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:34.620650766Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:35.825580301Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1iy4yxtflhhz9p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:34.949473221Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:40.832630786Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1iy4yxtflhhz9q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:34.953797780Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:40.832630786Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1iy4yxtflhhz9r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:35.122236274Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:40.832630786Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1iy4yxtflhhz9s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:36.043758958Z", - "severity": "INFO", - "labels": { - 
"process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:40.832630786Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "13ohq6af7yheen", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:40.391509167Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "13ohq6af7yheeo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:40.536339456Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "13ohq6af7yheep", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:40.554749594Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "13ohq6af7yheeq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:40.555238172Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "try-number": "2", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "13ohq6af7yheer", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:40.555983590Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1290", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2003", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "13ohq6af7yhees", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:40.556489562Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1291" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "13ohq6af7yheet", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:41.145298064Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "13ohq6af7yheeu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:41.145394293Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "13ohq6af7yheev", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:41.243845674Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "13ohq6af7yheew", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:41.243972595Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "13ohq6af7yheex", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } 
- }, - "timestamp": "2023-09-13T07:59:42.344127618Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1310", - "map-index": "-1", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Started process 218 to run task", - "insertId": "13ohq6af7yheey", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:42.436443016Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2003', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '935', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp2eo48isv']", - "insertId": "13ohq6af7yheez", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:42.470143611Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Job 935: Subtask join_bq_datasets.bq_join_holidays_weather_data_2003", - "insertId": "13ohq6af7yhef0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:42.472336863Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "13ohq6af7yhef1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:43.392522037Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "process": "task_command.py:393", - "worker_id": 
"airflow-worker-n79fs", - "try-number": "2", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2003\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "13ohq6af7yhef2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:43.897224166Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "13ohq6af7yhef3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:43.949907100Z", - "severity": "INFO", - "labels": { - "process": "base.py:73", - "try-number": "2", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2003 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "13ohq6af7yhef4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:43.953693295Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "bigquery.py:2710" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "13ohq6af7yhef5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - 
}, - "timestamp": "2023-09-13T07:59:43.955083392Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "process": "credentials_provider.py:353" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2003_2023_09_12T00_00_00_00_00_1bf1470e4bfcdec1a53126d7cfb70de7", - "insertId": "13ohq6af7yhef6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:44.014583163Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "workflow": "data_analytics_dag", - "process": "bigquery.py:1596" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:45.838380842Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2003, execution_date=20230912T000000, start_date=20230913T075940, end_date=20230913T075947", - "insertId": "148tgddfopoues", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:47.105045938Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:50.924352104Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "148tgddfopouet", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:48.038833515Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "map-index": "-1", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:50.924352104Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "148tgddfopoueu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:48.131422009Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2003", - "execution-date": "2023-09-12T00:00:00+00:00", - 
"process": "taskinstance.py:2599", - "try-number": "2", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:50.924352104Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[94ca0a19-dc0a-4e3c-a568-aa23591640e2] succeeded in 13.774755968013778s: None", - "insertId": "148tgddfopouev", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:48.347452474Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:50.924352104Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[c28c5190-02b2-4c5e-8ae9-5e7958e65e67] received", - "insertId": "1q63gwwfooizfk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:54.358149051Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:55.824954942Z" - }, - { - "textPayload": "[c28c5190-02b2-4c5e-8ae9-5e7958e65e67] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2002', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "1q63gwwfooizfl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:54.364057728Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:55.824954942Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1q63gwwfooizfm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:54.720518664Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:55.824954942Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1q63gwwfooizfn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:54.722429029Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T07:59:55.824954942Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1xnm6w0fhx9gve", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": 
"openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:54.837717911Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1xnm6w0fhx9gvf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:55.734987518Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1xnm6w0fhx9gvg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:59.256364554Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1xnm6w0fhx9gvh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:59.385394860Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1xnm6w0fhx9gvi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:59.405200732Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1xnm6w0fhx9gvj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:59.405867344Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "try-number": "2", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1289", 
- "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "1xnm6w0fhx9gvk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:59.406380140Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "taskinstance.py:1290", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "2", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1xnm6w0fhx9gvl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:59.406951437Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "map-index": "-1", - "try-number": "2", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1xnm6w0fhx9gvm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T07:59:59.722767064Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1xnm6w0fhx9gvn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T07:59:59.722814801Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1xnm6w0fhx9gvo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:59.745574043Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1xnm6w0fhx9gvp", - 
"resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T07:59:59.745615313Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:00.902138151Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "pt2h9vf6g7px1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:00.955460415Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Started process 226 to run task", - "insertId": "pt2h9vf6g7px2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:00.977239583Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2002', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '936', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpaefa0xlz']", - "insertId": "pt2h9vf6g7px3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:00.988937715Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Job 936: Subtask join_bq_datasets.bq_join_holidays_weather_data_2002", - "insertId": "pt2h9vf6g7px4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:00.989595987Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2", - "workflow": "data_analytics_dag" - 
}, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[219e0582-7762-4ce2-b709-bb808b301967] received", - "insertId": "pt2h9vf6g7px5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:01.311091351Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "[219e0582-7762-4ce2-b709-bb808b301967] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:50:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "pt2h9vf6g7px6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:01.359356496Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "pt2h9vf6g7px7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:01.558029157Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "task_command.py:393", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "pt2h9vf6g7px8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:01.933315421Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "pt2h9vf6g7px9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:01.936158532Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2002\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "pt2h9vf6g7pxa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:02.214088169Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "pt2h9vf6g7pxb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:02.278805035Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "process": "base.py:73", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2002 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "pt2h9vf6g7pxc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:02.292352288Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "process": "bigquery.py:2710", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "pt2h9vf6g7pxd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:02.292898645Z", - "severity": "INFO", - "labels": { - "process": "credentials_provider.py:353", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "workflow": "data_analytics_dag", - "try-number": "2", - 
"worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "pt2h9vf6g7pxe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:02.337251347Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2002_2023_09_12T00_00_00_00_00_d5c6501010cfc8c4a7c53adbe25c5180", - "insertId": "pt2h9vf6g7pxf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:02.349131367Z", - "severity": "INFO", - "labels": { - "process": "bigquery.py:1596", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "pt2h9vf6g7pxg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:03.503585550Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "pt2h9vf6g7pxh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:04.026295480Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "pt2h9vf6g7pxi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:04.160792799Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T07:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Dependencies all met for 
dep_context=requeueable deps ti=", - "insertId": "pt2h9vf6g7pxj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:04.179278092Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T07:50:00+00:00", - "process": "taskinstance.py:1091", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "pt2h9vf6g7pxk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:04.179527705Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1289", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T07:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "pt2h9vf6g7pxl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:04.179882745Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1290", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T07:50:00+00:00", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "pt2h9vf6g7pxm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:04.180245703Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:50:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "try-number": "1", - "process": "taskinstance.py:1291" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "pt2h9vf6g7pxn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:04.431292558Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "pt2h9vf6g7pxo", - "resource": { - "type": 
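Airflow renders that banner as "Starting attempt {try_number} of {retries + 1}", so "attempt 1 of 2" on echo implies retries=1, and the "Starting attempt 2 of 3" lines the analytics tasks log further down imply retries=2. A hedged sketch of where those counts would live; only the dag_id and retry counts come from the logs, the rest is illustrative:

    # Sketch: retry budget implied by the attempt banners.
    from datetime import timedelta
    import pendulum
    from airflow import DAG

    with DAG(
        dag_id="data_analytics_dag",
        start_date=pendulum.datetime(2023, 9, 12, tz="UTC"),  # assumption
        schedule_interval="@daily",                           # assumption
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        ...  # tasks go here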
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:04.431338737Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "pt2h9vf6g7pxp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:04.453384663Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "pt2h9vf6g7pxq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:04.453426490Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Executing on 2023-09-13 07:50:00+00:00", - "insertId": "pt2h9vf6g7pxr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:05.416720784Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:50:00+00:00", - "process": "taskinstance.py:1310", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Started process 246 to run task", - "insertId": "pt2h9vf6g7pxs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:05.429025453Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T07:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T07:50:00+00:00', '--job-id', '937', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpa88jk1jd']", - "insertId": "pt2h9vf6g7pxt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:05.514479021Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T07:50:00+00:00", - "workflow": 
"airflow_monitoring", - "try-number": "1", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Job 937: Subtask echo", - "insertId": "pt2h9vf6g7pxu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:05.515498611Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:50:00+00:00", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2002, execution_date=20230912T000000, start_date=20230913T075959, end_date=20230913T080005", - "insertId": "pt2h9vf6g7pxv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:05.803752299Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1328", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:06.842688860Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ak0wmqfihu3b4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:06.539130904Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T07:50:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T07:50:00+00:00", - "insertId": "ak0wmqfihu3b5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:07.109950259Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T07:50:00+00:00", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1518", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": 
"ak0wmqfihu3b6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:07.113679118Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:50:00+00:00", - "process": "subprocess.py:63", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "ak0wmqfihu3b7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:07.115958764Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T07:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:75", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "ak0wmqfihu3b8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:07.224833015Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "local_task_job.py:212", - "map-index": "-1", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Output:", - "insertId": "ak0wmqfihu3b9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:07.320459328Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T07:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:86", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "test", - "insertId": "ak0wmqfihu3ba", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:07.409752607Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:93", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T07:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "ak0wmqfihu3bb", - "resource": { - "type": "cloud_composer_environment", - "labels": 
{ - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:07.412544763Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T07:50:00+00:00", - "task-id": "echo", - "process": "subprocess.py:97", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T075000, start_date=20230913T080004, end_date=20230913T080007", - "insertId": "ak0wmqfihu3bc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:07.526111075Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-13T07:50:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "ak0wmqfihu3bd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:07.802928596Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2002", - "process": "taskinstance.py:2599", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[c28c5190-02b2-4c5e-8ae9-5e7958e65e67] succeeded in 13.771968237007968s: None", - "insertId": "ak0wmqfihu3be", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:08.134689256Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "ak0wmqfihu3bf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:08.647807981Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T07:50:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": 
"ak0wmqfihu3bg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:08.959541117Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T07:50:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:2599", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[219e0582-7762-4ce2-b709-bb808b301967] succeeded in 7.994347193016438s: None", - "insertId": "ak0wmqfihu3bh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:09.314662063Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:11.883215857Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[df67510c-7794-4fce-b6c2-8b01e0e4a83c] received", - "insertId": "gdj2omfoz2jhe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:29.970907761Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:36.012049995Z" - }, - { - "textPayload": "[df67510c-7794-4fce-b6c2-8b01e0e4a83c] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2004', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "gdj2omfoz2jhf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:29.975402761Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:36.012049995Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "gdj2omfoz2jhg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:30.319503812Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:36.012049995Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "gdj2omfoz2jhh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:30.321458814Z", - "severity": "WARNING", - "labels": { - "process": 
"utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:36.012049995Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "gdj2omfoz2jhi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:30.435597496Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:36.012049995Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "gdj2omfoz2jhj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:31.313966304Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:36.012049995Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "5a9npifikmakz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:37.915943826Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "5a9npifikmal0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:38.219685260Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "5a9npifikmal1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:38.325557458Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "5a9npifikmal2", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:38.325604596Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "try-number": "2", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "5a9npifikmal3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:38.325613709Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "workflow": "data_analytics_dag", - "try-number": "2", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "5a9npifikmal4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:38.325620294Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1291", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "5a9npifikmal5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:39.025538765Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "5a9npifikmal6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:39.025585800Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "5a9npifikmal7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } 
- }, - "timestamp": "2023-09-13T08:00:39.144226222Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "5a9npifikmal8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:39.144263657Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "5a9npifikmal9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:40.327677963Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1310", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Started process 268 to run task", - "insertId": "5a9npifikmala", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:40.412321702Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "2", - "process": "standard_task_runner.py:55", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2004', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '938', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpv3vzul8_']", - "insertId": "5a9npifikmalb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:40.423811310Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:82", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Job 938: Subtask join_bq_datasets.bq_join_holidays_weather_data_2004", - "insertId": "5a9npifikmalc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": 
"us-west1" - } - }, - "timestamp": "2023-09-13T08:00:40.424567187Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "workflow": "data_analytics_dag", - "try-number": "2", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "5a9npifikmald", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:41.148904089Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2", - "process": "task_command.py:393", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2004\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "5a9npifikmale", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:41.929713598Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1518", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "5a9npifikmalf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:42.036786179Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "base.py:73", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2004 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 
'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "5a9npifikmalg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:42.040174734Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1", - "process": "bigquery.py:2710", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "5a9npifikmalh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:42.041149917Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1", - "process": "credentials_provider.py:353", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2004_2023_09_12T00_00_00_00_00_a2e712524f08754f5267a0e4fc82c59c", - "insertId": "5a9npifikmali", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:42.151886513Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "bigquery.py:1596", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:44.087211449Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2004, execution_date=20230912T000000, start_date=20230913T080038, end_date=20230913T080046", - "insertId": "1q5tu8efoz06ct", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:46.315488905Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:51.158805451Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. 
To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1q5tu8efoz06cu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:46.318546155Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:51.158805451Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1q5tu8efoz06cv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:47.406001113Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "process": "local_task_job.py:212", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:51.158805451Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1q5tu8efoz06cw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:00:47.533147595Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:2599", - "try-number": "2", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2004", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:51.158805451Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[df67510c-7794-4fce-b6c2-8b01e0e4a83c] succeeded in 17.859130303986603s: None", - "insertId": "1q5tu8efoz06cx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:00:47.833196037Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:00:51.158805451Z" - }, - { - "textPayload": "I0913 08:00:48.068852 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "u9dqtsfoqg2ay", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:00:48.069175017Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T08:00:53.336844685Z" - }, - { - "textPayload": "Task 
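The MovedIn20Warning above recommends pinning "sqlalchemy<2.0" in the environment's requirements until the code paths are SQLAlchemy-2.0 clean. A tiny hedged guard along those lines; the assertion is illustrative, not part of the DAG:

    # Sketch: fail fast if the environment drifts past the sqlalchemy<2.0 pin.
    import sqlalchemy

    assert sqlalchemy.__version__.startswith("1."), (
        f"Expected SQLAlchemy 1.x per the sqlalchemy<2.0 pin, got {sqlalchemy.__version__}"
    )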
- 08:01:54.097 INFO strategy.py:161 Task airflow.executors.celery_executor.execute_command[079b3135-e1a3-4ce8-93c1-698e7d0f0fad] received
- 08:01:54.097 INFO celery_executor.py:90 [079b3135-e1a3-4ce8-93c1-698e7d0f0fad] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2007', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']
- 08:01:54.449 WARNING utils.py:430 No module named 'boto3'
- 08:01:54.451 WARNING utils.py:430 No module named 'botocore'
- 08:01:54.622 WARNING utils.py:430 No module named 'airflow.providers.sftp'
- 08:01:55.442 INFO dagbag.py:532 Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py
- 08:01:58.977 INFO task_command.py:393 Running on host airflow-worker-n79fs
- 08:01:59.094 INFO taskinstance.py:1091 [2007 try=2] Dependencies all met for dep_context=non-requeueable deps ti=
- 08:01:59.116 INFO taskinstance.py:1091 [2007 try=2] Dependencies all met for dep_context=requeueable deps ti=
- 08:01:59.117-.119 INFO taskinstance.py:1289-1291 [2007 try=2] attempt banner: Starting attempt 2 of 3
- 08:01:59.468 ERROR fatal: not a git repository (or any parent up to mount point /home/airflow) / Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
- 08:01:59.518 ERROR fatal: not a git repository (or any parent up to mount point /home/airflow) / Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
- 08:02:00.628 INFO taskinstance.py:1310 [2007 try=2] Executing on 2023-09-12 00:00:00+00:00
"data_analytics_dag", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Started process 306 to run task", - "insertId": "7i7bbffi2hkrp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:00.639307991Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "2", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2007', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '939', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmprg16mc42']", - "insertId": "7i7bbffi2hkrq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:00.644356032Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Job 939: Subtask join_bq_datasets.bq_join_holidays_weather_data_2007", - "insertId": "7i7bbffi2hkrr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:00.644861230Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:83", - "workflow": "data_analytics_dag", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "7i7bbffi2hkrs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:01.031827776Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2007\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "7i7bbffi2hkrt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:01.288716420Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "7i7bbffi2hkru", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:01.327104318Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "process": "base.py:73" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2007 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "7i7bbffi2hkrv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:01.329983234Z", - "severity": "INFO", - "labels": { - "process": "bigquery.py:2710", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "7i7bbffi2hkrw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:01.330929535Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": 
"data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "worker_id": "airflow-worker-n79fs", - "process": "credentials_provider.py:353", - "try-number": "2", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2007_2023_09_12T00_00_00_00_00_faa0a3379eda7250987abdf1ab06fead", - "insertId": "7i7bbffi2hkrx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:01.370678586Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "bigquery.py:1596", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:04.839108861Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2007, execution_date=20230912T000000, start_date=20230913T080159, end_date=20230913T080204", - "insertId": "1trej3bflleb5t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:04.083450568Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:09.834834247Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1trej3bflleb5u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:04.923580684Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "process": "local_task_job.py:212", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:09.834834247Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1trej3bflleb5v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:05.033951956Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2007", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T08:02:09.834834247Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[079b3135-e1a3-4ce8-93c1-698e7d0f0fad] succeeded in 11.124393017991679s: None", - "insertId": "1trej3bflleb5w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:05.216572214Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:09.834834247Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[fcee0ec2-7628-4928-8ce3-0195256225a8] received", - "insertId": "13ohq6af7yqy9h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:15.265848750Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:20.833295989Z" - }, - { - "textPayload": "[fcee0ec2-7628-4928-8ce3-0195256225a8] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2009', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "13ohq6af7yqy9i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:15.273064205Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:20.833295989Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "13ohq6af7yqy9j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:15.611892767Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:20.833295989Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "13ohq6af7yqy9k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:15.614280965Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:20.833295989Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "13ohq6af7yqy9l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:15.725069077Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - 
"worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:20.833295989Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "13ohq6af7yqy9m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:16.626775670Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:20.833295989Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "chq5mef6f5g8j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:20.321427795Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "chq5mef6f5g8k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:20.479688022Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "map-index": "-1", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "chq5mef6f5g8l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:20.498364856Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "chq5mef6f5g8m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:20.498684728Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "process": "taskinstance.py:1289", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", 
- "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "chq5mef6f5g8n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:20.499129478Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "process": "taskinstance.py:1290", - "try-number": "2", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "chq5mef6f5g8o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:20.499535876Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "process": "taskinstance.py:1291", - "map-index": "-1", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "chq5mef6f5g8p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:20.805843032Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "chq5mef6f5g8q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:20.805910630Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "chq5mef6f5g8r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:20.840253946Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "chq5mef6f5g8s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - 
"timestamp": "2023-09-13T08:02:20.840309939Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "chq5mef6f5g8t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:21.835971793Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Started process 320 to run task", - "insertId": "chq5mef6f5g8u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:21.846552307Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2009', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '940', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpjzrm223z']", - "insertId": "chq5mef6f5g8v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:21.852349753Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:82", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Job 940: Subtask join_bq_datasets.bq_join_holidays_weather_data_2009", - "insertId": "chq5mef6f5g8w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:21.852935712Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "try-number": "2", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Running on host 
airflow-worker-n79fs", - "insertId": "chq5mef6f5g8x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:22.250333180Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2009\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "chq5mef6f5g8y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:22.546713845Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1518", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "chq5mef6f5g8z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:22.591845249Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag", - "try-number": "2", - "process": "base.py:73", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2009 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "chq5mef6f5g90", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:22.595816804Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag", - "process": 
"bigquery.py:2710", - "try-number": "2", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "chq5mef6f5g91", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:22.596609597Z", - "severity": "INFO", - "labels": { - "process": "credentials_provider.py:353", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2009_2023_09_12T00_00_00_00_00_6cbf7f72400f73911ccc179c88ed9590", - "insertId": "chq5mef6f5g92", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:22.649085600Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "bigquery.py:1596" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:25.834655368Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2009, execution_date=20230912T000000, start_date=20230913T080220, end_date=20230913T080225", - "insertId": "134jscifll9piu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:25.462182771Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "map-index": "-1", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1328" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:30.896087773Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "134jscifll9piv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:26.132516176Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "process": "local_task_job.py:212", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:30.896087773Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "134jscifll9piw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:26.205190681Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2009", - "process": "taskinstance.py:2599", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:30.896087773Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[fcee0ec2-7628-4928-8ce3-0195256225a8] succeeded in 11.0978053540166s: None", - "insertId": "134jscifll9pix", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:26.368317928Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:30.896087773Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[50b60458-16ea-4c30-b874-dae19663c875] received", - "insertId": "6drpfnfbdueu5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:49.315898275Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", 
- "receiveTimestamp": "2023-09-13T08:02:54.833068604Z" - }, - { - "textPayload": "[50b60458-16ea-4c30-b874-dae19663c875] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2012', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "6drpfnfbdueu6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:49.321193963Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:54.833068604Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "6drpfnfbdueu7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:49.723662023Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:54.833068604Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "6drpfnfbdueu8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:49.725852452Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:54.833068604Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "6drpfnfbdueu9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:49.850726635Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:54.833068604Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "6drpfnfbdueua", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:50.738699524Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:54.833068604Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "lxguaxf4hynol", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:54.299872128Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "lxguaxf4hynom", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:54.423193709Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "try-number": "2", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "lxguaxf4hynon", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:54.449182237Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "try-number": "2", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "lxguaxf4hynoo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:54.449661848Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "lxguaxf4hynop", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:54.450187884Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1290", - "try-number": "2", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "lxguaxf4hynoq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:54.450737920Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "try-number": "2", - 
"execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "lxguaxf4hynor", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:54.828161962Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "lxguaxf4hynos", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:54.828240033Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "lxguaxf4hynot", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:54.873068066Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "lxguaxf4hynou", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:54.873112724Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "lxguaxf4hynov", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:55.850426803Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Started process 334 to run task", - "insertId": "lxguaxf4hynow", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - 
"timestamp": "2023-09-13T08:02:55.860807877Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:55", - "try-number": "2", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2012', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '941', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmphynl2cju']", - "insertId": "lxguaxf4hynox", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:55.866049104Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Job 941: Subtask join_bq_datasets.bq_join_holidays_weather_data_2012", - "insertId": "lxguaxf4hynoy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:55.866589217Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "map-index": "-1", - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "lxguaxf4hynoz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:02:56.301365133Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2012\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "lxguaxf4hynp0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T08:02:56.621353495Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "lxguaxf4hynp1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:02:56.662670409Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "base.py:73", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2012 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "lxguaxf4hynp2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:56.665939712Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "bigquery.py:2710", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "lxguaxf4hynp3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:56.666717157Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "2", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "credentials_provider.py:353", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2012_2023_09_12T00_00_00_00_00_40d5448dabf378605edf4ce84d22b5db", - "insertId": "lxguaxf4hynp4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:56.714401018Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "try-number": "2", - "map-index": "-1", - "process": "bigquery.py:1596", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:02:59.831462837Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2012, execution_date=20230912T000000, start_date=20230913T080254, end_date=20230913T080259", - "insertId": "13eqkc5fls5pf2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:02:59.699024517Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1328", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:03:04.932375937Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "13eqkc5fls5pf3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:03:00.540470795Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "local_task_job.py:212", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:03:04.932375937Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "13eqkc5fls5pf4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:03:00.622147101Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2012", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "2", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:03:04.932375937Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[50b60458-16ea-4c30-b874-dae19663c875] succeeded in 11.519610292016296s: None", - "insertId": "13eqkc5fls5pf5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:03:00.838907968Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:03:04.932375937Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[6d4af529-0069-4ff3-815a-ad7b107de69e] received", - "insertId": "chq5mef6fctww", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:04:58.083283951Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:00.664857485Z" - }, - { - "textPayload": "[6d4af529-0069-4ff3-815a-ad7b107de69e] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2015', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "chq5mef6fctwx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:04:58.088818474Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:00.664857485Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "chq5mef6fctwy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:04:58.463350811Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:00.664857485Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "chq5mef6fctwz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:04:58.465198041Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:00.664857485Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "chq5mef6fctx0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:04:58.621703925Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:00.664857485Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "chq5mef6fctx1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:04:59.526611672Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": 
"airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:00.664857485Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "14758qfinqyxr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:03.969531861Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "14758qfinqyxs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:04.138917215Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "14758qfinqyxt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:04.169390479Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14758qfinqyxu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:04.169434149Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1289", - "try-number": "2", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "14758qfinqyxv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:04.169442499Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1290", - "map-index": "-1", - "try-number": "2", - "workflow": 
"data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14758qfinqyxw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:04.169448362Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "14758qfinqyxx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:04.532273169Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "14758qfinqyxy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:04.532316950Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "14758qfinqyxz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:04.570372163Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "14758qfinqyy0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:04.570415042Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "14758qfinqyy1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:05.826421530Z", - "severity": "INFO", - 
"labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Started process 391 to run task", - "insertId": "14758qfinqyy2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:05.862883777Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "workflow": "data_analytics_dag", - "try-number": "2", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2015', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '942', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpc923f78c']", - "insertId": "14758qfinqyy3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:05.864031480Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Job 942: Subtask join_bq_datasets.bq_join_holidays_weather_data_2015", - "insertId": "14758qfinqyy4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:05.865899830Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "2", - "process": "standard_task_runner.py:83", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "14758qfinqyy5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:06.243265609Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "process": "task_command.py:393" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2015\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "14758qfinqyy6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:06.538575363Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "14758qfinqyy7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:06.580269841Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "workflow": "data_analytics_dag", - "process": "base.py:73", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2015 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "14758qfinqyy8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:06.582969681Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:2710", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "14758qfinqyy9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:06.583879707Z", - "severity": "INFO", - "labels": { - "process": 
"credentials_provider.py:353", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2015_2023_09_12T00_00_00_00_00_04887c8ce29cbce76ad3d33c3278cc87", - "insertId": "14758qfinqyya", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:06.631223286Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "bigquery.py:1596", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:09.805140484Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2015, execution_date=20230912T000000, start_date=20230913T080504, end_date=20230913T080509", - "insertId": "12kgy4ffi4jr5t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:09.355276139Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "2", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1328" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:14.981748641Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "12kgy4ffi4jr5u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:10.317845349Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:14.981748641Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "12kgy4ffi4jr5v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:10.415114480Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:2599", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2015" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:14.981748641Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[6d4af529-0069-4ff3-815a-ad7b107de69e] succeeded in 12.573819672019454s: None", - "insertId": "12kgy4ffi4jr5w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:10.660581240Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:14.981748641Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[a5892c08-ffb3-41f3-b1db-de096109a821] received", - "insertId": "f9mostfoqdfj2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:33.153645046Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:34.836204490Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[2d70a3e5-4d9c-4044-8bf5-6ec81d9ddca7] received", - "insertId": "f9mostfoqdfj3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:33.159332015Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:34.836204490Z" - }, - { - "textPayload": "[a5892c08-ffb3-41f3-b1db-de096109a821] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2013', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "f9mostfoqdfj4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:33.165345040Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:34.836204490Z" - }, - { - "textPayload": "[2d70a3e5-4d9c-4044-8bf5-6ec81d9ddca7] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "f9mostfoqdfj5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:33.165402163Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:34.836204490Z" - }, - { - 
"textPayload": "No module named 'boto3'", - "insertId": "d1uasgflhruqa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:33.927094802Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "d1uasgflhruqb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:33.929510144Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "d1uasgflhruqc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:33.935076216Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "d1uasgflhruqd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:33.937383894Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "d1uasgflhruqe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:34.233306535Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "d1uasgflhruqf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:34.315735753Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "d1uasgflhruqg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T08:05:36.630717233Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "d1uasgflhruqh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:36.637102092Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:39.833489916Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "fjs92bfovjwde", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:50.209289966Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "fjs92bfovjwdf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:50.643269087Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "process": "taskinstance.py:1091", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "fjs92bfovjwdg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:50.735079062Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "fjs92bfovjwdh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:50.736375341Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1289", - "map-index": "-1", - 
"execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "fjs92bfovjwdi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:50.737301451Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "fjs92bfovjwdj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:50.737906824Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "fjs92bfovjwdk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:51.059890356Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "fjs92bfovjwdl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:51.440381084Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "fjs92bfovjwdm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:51.528411094Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2013", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "fjs92bfovjwdn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:51.528969175Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "fjs92bfovjwdo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:51.529545063Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "try-number": "2", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "fjs92bfovjwdp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:51.530244933Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1291", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "fjs92bfovjwdq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:51.723442314Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "fjs92bfovjwdr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:51.723505458Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, 
- "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "fjs92bfovjwds", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:51.820430838Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "fjs92bfovjwdt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:51.820466905Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "fjs92bfovjwdu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:53.030718065Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "taskinstance.py:1310", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Started process 427 to run task", - "insertId": "fjs92bfovjwdv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:53.117143372Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:55", - "try-number": "2", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '943', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpt10h42d6']", - "insertId": "fjs92bfovjwdw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:53.226108191Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:82", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - 
"try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Job 943: Subtask join_bq_datasets.bq_join_holidays_weather_data_2020", - "insertId": "fjs92bfovjwdx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:53.229207234Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "standard_task_runner.py:83", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "fjs92bfovjwdy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:54.110297314Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "fjs92bfovjwdz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:54.110347891Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "fjs92bfovjwe0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:54.224956938Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "fjs92bfovjwe1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:54.225005305Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "fjs92bfovjwe2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:54.551575087Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": 
"2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:05:55.848430462Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "o5m8rdfioag0l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:55.326009077Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "process": "taskinstance.py:1310", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2020\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "o5m8rdfioag0m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:55.404174218Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "2", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1518", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Started process 433 to run task", - "insertId": "o5m8rdfioag0n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:55.425695932Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2013', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '944', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpj0pa4ugd']", - "insertId": "o5m8rdfioag0o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:55.502774418Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "try-number": 
"2", - "process": "standard_task_runner.py:82", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Job 944: Subtask join_bq_datasets.bq_join_holidays_weather_data_2013", - "insertId": "o5m8rdfioag0p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:55.503626941Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:83", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "o5m8rdfioag0q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:55.520692766Z", - "severity": "INFO", - "labels": { - "process": "base.py:73", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "o5m8rdfioag0r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:55.526960218Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:2710", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "o5m8rdfioag0s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:55.530524947Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "credentials_provider.py:353" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2020_2023_09_12T00_00_00_00_00_a7c9f390d0c365a4117d7517d3b72dac", - "insertId": "o5m8rdfioag0t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:55.715566956Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "bigquery.py:1596", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "o5m8rdfioag0u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:56.402776822Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "try-number": "2", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2013\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "o5m8rdfioag0v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:57.237805660Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "o5m8rdfioag0w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:57.337966417Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "base.py:73", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2013 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "o5m8rdfioag0x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:57.338042811Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "2", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:2710", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "o5m8rdfioag0y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:05:57.342873734Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "worker_id": "airflow-worker-n79fs", - "process": "credentials_provider.py:353", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2013_2023_09_12T00_00_00_00_00_04089fe3d17ffd88ff583b31bd212234", - "insertId": "o5m8rdfioag0z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:57.381357218Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "process": "bigquery.py:1596", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2020, execution_date=20230912T000000, start_date=20230913T080550, end_date=20230913T080559", - "insertId": "o5m8rdfioag10", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:05:59.203056292Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "o5m8rdfioag11", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:05:59.721421904Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:00.850625149Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "g3dk7hfimzclt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:06:00.245075770Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:05.953562291Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2013, execution_date=20230912T000000, start_date=20230913T080551, end_date=20230913T080600", - "insertId": "g3dk7hfimzclu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:06:00.428606600Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "process": "taskinstance.py:1328", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:05.953562291Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "g3dk7hfimzclv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:06:00.532946238Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:2599", - "try-number": "2", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:05.953562291Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[2d70a3e5-4d9c-4044-8bf5-6ec81d9ddca7] succeeded in 27.768251899979077s: None", - "insertId": "g3dk7hfimzclw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:06:00.928034088Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:05.953562291Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "g3dk7hfimzclx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:06:01.743758560Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "map-index": "-1", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:05.953562291Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "g3dk7hfimzcly", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:06:02.011164946Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2013", - "try-number": "2", - "process": "taskinstance.py:2599", - "map-index": "-1", - "workflow": 
"data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:05.953562291Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[a5892c08-ffb3-41f3-b1db-de096109a821] succeeded in 29.166083134012297s: None", - "insertId": "g3dk7hfimzclz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:06:02.325020377Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:06:05.953562291Z" - }, - { - "textPayload": "I0913 08:09:35.647753 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "p92a9kfhy37ri", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:09:35.648053735Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T08:09:41.342196725Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[d53c9a80-f5fc-43a6-a715-ea06e5004d21] received", - "insertId": "6drpfnfbecqnc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:01.456162173Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:02.489243203Z" - }, - { - "textPayload": "[d53c9a80-f5fc-43a6-a715-ea06e5004d21] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "6drpfnfbecqnd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:01.468100966Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:02.489243203Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "14ii4dzf6e9bnt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:01.844814947Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "14ii4dzf6e9bnu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T08:10:01.847156432Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "14ii4dzf6e9bnv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:02.022092753Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "14ii4dzf6e9bnw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:02.942729246Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "14ii4dzf6e9bnx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:03.497834608Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "14ii4dzf6e9bny", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:03.752856794Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:00:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "14ii4dzf6e9bnz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:03.770974257Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-13T08:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1091", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14ii4dzf6e9bo0", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:03.771544511Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "map-index": "-1", - "execution-date": "2023-09-13T08:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "14ii4dzf6e9bo1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:03.772051028Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T08:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14ii4dzf6e9bo2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:03.772515207Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "execution-date": "2023-09-13T08:00:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "14ii4dzf6e9bo3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:04.064580750Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "14ii4dzf6e9bo4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:04.064628275Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "14ii4dzf6e9bo5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:04.119108983Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "14ii4dzf6e9bo6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:04.119157032Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Executing on 2023-09-13 08:00:00+00:00", - "insertId": "14ii4dzf6e9bo7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:04.889692439Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-13T08:00:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Started process 525 to run task", - "insertId": "14ii4dzf6e9bo8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:04.922317914Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T08:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:00:00+00:00', '--job-id', '945', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp_2t_quqj']", - "insertId": "14ii4dzf6e9bo9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:04.934028315Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:82", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Job 945: Subtask echo", - "insertId": "14ii4dzf6e9boa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:04.935957497Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-13T08:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "14ii4dzf6e9bob", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:05.389381069Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-13T08:00:00+00:00", - "task-id": "echo", - "map-index": "-1", - "process": "task_command.py:393", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T08:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T08:00:00+00:00", - "insertId": "14ii4dzf6e9boc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:05.671448930Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1518", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "14ii4dzf6e9bod", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:05.691516244Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "subprocess.py:63", - "execution-date": "2023-09-13T08:00:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "14ii4dzf6e9boe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:05.699031408Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "subprocess.py:75", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Output:", - "insertId": "14ii4dzf6e9bof", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:05.884745829Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "subprocess.py:86", - "execution-date": 
"2023-09-13T08:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "test", - "insertId": "14ii4dzf6e9bog", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:05.894618710Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:00:00+00:00", - "map-index": "-1", - "process": "subprocess.py:93", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "14ii4dzf6e9boh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:10:05.895534628Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:97", - "map-index": "-1", - "execution-date": "2023-09-13T08:00:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T080000, start_date=20230913T081003, end_date=20230913T081005", - "insertId": "14ii4dzf6e9boi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:05.946489922Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-13T08:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "14ii4dzf6e9boj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:10:06.700205987Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "execution-date": "2023-09-13T08:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "14ii4dzf6e9bok", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:06.788290980Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - 
"workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:00:00+00:00", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:07.607011934Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[d53c9a80-f5fc-43a6-a715-ea06e5004d21] succeeded in 5.482254337024642s: None", - "insertId": "1htx5o8fipnoip", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:06.943453275Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:10:12.758165748Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1lpnz7efiodddc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:10:56.332799219Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:11:00.195388416Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1pvitbafhipqjv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:15:56.335985210Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:16:01.704962824Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b61e11b9-fb84-4366-aa77-999671fd3e4b] received", - "insertId": "1oa419flqoclo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:01.348297640Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:02.827218813Z" - }, - { - "textPayload": "[b61e11b9-fb84-4366-aa77-999671fd3e4b] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:10:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1oa419flqoclp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:01.353567700Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:02.827218813Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1oa419flqoclq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:01.721887850Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:02.827218813Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1oa419flqoclr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:01.724392482Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:02.827218813Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "5tu2igf6da3aa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:01.850020794Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "5tu2igf6da3ab", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:02.832681058Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "5tu2igf6da3ac", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:03.449232955Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "5tu2igf6da3ad", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:03.750595884Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T08:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "5tu2igf6da3ae", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:03.771340952Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:10:00+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "5tu2igf6da3af", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:03.771638250Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:10:00+00:00", - "process": "taskinstance.py:1289", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "5tu2igf6da3ag", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T08:20:03.772198527Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1290", - "task-id": "echo", - "execution-date": "2023-09-13T08:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "5tu2igf6da3ah", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:03.772622648Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T08:10:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1291", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "5tu2igf6da3ai", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:04.089732064Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "5tu2igf6da3aj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:04.089806170Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "5tu2igf6da3ak", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:04.111671910Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "5tu2igf6da3al", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:04.111737110Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Executing on 2023-09-13 08:10:00+00:00", - "insertId": "5tu2igf6da3am", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:04.846181383Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:10:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Started process 760 to run task", - "insertId": "5tu2igf6da3an", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:04.884615274Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "execution-date": "2023-09-13T08:10:00+00:00", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:10:00+00:00', '--job-id', '946', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp_r9gkvn7']", - "insertId": "5tu2igf6da3ao", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:04.885071074Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "execution-date": "2023-09-13T08:10:00+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Job 946: Subtask echo", - "insertId": "5tu2igf6da3ap", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:04.886617781Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T08:10:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "5tu2igf6da3aq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:05.240016913Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T08:10:00+00:00", - "workflow": "airflow_monitoring", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T08:10:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T08:10:00+00:00", - "insertId": "5tu2igf6da3ar", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:05.419413127Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-13T08:10:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "5tu2igf6da3as", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:05.421985704Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:63", - "execution-date": "2023-09-13T08:10:00+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "5tu2igf6da3at", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:05.424425758Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-13T08:10:00+00:00", - "workflow": "airflow_monitoring", - "process": "subprocess.py:75", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Output:", - "insertId": "5tu2igf6da3au", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:05.614513869Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T08:10:00+00:00", - "try-number": "1", - "process": "subprocess.py:86", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "test", - "insertId": "5tu2igf6da3av", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:05.624703216Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:10:00+00:00", - "process": "subprocess.py:93", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "5tu2igf6da3aw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:05.625371843Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "subprocess.py:97", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T08:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T081000, start_date=20230913T082003, end_date=20230913T082005", - "insertId": "5tu2igf6da3ax", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:05.670409970Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-13T08:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "5tu2igf6da3ay", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:20:06.513646953Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "local_task_job.py:212", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T08:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "5tu2igf6da3az", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:20:06.590411982Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T08:10:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b61e11b9-fb84-4366-aa77-999671fd3e4b] succeeded in 5.449133146001259s: None", - "insertId": "5tu2igf6da3b0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:06.800896222Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T08:20:07.943265376Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1hu197lfi40o3m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:20:58.611754159Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:21:00.355322605Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "nb8c9ufos3als", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:26:00.137821609Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:26:03.998461255Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7cbad3d8-617c-40da-9c12-de800d939536] received", - "insertId": "2ifbgsfimrhqm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:31.041251695Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:34.072258598Z" - }, - { - "textPayload": "[7cbad3d8-617c-40da-9c12-de800d939536] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "2ifbgsfimrhqn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:31.047047274Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:34.072258598Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "2ifbgsfimrhqo", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:31.413442609Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:34.072258598Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "2ifbgsfimrhqp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:31.416259400Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:34.072258598Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "2ifbgsfimrhqq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:31.535034980Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:34.072258598Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "2ifbgsfimrhqr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:28:32.451137624Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:34.072258598Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "45zp0lfioqplt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:37.509637812Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "45zp0lfioqplu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:37.918268694Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "task-id": "run_bq_external_ingestion", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "45zp0lfioqplv", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:28:37.945520802Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1091", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "45zp0lfioqplw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:38.013476373Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "45zp0lfioqplx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:28:38.015420821Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "45zp0lfioqply", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:38.020247442Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1291", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "45zp0lfioqplz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:38.717841204Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "45zp0lfioqpm0", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:38.717897276Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "45zp0lfioqpm1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:28:38.824607230Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "45zp0lfioqpm2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:38.824649574Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:40.832885384Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "ak0wmqfik6apl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:39.960764335Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Started process 972 to run task", - "insertId": "ak0wmqfik6apm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:40.015797047Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '947', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmprj_c6mdn']", - "insertId": "ak0wmqfik6apn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:40.027812552Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - 
"execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Job 947: Subtask run_bq_external_ingestion", - "insertId": "ak0wmqfik6apo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:28:40.028965180Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "ak0wmqfik6app", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:40.786419387Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "process": "task_command.py:393", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=run_bq_external_ingestion\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "ak0wmqfik6apq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:41.529824809Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "ak0wmqfik6apr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:28:41.614253614Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "base.py:73", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Using existing BigQuery table for storing data...", - "insertId": 
"ak0wmqfik6aps", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:41.619363941Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "gcs_to_bigquery.py:375", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "ak0wmqfik6apt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:41.624493739Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "credentials_provider.py:353", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Project is not included in destination_project_dataset_table: holiday_weather.holidays; using project \"acceldata-acm\"", - "insertId": "ak0wmqfik6apu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:28:41.709437214Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "bigquery.py:2314", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Executing: {'load': {'autodetect': True, 'createDisposition': 'CREATE_IF_NEEDED', 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays'}, 'sourceFormat': 'CSV', 'sourceUris': ['gs://openlineagedemo/holidays.csv'], 'writeDisposition': 'WRITE_TRUNCATE', 'ignoreUnknownValues': False, 'schema': {'fields': [{'name': 'Date', 'type': 'DATE'}, {'name': 'Holiday', 'type': 'STRING'}]}, 'skipLeadingRows': 1, 'fieldDelimiter': ',', 'quote': None, 'allowQuotedNewlines': False, 'encoding': 'UTF-8'}}", - "insertId": "ak0wmqfik6apv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:28:41.710972204Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion", - "process": "gcs_to_bigquery.py:379", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:28:45.842511669Z" - }, - { - "textPayload": "Inserting job 
08:28:44  INFO     Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=run_bq_external_ingestion, execution_date=20230912T000000, start_date=20230913T082837, end_date=20230913T082844
08:28:45  INFO     Task exited with return code 0
08:28:46  ERROR    [7cbad3d8-617c-40da-9c12-de800d939536] Failed to execute task No row was found when one was required.
08:28:46  ERROR    Traceback (most recent call last):   (paths relative to /opt/python3.8/lib/python3.8/site-packages)
                     File "airflow/executors/celery_executor.py", line 130, in _execute_in_fork
                       args.func(args)
                     File "airflow/cli/cli_parser.py", line 52, in command
                       return func(*args, **kwargs)
                     File "airflow/utils/cli.py", line 108, in wrapper
                       return f(*args, **kwargs)
                     File "airflow/cli/commands/task_command.py", line 400, in task_run
                       _run_task_by_selected_method(args, dag, ti)
                     File "airflow/cli/commands/task_command.py", line 194, in _run_task_by_selected_method
                       _run_task_by_local_task_job(args, ti)
                     File "airflow/cli/commands/task_command.py", line 253, in _run_task_by_local_task_job
                       run_job.run()
                     File "airflow/jobs/base_job.py", line 258, in run
                       self._execute()
                     File "airflow/jobs/local_task_job.py", line 185, in _execute
                       self.handle_task_exit(return_code)
                     File "airflow/jobs/local_task_job.py", line 217, in handle_task_exit
                       self.task_instance.schedule_downstream_tasks()
                     File "airflow/utils/session.py", line 75, in wrapper
                       return func(*args, session=session, **kwargs)
                     File "airflow/models/taskinstance.py", line 2554, in schedule_downstream_tasks
                       dag_run = with_row_locks(
                     File "sqlalchemy/orm/query.py", line 2870, in one
                       return self._iter().one()
                     File "sqlalchemy/engine/result.py", line 1522, in one
                       return self._only_one_row(
                     File "sqlalchemy/engine/result.py", line 562, in _only_one_row
                       raise exc.NoResultFound(
                   sqlalchemy.exc.NoResultFound: No row was found when one was required
08:28:46  ERROR    Task airflow.executors.celery_executor.execute_command[7cbad3d8-617c-40da-9c12-de800d939536] raised unexpected: AirflowException('Celery command failed on host: airflow-worker-n79fs with celery_task_id 7cbad3d8-617c-40da-9c12-de800d939536')
08:28:46  ERROR    Traceback (most recent call last):
                     File "celery/app/trace.py", line 451, in trace_task
                       R = retval = fun(*args, **kwargs)
                     File "celery/app/trace.py", line 734, in __protected_call__
                       return self.run(*args, **kwargs)
                     File "airflow/executors/celery_executor.py", line 96, in execute_command
                       _execute_in_fork(command_to_exec, celery_task_id)
                     File "airflow/executors/celery_executor.py", line 111, in _execute_in_fork
                       raise AirflowException(msg)
                   airflow.exceptions.AirflowException: Celery command failed on host: airflow-worker-n79fs with celery_task_id 7cbad3d8-617c-40da-9c12-de800d939536
08:29:10  INFO     Task airflow.executors.celery_executor.execute_command[eb6dd88f-a3e5-4423-b4f6-77bf1edf33dc] received
08:29:10  INFO     [eb6dd88f-a3e5-4423-b4f6-77bf1edf33dc] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']
08:29:10  WARNING  No module named 'boto3'
08:29:10  WARNING  No module named 'botocore'
08:29:11  WARNING  No module named 'airflow.providers.sftp'
08:29:12  INFO     Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py
- "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:12.043258046Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:13.884199180Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1ie3n4efhvd48l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:15.611858684Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1ie3n4efhvd48m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:15.772416408Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1ie3n4efhvd48n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:15.790361708Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1ie3n4efhvd48o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:15.790817549Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1ie3n4efhvd48p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:15.791391072Z", - "severity": "INFO", - "labels": { - "worker_id": 
"airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1290", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1ie3n4efhvd48q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:15.791969823Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1291", - "map-index": "-1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1ie3n4efhvd48r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:16.134785414Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1ie3n4efhvd48s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:16.134829255Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1ie3n4efhvd48t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:16.214275082Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1ie3n4efhvd48u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:16.214316490Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1ie3n4efhvd48v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": 
"us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:17.045920012Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Started process 988 to run task", - "insertId": "1ie3n4efhvd48w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:17.060153942Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:55", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '948', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpfqmm22dy']", - "insertId": "1ie3n4efhvd48x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:17.064966214Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:82", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Job 948: Subtask run_bq_external_ingestion", - "insertId": "1ie3n4efhvd48y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:17.065574532Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1ie3n4efhvd48z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:17.475259591Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T08:29:18.830229736Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=run_bq_external_ingestion\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "12knff2fi1xm21", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:17.825150876Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "12knff2fi1xm22", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:17.868442Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "base.py:73" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Using existing BigQuery table for storing data...", - "insertId": "12knff2fi1xm23", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:17.871031822Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "map-index": "-1", - "process": "gcs_to_bigquery.py:375" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "12knff2fi1xm24", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:17.871755576Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "process": "credentials_provider.py:353", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Project is not included in destination_project_dataset_table: holiday_weather.holidays; using project \"acceldata-acm\"", - "insertId": "12knff2fi1xm25", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": 
"acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:17.923144882Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "bigquery.py:2314", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Executing: {'load': {'autodetect': True, 'createDisposition': 'CREATE_IF_NEEDED', 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays'}, 'sourceFormat': 'CSV', 'sourceUris': ['gs://openlineagedemo/holidays.csv'], 'writeDisposition': 'WRITE_TRUNCATE', 'ignoreUnknownValues': False, 'schema': {'fields': [{'name': 'Date', 'type': 'DATE'}, {'name': 'Holiday', 'type': 'STRING'}]}, 'skipLeadingRows': 1, 'fieldDelimiter': ',', 'quote': None, 'allowQuotedNewlines': False, 'encoding': 'UTF-8'}}", - "insertId": "12knff2fi1xm26", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:17.924426047Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "process": "gcs_to_bigquery.py:379", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_run_bq_external_ingestion_2023_09_12T00_00_00_00_00_9f0acc11f2ac9c69be11b1a27e0ec444", - "insertId": "12knff2fi1xm27", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:17.925975724Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "bigquery.py:1596", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=run_bq_external_ingestion, execution_date=20230912T000000, start_date=20230913T082915, end_date=20230913T082921", - "insertId": "12knff2fi1xm28", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:21.734968521Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1328", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "12knff2fi1xm29", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:22.562400701Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "local_task_job.py:212", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "25 downstream tasks scheduled from follow-on schedule check", - "insertId": "12knff2fi1xm2a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:22.728595137Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:23.832248109Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[eb6dd88f-a3e5-4423-b4f6-77bf1edf33dc] succeeded in 12.491015944018727s: None", - "insertId": "1e81nhfire8xv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:22.973662538Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:28.826389472Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[be0fdbb9-44fd-4a07-a729-b299d039f49f] received", - "insertId": "1e81nhfire8xw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:24.010258210Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:28.826389472Z" - }, - { - "textPayload": 
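A small decoding note: "Starting attempt 1 of 3" means max_tries is 3 for these tasks, i.e. retries=2, while the airflow_monitoring run further below logs "attempt 1 of 2" (retries=1). A plausible default_args consistent with the log, though unconfirmed by any DAG source in this excerpt:

```python
# Inferred from the log lines, not taken from the actual DAG file.
default_args = {
    "owner": "airflow",  # matches AIRFLOW_CTX_DAG_OWNER=airflow above
    "retries": 2,        # yields "Starting attempt 1 of 3" in the task log
}
```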
"[be0fdbb9-44fd-4a07-a729-b299d039f49f] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2017', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "1e81nhfire8xx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:24.032790776Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:28.826389472Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1e81nhfire8xy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:24.344110990Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:28.826389472Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1e81nhfire8xz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:24.346208346Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:28.826389472Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1e81nhfire8y0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:24.516254098Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:28.826389472Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1e81nhfire8y1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:25.424673613Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:28.826389472Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "sb49dzf6egpu6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:28.580992240Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Dependencies all met 
for dep_context=non-requeueable deps ti=", - "insertId": "sb49dzf6egpu7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:28.770796909Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "sb49dzf6egpu8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:28.791223756Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "sb49dzf6egpu9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:28.791690541Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "sb49dzf6egpua", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:28.792279819Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1290", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "sb49dzf6egpub", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:28.792715669Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1291", - "workflow": "data_analytics_dag", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2017", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "sb49dzf6egpuc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:29.114581975Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "sb49dzf6egpud", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:29.114649148Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "sb49dzf6egpue", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:29.138019748Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "sb49dzf6egpuf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:29.138082211Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "sb49dzf6egpug", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:30.092607362Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1310", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Started process 996 to run task", - "insertId": "sb49dzf6egpuh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:30.114520691Z", - "severity": "INFO", 
- "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:55", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2017', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '949', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp8t6gitr0']", - "insertId": "sb49dzf6egpui", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:30.117910947Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Job 949: Subtask join_bq_datasets.bq_join_holidays_weather_data_2017", - "insertId": "sb49dzf6egpuj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:30.118809018Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "process": "standard_task_runner.py:83", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "sb49dzf6egpuk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:30.510555419Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "task_command.py:393", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2017\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "sb49dzf6egpul", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:30.784166411Z", - "severity": "INFO", - "labels": { - 
"execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1518", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "sb49dzf6egpum", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:30.825154591Z", - "severity": "INFO", - "labels": { - "process": "base.py:73", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2017 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "sb49dzf6egpun", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:30.827861479Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "process": "bigquery.py:2710", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "sb49dzf6egpuo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:30.828919437Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "credentials_provider.py:353", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2017_2023_09_12T00_00_00_00_00_b6a345cef550d1555dec74ab1cd294db", - "insertId": "sb49dzf6egpup", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - 
"project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:30.870076658Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:1596", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:33.834255248Z" - }, - { - "textPayload": "I0913 08:29:32.370097 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1wjis7ufi6561y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:29:32.370338572Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T08:29:39.251865526Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2017, execution_date=20230912T000000, start_date=20230913T082928, end_date=20230913T082933", - "insertId": "ahi4tfoymt7l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:33.646447350Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1328" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:38.929199131Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "ahi4tfoymt7o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:29:34.350704171Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017", - "process": "local_task_job.py:212", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:38.929199131Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "ahi4tfoymt7q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:29:34.423200517Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "taskinstance.py:2599", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2017" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:29:38.929199131Z" - }, - { - "textPayload": "Task 
08:29:34  INFO     Task airflow.executors.celery_executor.execute_command[be0fdbb9-44fd-4a07-a729-b299d039f49f] succeeded in 10.56483886600472s: None
08:30:01  INFO     Task airflow.executors.celery_executor.execute_command[41fa9a6d-75f2-4c2a-b6d1-b5e0526bb9dd] received
08:30:01  INFO     [41fa9a6d-75f2-4c2a-b6d1-b5e0526bb9dd] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:20:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']
08:30:02  WARNING  No module named 'boto3'
08:30:02  WARNING  No module named 'botocore'
08:30:02  WARNING  No module named 'airflow.providers.sftp'
08:30:03  INFO     Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py
08:30:03  INFO     Running on host airflow-worker-n79fs
08:30:03  INFO     [airflow_monitoring / echo, try 1] Dependencies all met for dep_context=non-requeueable deps ti=
08:30:03  INFO     Dependencies all met for dep_context=requeueable deps ti=
08:30:03  INFO     Starting attempt 1 of 2
08:30:04  ERROR    fatal: not a git repository (or any parent up to mount point /home/airflow)
08:30:04  ERROR    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
08:30:04  ERROR    fatal: not a git repository (or any parent up to mount point /home/airflow)
08:30:04  ERROR    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
…         INFO     Executing on 2023-09-13 08:20:00+00:00
"resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:04.950706331Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:20:00+00:00", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:06.844015195Z" - }, - { - "textPayload": "Started process 1009 to run task", - "insertId": "1f25mrnfhx7uun", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:04.987591309Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T08:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "process": "standard_task_runner.py:55", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:06.844015195Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:20:00+00:00', '--job-id', '950', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp93ungmjh']", - "insertId": "1f25mrnfhx7uuo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:30:04.988637163Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "standard_task_runner.py:82", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:20:00+00:00", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:06.844015195Z" - }, - { - "textPayload": "Job 950: Subtask echo", - "insertId": "1f25mrnfhx7uup", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:04.989867031Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:20:00+00:00", - "process": "standard_task_runner.py:83", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:06.844015195Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1f25mrnfhx7uuq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:30:05.352795659Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:20:00+00:00", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T08:30:06.844015195Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T08:20:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T08:20:00+00:00", - "insertId": "5aaqinf6ilyr4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:06.026377375Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:20:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "5aaqinf6ilyr5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:06.029672034Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:20:00+00:00", - "try-number": "1", - "process": "subprocess.py:63", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "5aaqinf6ilyr6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:06.032129034Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:20:00+00:00", - "process": "subprocess.py:75" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "Output:", - "insertId": "5aaqinf6ilyr7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:30:06.322224751Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:20:00+00:00", - "try-number": "1", - "process": "subprocess.py:86", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "test", - "insertId": "5aaqinf6ilyr8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:30:06.329123080Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T08:20:00+00:00", - "process": 
"subprocess.py:93" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "5aaqinf6ilyr9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:06.330886323Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "process": "subprocess.py:97", - "execution-date": "2023-09-13T08:20:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T082000, start_date=20230913T083003, end_date=20230913T083006", - "insertId": "5aaqinf6ilyra", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:30:06.432625005Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T08:20:00+00:00", - "process": "taskinstance.py:1328", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "5aaqinf6ilyrb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:30:07.343811345Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:20:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "process": "local_task_job.py:212", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "5aaqinf6ilyrc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:30:07.519120270Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T08:20:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[41fa9a6d-75f2-4c2a-b6d1-b5e0526bb9dd] succeeded in 6.070064147002995s: None", - "insertId": "5aaqinf6ilyrd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:30:07.745563618Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": 
"airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:30:11.861671303Z" - }, - { - "textPayload": "I0913 08:30:29.816950 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1h0f4qaf9orzoi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:30:29.817164872Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T08:30:36.580790076Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1dee0mdfdpiwg0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:31:02.239252096Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:31:07.976486253Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[cdc86ce3-247d-4165-a09a-d6f8a1b01034] received", - "insertId": "1d4gjkaflm2397", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:32:09.745857255Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:15.109899644Z" - }, - { - "textPayload": "[cdc86ce3-247d-4165-a09a-d6f8a1b01034] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "1d4gjkaflm2398", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:32:09.751433916Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:15.109899644Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1d4gjkaflm2399", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:10.222622841Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T08:32:15.109899644Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1d4gjkaflm239a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:10.224538903Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:15.109899644Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1d4gjkaflm239b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:32:10.370772676Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:15.109899644Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1d4gjkaflm239c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:32:11.567144343Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:15.109899644Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "5aaqinf6isc6u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:32:15.263161778Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:20.823904077Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in 
__setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "5aaqinf6isc6v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:32:15.263227642Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:20.823904077Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "5aaqinf6isc6w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:32:15.274372574Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:20.823904077Z" - },
[The same four-entry retry cycle, "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py" -> "Failed to import" -> the identical "DAG is missing the start_date parameter" traceback -> "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", repeats verbatim every 5 seconds from 08:32:20 through 08:33:00 (nine repetitions); only the insertId, timestamp, and receiveTimestamp fields change. The final iteration, at 08:33:05, follows the sketch below.]
parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "jpbb1qfotja3p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:32:40.315056310Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:45.828084983Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "jpbb1qfotja3q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:40.315570718Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:45.828084983Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1c0nyu9fifjt0b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:45.317892015Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:50.836053296Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1c0nyu9fifjt0c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:45.320316523Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:50.836053296Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ 
super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1c0nyu9fifjt0d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:45.320394837Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:50.836053296Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "1c0nyu9fifjt0e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:32:45.321172152Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:50.836053296Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "28bgxpfowlke2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:32:50.327059398Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:55.831009170Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "28bgxpfowlke3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:50.329467032Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:55.831009170Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in 
apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "28bgxpfowlke4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:50.329507098Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:55.831009170Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "28bgxpfowlke5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:50.330138787Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:32:55.831009170Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1ck5nq2fi4sgpe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:32:55.336461259Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:00.825129024Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1ck5nq2fi4sgpf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:55.338134739Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:00.825129024Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in 
__init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1ck5nq2fi4sgpg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:32:55.338170131Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:00.825129024Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "1ck5nq2fi4sgph", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:32:55.338765476Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:00.825129024Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "k4w1kfiuoell", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:00.344373930Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:05.831437004Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "k4w1kfiuoelm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:00.346539376Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:05.831437004Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = 
dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "k4w1kfiuoeln", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:00.346601454Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:05.831437004Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. 
Retrying after 5 seconds.", - "insertId": "k4w1kfiuoelo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:00.346974611Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:05.831437004Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ci0rqeflnk967", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:05.351976072Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:10.897028104Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ci0rqeflnk968", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:05.356319689Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:10.897028104Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "ci0rqeflnk969", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T08:33:05.356375790Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:10.897028104Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "ci0rqeflnk96a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:05.358287677Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:10.897028104Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1pbzs3cflpj0bc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:10.363999849Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1pbzs3cflpj0bd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:10.367049439Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date 
parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1pbzs3cflpj0be", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:10.367085077Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "1pbzs3cflpj0bf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:10.367924158Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1pbzs3cflpj0bg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:15.372905159Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1pbzs3cflpj0bh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:15.376404497Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ 
super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1pbzs3cflpj0bi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:15.376477535Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "1pbzs3cflpj0bj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:15.377450582Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:16.080511957Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1t7vga0fi2g5uy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:20.382930907Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:21.988512449Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1t7vga0fi2g5uz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:20.385349388Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:21.988512449Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in 
apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1t7vga0fi2g5v0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:20.385381217Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:21.988512449Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "1t7vga0fi2g5v1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:20.385963288Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:21.988512449Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "17uf4n4flnwv73", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:25.391853728Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "17uf4n4flnwv74", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:25.394295460Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in 
__init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "17uf4n4flnwv75", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:25.394349062Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "17uf4n4flnwv76", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:25.394492946Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "17uf4n4flnwv77", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:30.398001786Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "17uf4n4flnwv78", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:30.400135063Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = 
dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "17uf4n4flnwv79", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:30.400172507Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. 
Retrying after 5 seconds.", - "insertId": "17uf4n4flnwv7a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:30.401070218Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:31.077023605Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "16qkdudfi5iknu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:35.408624814Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "16qkdudfi5iknv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:35.418430412Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "16qkdudfi5iknw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - 
"timestamp": "2023-09-13T08:33:35.418480912Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "16qkdudfi5iknx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:35.420885870Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "16qkdudfi5ikny", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:40.427088678Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "16qkdudfi5iknz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:40.429486240Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date 
parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "16qkdudfi5iko0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:40.429539725Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "16qkdudfi5iko1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:40.430168781Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:41.185680355Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1ormx65fi3ygvu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:45.435130344Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:47.204441183Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1ormx65fi3ygvv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:45.438148446Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:47.204441183Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ 
super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1ormx65fi3ygvw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:45.438201072Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:47.204441183Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "1ormx65fi3ygvx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:45.439787981Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:47.204441183Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14iki9xfoxthdp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:50.445539156Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14iki9xfoxthdq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:50.447610368Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in 
apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "14iki9xfoxthdr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:50.447660711Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "14iki9xfoxthds", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:50.447936638Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14iki9xfoxthdt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:55.449927046Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14iki9xfoxthdu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:33:55.452076078Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in 
__init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "14iki9xfoxthdv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:33:55.452438452Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "14iki9xfoxthdw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:33:55.452835909Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:33:56.280606456Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1lzyi66fhxj006", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:00.455038415Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:05.347632004Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1lzyi66fhxj007", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:00.458157200Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:05.347632004Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = 
dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1lzyi66fhxj008", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:00.458187651Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:05.347632004Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. 
Retrying after 5 seconds.", - "insertId": "1lzyi66fhxj009", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:00.458694991Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:05.347632004Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "m77bwgfiuuwxp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:05.464576944Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "m77bwgfiuuwxq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:05.464653227Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "m77bwgfiuuwxr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - 
"timestamp": "2023-09-13T08:34:05.464665569Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "m77bwgfiuuwxs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:05.464866491Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "m77bwgfiuuwxt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:10.471162437Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "m77bwgfiuuwxu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:10.475824989Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date 
parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "m77bwgfiuuwxv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:10.475881493Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "m77bwgfiuuwxw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:10.476816786Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:10.881743055Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "16gfkftf6igb2m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:15.484135869Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:15.891987434Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "16gfkftf6igb2n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:15.485156924Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:15.891987434Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ 
super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "16gfkftf6igb2o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:15.485550518Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:15.891987434Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "16gfkftf6igb2p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:15.486074002Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:15.891987434Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "tfd0xafimvwn1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:20.491022308Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:22.968879815Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "tfd0xafimvwn2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:20.493207709Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:22.968879815Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in 
apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "tfd0xafimvwn3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:20.493285832Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:22.968879815Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "tfd0xafimvwn4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:20.493711296Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:22.968879815Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "tp3t0xflpr16z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:25.500437028Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "tp3t0xflpr170", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:25.502576898Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ 
super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "tp3t0xflpr171", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:25.502641086Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "tp3t0xflpr172", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:25.502838904Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "tp3t0xflpr173", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:30.507717534Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "tp3t0xflpr174", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:30.512799338Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File 
\"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "tp3t0xflpr175", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:30.512854009Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. 
Retrying after 5 seconds.", - "insertId": "tp3t0xflpr176", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:30.513942714Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:30.991203804Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14ik42efipgv1j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:35.518958421Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14ik42efipgv1k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:35.522114363Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "14ik42efipgv1l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - 
"timestamp": "2023-09-13T08:34:35.522170804Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "14ik42efipgv1m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:35.524142921Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14ik42efipgv1n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:40.530534089Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "14ik42efipgv1o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:40.538509817Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date 
parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "14ik42efipgv1p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:40.538565120Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "14ik42efipgv1q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:40.541288862Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:41.066381045Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1w9cjphfin6hut", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:45.546988656Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:47.095449781Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1w9cjphfin6huu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:45.605684630Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:47.095449781Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ 
super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "1w9cjphfin6huv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:45.605720031Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:47.095449781Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "1w9cjphfin6huw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:45.607381173Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:47.095449781Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "dvri35fari0nk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:50.613015082Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:55.164975839Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "dvri35fari0nl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:50.615463453Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:55.164975839Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in 
apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "dvri35fari0nm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:50.615645383Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:55.164975839Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "dvri35fari0nn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:50.615804729Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:34:55.164975839Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "3bxr37f6hcvrn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:55.621129840Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "3bxr37f6hcvro", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:34:55.624066901Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ 
super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "3bxr37f6hcvrp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:34:55.624093725Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "3bxr37f6hcvrq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:34:55.624474214Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "3bxr37f6hcvrr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:00.626935254Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "3bxr37f6hcvrs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:00.629746480Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File 
\"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "3bxr37f6hcvrt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:35:00.629800152Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. 
Retrying after 5 seconds.", - "insertId": "3bxr37f6hcvru", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:35:00.630529568Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:01.185187901Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "usrhlhfi2k51k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:05.635944225Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "usrhlhfi2k51l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:35:05.638542118Z", - "severity": "ERROR", - "labels": { - "process": "dagbag.py:341", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "usrhlhfi2k51m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - 
"timestamp": "2023-09-13T08:35:05.638587800Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "usrhlhfi2k51n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:35:05.639373913Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:240" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "usrhlhfi2k51o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:10.651554440Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "usrhlhfi2k51p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:10.680124110Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date 
parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "usrhlhfi2k51q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:35:10.680179399Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "DAG is not found in loaded DAG bag. Retrying after 5 seconds.", - "insertId": "usrhlhfi2k51r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:35:10.684502874Z", - "severity": "INFO", - "labels": { - "process": "cli.py:240", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:10.944751128Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "2ifbgsfincess", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:35:15.688467067Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "2ifbgsfincest", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:15.690144160Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ 
super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "2ifbgsfincesu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:35:15.690189486Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Dag 'data_analytics_dag' not found in path /home/airflow/gcs/dags/data_analytics_dag.py; trying path /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "2ifbgsfincesv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:35:15.691641480Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "cli.py:244" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "2ifbgsfincesw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:35:15.692349566Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Failed to import: /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "2ifbgsfincesx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:15.694843990Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:341" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py\", line 337, in parse loader.exec_module(new_module) File \"\", line 843, in exec_module File \"\", line 219, in _call_with_frames_removed File \"/home/airflow/gcs/dags/data_analytics_dag.py\", line 60, in create_batch = dataproc.DataprocCreateBatchOperator( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2323, in __init__ super().__init__(**kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File 
\"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 394, in apply_defaults result = func(self, **kwargs, default_args=default_args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 874, in __init__ self.dag = dag File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 981, in __setattr__ super().__setattr__(key, value) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/baseoperator.py\", line 1039, in dag dag.add_task(self) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/models/dag.py\", line 2349, in add_task raise AirflowException(\"DAG is missing the start_date parameter\")airflow.exceptions.AirflowException: DAG is missing the start_date parameter", - "insertId": "2ifbgsfincesy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:15.694881177Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "[cdc86ce3-247d-4165-a09a-d6f8a1b01034] Failed to execute task Dag 'data_analytics_dag' could not be found; either it does not exist or it failed to parse..", - "insertId": "2ifbgsfincesz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:15.698408575Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:133" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/airflow/executors/celery_executor.py\", line 130, in _execute_in_fork args.func(args) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/cli/cli_parser.py\", line 52, in command return func(*args, **kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/utils/cli.py\", line 108, in wrapper return f(*args, **kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/cli/commands/task_command.py\", line 379, in task_run dag = get_dag( File \"/opt/python3.8/lib/python3.8/site-packages/airflow/utils/cli.py\", line 247, in get_dag raise AirflowException(airflow.exceptions.AirflowException: Dag 'data_analytics_dag' could not be found; either it does not exist or it failed to parse.", - "insertId": "2ifbgsfincet0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:35:15.698437961Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[cdc86ce3-247d-4165-a09a-d6f8a1b01034] raised unexpected: AirflowException('Celery command failed on host: airflow-worker-n79fs with celery_task_id cdc86ce3-247d-4165-a09a-d6f8a1b01034')", - "insertId": "2ifbgsfincet1", - "resource": { - "type": "cloud_composer_environment", - "labels": { 
- "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:35:15.732511955Z", - "severity": "ERROR", - "labels": { - "process": "trace.py:265", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "Traceback (most recent call last): File \"/opt/python3.8/lib/python3.8/site-packages/celery/app/trace.py\", line 451, in trace_task R = retval = fun(*args, **kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/celery/app/trace.py\", line 734, in __protected_call__ return self.run(*args, **kwargs) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/executors/celery_executor.py\", line 96, in execute_command _execute_in_fork(command_to_exec, celery_task_id) File \"/opt/python3.8/lib/python3.8/site-packages/airflow/executors/celery_executor.py\", line 111, in _execute_in_fork raise AirflowException(msg)airflow.exceptions.AirflowException: Celery command failed on host: airflow-worker-n79fs with celery_task_id cdc86ce3-247d-4165-a09a-d6f8a1b01034", - "insertId": "2ifbgsfincet2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:35:15.732562856Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:35:16.079381690Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "81vdrrf6dbdrv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:36:04.819685854Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:36:10.989206571Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[2b6900dd-45cb-4d92-aeaa-f0dda7d0bf34] received", - "insertId": "14hjeyfi4o5bs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:00.657635151Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "[2b6900dd-45cb-4d92-aeaa-f0dda7d0bf34] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:30:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "14hjeyfi4o5bt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:40:00.664562375Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "14hjeyfi4o5bu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:00.945599282Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "14hjeyfi4o5bv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:00.947921317Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "14hjeyfi4o5bw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:40:01.103198347Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "14hjeyfi4o5bx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:40:01.957202348Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "14hjeyfi4o5by", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:02.537562385Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "14hjeyfi4o5bz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:40:02.709694055Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T08:30:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "14hjeyfi4o5c0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:02.733771906Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:30:00+00:00", - "process": "taskinstance.py:1091", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14hjeyfi4o5c1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:02.734302781Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1289", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "execution-date": "2023-09-13T08:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "14hjeyfi4o5c2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T08:40:02.734743502Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:30:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14hjeyfi4o5c3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:02.735150160Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1291", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T08:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "14hjeyfi4o5c4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:03.012845973Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "14hjeyfi4o5c5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:03.012903102Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "14hjeyfi4o5c6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:03.032141381Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "14hjeyfi4o5c7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:03.032224643Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Executing on 2023-09-13 08:30:00+00:00", - "insertId": "14hjeyfi4o5c8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": 
"acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:03.656047354Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:30:00+00:00", - "process": "taskinstance.py:1310", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Started process 1244 to run task", - "insertId": "14hjeyfi4o5c9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:03.691334728Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T08:30:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:30:00+00:00', '--job-id', '951', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpmkbldz0v']", - "insertId": "14hjeyfi4o5ca", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:40:03.694606561Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-13T08:30:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Job 951: Subtask echo", - "insertId": "14hjeyfi4o5cb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:40:03.695604905Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:83", - "task-id": "echo", - "execution-date": "2023-09-13T08:30:00+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "14hjeyfi4o5cc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:04.088369433Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "execution-date": "2023-09-13T08:30:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T08:30:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T08:30:00+00:00", - "insertId": "14hjeyfi4o5cd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:04.295040437Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1518", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "14hjeyfi4o5ce", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:04.297381939Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "subprocess.py:63", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T08:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "14hjeyfi4o5cf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:04.299595203Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:30:00+00:00", - "process": "subprocess.py:75", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Output:", - "insertId": "14hjeyfi4o5cg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:40:04.439362076Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:30:00+00:00", - "process": "subprocess.py:86", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "test", - "insertId": "14hjeyfi4o5ch", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:04.450838209Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:30:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "subprocess.py:93" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "14hjeyfi4o5ci", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:04.451788890Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:30:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:97", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T083000, start_date=20230913T084002, end_date=20230913T084004", - "insertId": "14hjeyfi4o5cj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:04.497303306Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:05.150694597Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1v5kbocf6k8m35", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:05.279899141Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T08:30:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "process": "local_task_job.py:212", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:11.225605152Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1v5kbocf6k8m36", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:40:05.357374875Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:30:00+00:00", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:40:11.225605152Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[2b6900dd-45cb-4d92-aeaa-f0dda7d0bf34] succeeded in 4.8608159930154216s: None", - "insertId": "1v5kbocf6k8m37", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:40:05.521907521Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", 
- "receiveTimestamp": "2023-09-13T08:40:11.225605152Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "y51erefiqq70q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:41:06.719265484Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:41:12.880765688Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "xkonvff6dn6nn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:46:08.455987755Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:46:13.889791160Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7852be89-e47f-4bb8-85ff-b4bc2d66ed70] received", - "insertId": "1ckfic4f809ywa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:01.178798377Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "[7852be89-e47f-4bb8-85ff-b4bc2d66ed70] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:40:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1ckfic4f809ywb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:01.184695085Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1ckfic4f809ywc", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:01.539605284Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1ckfic4f809ywd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:01.542373894Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1ckfic4f809ywe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:01.709466098Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1ckfic4f809ywf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:02.610284158Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1ckfic4f809ywg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:03.113848282Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1ckfic4f809ywh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:03.241981742Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:40:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1ckfic4f809ywi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:03.261658641Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T08:40:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1ckfic4f809ywj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:03.262607287Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T08:40:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1ckfic4f809ywk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:03.263241051Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T08:40:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1ckfic4f809ywl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:03.263792441Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T08:40:00+00:00", - "process": "taskinstance.py:1291", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1ckfic4f809ywm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:03.570059282Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1ckfic4f809ywn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": 
"us-west1" - } - }, - "timestamp": "2023-09-13T08:50:03.570137395Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1ckfic4f809ywo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:03.588411909Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1ckfic4f809ywp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:03.588452796Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Executing on 2023-09-13 08:40:00+00:00", - "insertId": "1ckfic4f809ywq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:04.557286390Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:40:00+00:00", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Started process 1480 to run task", - "insertId": "1ckfic4f809ywr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:04.622058204Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:40:00+00:00", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "process": "standard_task_runner.py:55", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:40:00+00:00', '--job-id', '952', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpbot8f4u6']", - "insertId": "1ckfic4f809yws", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:04.624604059Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:82", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T08:40:00+00:00", - "task-id": "echo" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Job 952: Subtask echo", - "insertId": "1ckfic4f809ywt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:04.625537456Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "execution-date": "2023-09-13T08:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1ckfic4f809ywu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:05.008361063Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:40:00+00:00", - "process": "task_command.py:393", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T08:40:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T08:40:00+00:00", - "insertId": "1ckfic4f809ywv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:05.190654303Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:40:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "taskinstance.py:1518", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1ckfic4f809yww", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:05.193017165Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:63", - "map-index": "-1", - "execution-date": "2023-09-13T08:40:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1ckfic4f809ywx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:05.195151857Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": 
"2023-09-13T08:40:00+00:00", - "process": "subprocess.py:75", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Output:", - "insertId": "1ckfic4f809ywy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:05.340728169Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:40:00+00:00", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "process": "subprocess.py:86", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "test", - "insertId": "1ckfic4f809ywz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:05.345455356Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "subprocess.py:93", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1ckfic4f809yx0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:05.346583196Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T08:40:00+00:00", - "process": "subprocess.py:97", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T084000, start_date=20230913T085003, end_date=20230913T085005", - "insertId": "1ckfic4f809yx1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T08:50:05.391322848Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1328", - "map-index": "-1", - "execution-date": "2023-09-13T08:40:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:06.842337829Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1eiai1cf6cbzrn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:06.124843566Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "local_task_job.py:212", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:40:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:11.844347549Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1eiai1cf6cbzro", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:50:06.178201264Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T08:40:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:11.844347549Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7852be89-e47f-4bb8-85ff-b4bc2d66ed70] succeeded in 5.1763471770100296s: None", - "insertId": "1eiai1cf6cbzrp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T08:50:06.358946934Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:50:11.844347549Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "atkwylfi1cs2w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:51:20.205550558Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:51:25.571914752Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1cu2uvqf82sjmx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T08:56:15.952781512Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T08:56:18.869684616Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[13013c16-0ac0-4f90-8e82-cfeea1158749] received", - "insertId": "640cxhfiu4sf3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:03.366627618Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "[13013c16-0ac0-4f90-8e82-cfeea1158749] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:50:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "640cxhfiu4sf4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:03.372457140Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "640cxhfiu4sf5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:03.902190487Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "640cxhfiu4sf6", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:00:03.907841308Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "640cxhfiu4sf7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:04.087860344Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "640cxhfiu4sf8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:05.030691949Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "640cxhfiu4sf9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:05.661664756Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "640cxhfiu4sfa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:06.045165684Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:50:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1091", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "640cxhfiu4sfb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:06.111336664Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T08:50:00+00:00", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "640cxhfiu4sfc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:06.116148268Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:50:00+00:00", - "process": "taskinstance.py:1289", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "640cxhfiu4sfd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:06.117818892Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:50:00+00:00", - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:1290", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "640cxhfiu4sfe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:06.119272036Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:50:00+00:00", - "process": "taskinstance.py:1291", - "try-number": "1", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "640cxhfiu4sff", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:06.723617354Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "640cxhfiu4sfg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:06.723675511Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "640cxhfiu4sfh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - 
} - }, - "timestamp": "2023-09-13T09:00:06.825250505Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "640cxhfiu4sfi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:06.825310525Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Executing on 2023-09-13 08:50:00+00:00", - "insertId": "640cxhfiu4sfj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:07.707925660Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T08:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Started process 1714 to run task", - "insertId": "640cxhfiu4sfk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:07.747805182Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T08:50:00+00:00', '--job-id', '954', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpgq1q7yri']", - "insertId": "640cxhfiu4sfl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:00:07.804759379Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T08:50:00+00:00", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "Job 954: Subtask echo", - "insertId": "640cxhfiu4sfm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:07.805542887Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:50:00+00:00", - "process": "standard_task_runner.py:83", - "map-index": "-1", - 
"worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:08.890692336Z" - }, - { - "textPayload": "I0913 09:00:07.908761 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "ac5afiq52rd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:07.908972091Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T09:00:13.354444365Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "l3jawpfow30s5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:00:08.598357415Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T08:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "process": "task_command.py:393", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T08:50:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T08:50:00+00:00", - "insertId": "l3jawpfow30s6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:09.127994359Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T08:50:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1518", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "l3jawpfow30s7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:09.130136424Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "process": "subprocess.py:63", - "execution-date": "2023-09-13T08:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "l3jawpfow30s8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:09.132096382Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T08:50:00+00:00", - 
"task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "subprocess.py:75" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Output:", - "insertId": "l3jawpfow30s9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:00:09.413019263Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T08:50:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:86" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "test", - "insertId": "l3jawpfow30sa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:09.421270173Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T08:50:00+00:00", - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:93", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "l3jawpfow30sb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:09.423888840Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:50:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:97" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T085000, start_date=20230913T090006, end_date=20230913T090009", - "insertId": "l3jawpfow30sc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:00:09.523195500Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-13T08:50:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "l3jawpfow30sd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:10.544550671Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T08:50:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "l3jawpfow30se", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:00:10.641250626Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T08:50:00+00:00", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[13013c16-0ac0-4f90-8e82-cfeea1158749] succeeded in 7.553143467986956s: None", - "insertId": "l3jawpfow30sf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:00:10.923610151Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:00:13.880866952Z" - }, - { - "textPayload": "I0913 09:01:06.824842 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "skvterfix7zj3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:01:06.825064041Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T09:01:12.777408660Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. 
To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1bgjvxgfirkn0g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:01:16.213317447Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:01:21.946254849Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7d7d7bc0-4b0d-488a-ad4e-b060a1c08d13] received", - "insertId": "5ud9bhflpekuf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:43.857691207Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:47.903320494Z" - }, - { - "textPayload": "[7d7d7bc0-4b0d-488a-ad4e-b060a1c08d13] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "5ud9bhflpekug", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:43.864995600Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:47.903320494Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "5ud9bhflpekuh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:44.914904648Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:47.903320494Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "5ud9bhflpekui", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:44.918549205Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:47.903320494Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "5ud9bhflpekuj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T09:04:45.424235566Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:47.903320494Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "45zp0lfirj5yh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:48.434227091Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:53.824833543Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "e5gyx8fi4cet9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:53.016466212Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "e5gyx8fi4ceta", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:53.213836506Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "e5gyx8fi4cetb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:53.233360567Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "e5gyx8fi4cetc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:53.233637994Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "e5gyx8fi4cetd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:53.234013030Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1290", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "e5gyx8fi4cete", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:53.234421302Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "e5gyx8fi4cetf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:53.550072152Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "e5gyx8fi4cetg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:53.550147031Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "e5gyx8fi4ceth", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:53.623555595Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "e5gyx8fi4ceti", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T09:04:53.623618613Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "e5gyx8fi4cetj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:54.538354151Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Started process 1825 to run task", - "insertId": "e5gyx8fi4cetk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:54.571320923Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "process": "standard_task_runner.py:55", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '955', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp2h7h8d3m']", - "insertId": "e5gyx8fi4cetl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:54.571718490Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Job 955: Subtask run_bq_external_ingestion", - "insertId": "e5gyx8fi4cetm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:54.573906045Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "e5gyx8fi4cetn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", 
- "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:54.953304700Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=run_bq_external_ingestion\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "e5gyx8fi4ceto", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:55.215342087Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "e5gyx8fi4cetp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:55.254362843Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "base.py:73", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Using existing BigQuery table for storing data...", - "insertId": "e5gyx8fi4cetq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:55.256654798Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "process": "gcs_to_bigquery.py:375", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "e5gyx8fi4cetr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:55.257319152Z", - "severity": "INFO", - "labels": { - "process": "credentials_provider.py:353", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "try-number": "1", - "workflow": 
"data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Project is not included in destination_project_dataset_table: holiday_weather.holidays; using project \"acceldata-acm\"", - "insertId": "e5gyx8fi4cets", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:55.299752854Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "bigquery.py:2314", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Executing: {'load': {'autodetect': True, 'createDisposition': 'CREATE_IF_NEEDED', 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays'}, 'sourceFormat': 'CSV', 'sourceUris': ['gs://openlineagedemo/holidays.csv'], 'writeDisposition': 'WRITE_TRUNCATE', 'ignoreUnknownValues': False, 'schema': {'fields': [{'name': 'Date', 'type': 'DATE'}, {'name': 'Holiday', 'type': 'STRING'}]}, 'skipLeadingRows': 1, 'fieldDelimiter': ',', 'quote': None, 'allowQuotedNewlines': False, 'encoding': 'UTF-8'}}", - "insertId": "e5gyx8fi4cett", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:55.301295321Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "gcs_to_bigquery.py:379", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_run_bq_external_ingestion_2023_09_12T00_00_00_00_00_1cafde17117694b829a5837199a19002", - "insertId": "e5gyx8fi4cetu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:55.303590072Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:1596", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:04:58.836464617Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=run_bq_external_ingestion, execution_date=20230912T000000, start_date=20230913T090453, end_date=20230913T090457", - "insertId": "yypoeufowby2z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:57.892807650Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "I0913 09:04:58.627593 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "vd86qyfi4h44o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:58.627818207Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T09:05:05.025276596Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "yypoeufowby30", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:58.973392219Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "local_task_job.py:212", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "yypoeufowby31", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:04:59.116383582Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:2599", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[ee7135c8-5dec-4896-af77-83c24fd35695] received", - "insertId": "yypoeufowby32", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:59.259152470Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[af9b7f44-e4f0-438a-b75a-1af546c8f73a] received", - "insertId": "yypoeufowby33", 
- "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:59.263984923Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "[ee7135c8-5dec-4896-af77-83c24fd35695] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "yypoeufowby34", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:59.271151296Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7d7d7bc0-4b0d-488a-ad4e-b060a1c08d13] succeeded in 15.54631611998775s: None", - "insertId": "yypoeufowby35", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:04:59.409824090Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "[af9b7f44-e4f0-438a-b75a-1af546c8f73a] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2021', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "yypoeufowby36", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:04:59.506113778Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "yypoeufowby37", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:00.310022442Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "yypoeufowby38", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:00.314572064Z", - "severity": "WARNING", - "labels": { - "process": 
"utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "yypoeufowby39", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:00.707642479Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "yypoeufowby3a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:00.719918905Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "yypoeufowby3b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:00.722951087Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "yypoeufowby3c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:01.127110283Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:03.838569159Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "11h0hh2floktln", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:03.024127066Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:08.834271402Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "11h0hh2floktlo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:03.716333255Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:08.834271402Z" - }, - { - "textPayload": "Running on host 
airflow-worker-n79fs", - "insertId": "1ohlhx5fp1aws6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:11.528617818Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:13.877073150Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1ohlhx5fp1aws8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:12.027724209Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:13.877073150Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1ohlhx5fp1awsa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:12.114604226Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:13.877073150Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1ohlhx5fp1awsb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:12.115170338Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:13.877073150Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1ohlhx5fp1awsc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:12.116147023Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "process": "taskinstance.py:1290", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T09:05:13.877073150Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1ohlhx5fp1awsd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:12.116971136Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "process": "taskinstance.py:1291" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:13.877073150Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "18yr8ebfidq0rl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:12.850602010Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "18yr8ebfidq0rm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:12.850668673Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "18yr8ebfidq0rn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:12.961104116Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "18yr8ebfidq0ro", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:12.961169127Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "18yr8ebfidq0rp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:13.231204304Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "18yr8ebfidq0rq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:13.560691960Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "taskinstance.py:1091", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "18yr8ebfidq0rr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:13.621393285Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "18yr8ebfidq0rs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:13.622501836Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1289", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "18yr8ebfidq0rt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:13.623687885Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "taskinstance.py:1290", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "18yr8ebfidq0ru", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:13.623704899Z", - "severity": "INFO", - "labels": { - "process": 
"taskinstance.py:1291", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "18yr8ebfidq0rv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:13.877563134Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1310", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Started process 1837 to run task", - "insertId": "18yr8ebfidq0rw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:13.891720224Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:55", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '956', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpl4atar4j']", - "insertId": "18yr8ebfidq0rx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:13.899704305Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Job 956: Subtask join_bq_datasets.bq_join_holidays_weather_data_2020", - "insertId": "18yr8ebfidq0ry", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:13.901103535Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "18yr8ebfidq0rz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:14.034785746Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "18yr8ebfidq0s0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:14.034849204Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "18yr8ebfidq0s1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:14.127993759Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "18yr8ebfidq0s2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:14.128042414Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "18yr8ebfidq0s3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:14.553140033Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "process": "task_command.py:393", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "18yr8ebfidq0s4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:14.767125663Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": 
"airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Started process 1840 to run task", - "insertId": "18yr8ebfidq0s5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:14.812435298Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:55", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2021', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '957', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpjxlwi6ub']", - "insertId": "18yr8ebfidq0s6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:14.818639239Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "standard_task_runner.py:82", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Job 957: Subtask join_bq_datasets.bq_join_holidays_weather_data_2021", - "insertId": "18yr8ebfidq0s7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:14.819957894Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2020\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "18yr8ebfidq0s8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:14.862919393Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1518", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": 
"airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "18yr8ebfidq0s9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:14.950842399Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "process": "base.py:73" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "18yr8ebfidq0sa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:14.954647390Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "process": "bigquery.py:2710" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "18yr8ebfidq0sb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:14.955521960Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "credentials_provider.py:353" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2020_2023_09_12T00_00_00_00_00_d88ec24c536f7dd2a005e852804b5238", - "insertId": "18yr8ebfidq0sc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:15.024740399Z", - "severity": "INFO", - "labels": { - 
"workflow": "data_analytics_dag", - "process": "bigquery.py:1596", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "18yr8ebfidq0sd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:15.423514187Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "task_command.py:393", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2021\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "18yr8ebfidq0se", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:15.681302790Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "taskinstance.py:1518", - "workflow": "data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "18yr8ebfidq0sf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:15.753808546Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "1", - "process": "base.py:73", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2021 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "18yr8ebfidq0sg", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:15.756312034Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "bigquery.py:2710" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "18yr8ebfidq0sh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:15.757206040Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "credentials_provider.py:353", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2021_2023_09_12T00_00_00_00_00_fa5c3d6c079bb3bb78768a9ca18fb42d", - "insertId": "18yr8ebfidq0si", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:15.803356472Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "bigquery.py:1596", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:18.855416060Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
Both joins succeed, and the downstream create_batch task is picked up:

09:05:17.928 INFO  Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2020, execution_date=20230912T000000, start_date=20230913T090512, end_date=20230913T090517
09:05:18.700 INFO  Task exited with return code 0
09:05:18.777 INFO  0 downstream tasks scheduled from follow-on schedule check
09:05:18.893 INFO  Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2021, execution_date=20230912T000000, start_date=20230913T090513, end_date=20230913T090518
09:05:19.039 INFO  Task airflow.executors.celery_executor.execute_command[ee7135c8-5dec-4896-af77-83c24fd35695] succeeded in 19.769922759005567s: None
09:05:19.656 INFO  Task exited with return code 0
09:05:19.725 INFO  1 downstream tasks scheduled from follow-on schedule check
09:05:19.897 INFO  Task airflow.executors.celery_executor.execute_command[af9b7f44-e4f0-438a-b75a-1af546c8f73a] succeeded in 20.613432483020006s: None
09:05:20.063 INFO  Task airflow.executors.celery_executor.execute_command[14b8da31-621f-4d06-9729-39e4df7fc025] received
09:05:20.069 INFO  [14b8da31-621f-4d06-9729-39e4df7fc025] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']
09:05:20.329 WARNING No module named 'boto3'
09:05:20.331 WARNING No module named 'botocore'
09:05:20.446 WARNING No module named 'airflow.providers.sftp'
09:05:21.406 INFO  Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py
09:05:24.928 INFO  Running on host airflow-worker-n79fs
09:05:25.148 INFO  Dependencies all met for dep_context=non-requeueable deps ti=  [data_analytics_dag / create_batch, try 1]
09:05:25.178 INFO  Dependencies all met for dep_context=requeueable deps ti=
09:05:25.179 INFO  Starting attempt 1 of 3
09:05:25.598 ERROR (the same git-repository / filesystem-boundary error pair as above, twice more, through 09:05:25.629)
09:05:26.543 INFO  Executing on 2023-09-12 00:00:00+00:00
09:05:26.554 INFO  Started process 1859 to run task
09:05:26.559 INFO  Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '958', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmppa57j5k5']
09:05:26.560 INFO  Job 958: Subtask create_batch
09:05:26.960 INFO  Running on host airflow-worker-n79fs
09:05:27.331 INFO  Exporting the following env vars: (as above, with AIRFLOW_CTX_TASK_ID=create_batch)
09:05:27.368 INFO  Using connection ID 'google_cloud_default' for task execution.
09:05:27.370 INFO  Creating batch data-processing-20230912t000000
09:05:27.371 INFO  Once started, the batch job will be available at https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230912t000000/monitoring?project=acceldata-acm
09:05:27.371 INFO  Getting connection using `google.auth.default()` since no explicit credentials are provided.
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:31.010389428Z" - }, - { - "textPayload": "Task failed with exception\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/dataproc.py\", line 261, in wait_for_operation\n return operation.result(timeout=timeout, retry=result_retry)\n File \"/opt/python3.8/lib/python3.8/site-packages/google/api_core/future/polling.py\", line 261, in result\n raise self._exception\ngoogle.api_core.exceptions.Aborted: 409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2371, in execute\n result = hook.wait_for_operation(\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/dataproc.py\", line 264, in wait_for_operation\n raise AirflowException(error)\nairflow.exceptions.AirflowException: 409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.", - "insertId": "5tw3o6fi33m04", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:45.351260243Z", - "severity": "ERROR", - "labels": { - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1778", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:51.189848985Z" - }, - { - "textPayload": "Marking task as UP_FOR_RETRY. 
dag_id=data_analytics_dag, task_id=create_batch, execution_date=20230912T000000, start_date=20230913T090525, end_date=20230913T090545", - "insertId": "5tw3o6fi33m05", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:05:45.358837794Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1328", - "task-id": "create_batch", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:51.189848985Z" - }, - { - "textPayload": "Failed to execute job 958 for task create_batch (409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.; 1859)", - "insertId": "5tw3o6fi33m06", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:45.384633687Z", - "severity": "ERROR", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "standard_task_runner.py:100", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:51.189848985Z" - }, - { - "textPayload": "Task exited with return code 1", - "insertId": "5tw3o6fi33m07", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:45.526320941Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "local_task_job.py:212", - "task-id": "create_batch", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:51.189848985Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "5tw3o6fi33m08", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:05:45.575946312Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "taskinstance.py:2599", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:51.189848985Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[14b8da31-621f-4d06-9729-39e4df7fc025] succeeded in 25.680595037003513s: None", - "insertId": "5tw3o6fi33m09", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:45.747187094Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - 
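The 409 is not a Dataproc bug: the project's organization policy enforces compute.requireOsLogin, and batch creation is rejected server-side. The traceback shows the error surfacing through the long-running-operation future and being re-wrapped; paraphrasing the two hook frames quoted above:

from airflow.exceptions import AirflowException

def wait_for_operation(operation, timeout=None, result_retry=None):
    # `operation` is the google.api_core long-running-operation future returned
    # by the Dataproc client; .result() re-raises the server-side rejection
    # (google.api_core.exceptions.Aborted: 409 ... requireOsLogin ...).
    try:
        return operation.result(timeout=timeout, retry=result_retry)
    except Exception as error:
        raise AirflowException(error)

Because the rejection is a policy decision rather than a transient fault, the UP_FOR_RETRY attempts will presumably keep failing until constraints/compute.requireOsLogin is relaxed for project 910708407740.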
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:05:51.189848985Z" - }, - { - "textPayload": "I0913 09:05:54.978170 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1htx5o8fiu0ytv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:05:54.978342360Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T09:06:00.460256005Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1rjtgdefid4gqj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:06:28.929200861Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:06:35.288639437Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[d5527f0b-52a2-44ca-b699-fbf8d2a2171c] received", - "insertId": "yyz1o6fi0w3zx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:01.157961896Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "[d5527f0b-52a2-44ca-b699-fbf8d2a2171c] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "yyz1o6fi0w3zy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:01.169280791Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "yyz1o6fi0w3zz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:01.557869004Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - 
"textPayload": "No module named 'botocore'", - "insertId": "yyz1o6fi0w400", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:01.560654835Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "yyz1o6fi0w401", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:01.723460468Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "yyz1o6fi0w402", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:02.704913280Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "yyz1o6fi0w403", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:03.227046753Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "yyz1o6fi0w404", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:03.357357961Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T09:00:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "yyz1o6fi0w405", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:03.376116018Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-13T09:00:00+00:00", - "process": "taskinstance.py:1091", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "yyz1o6fi0w406", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:03.376614241Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T09:00:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "yyz1o6fi0w407", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:03.377147264Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1290", - "execution-date": "2023-09-13T09:00:00+00:00", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "yyz1o6fi0w408", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:03.377565262Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:00:00+00:00", - "process": "taskinstance.py:1291", - "map-index": "-1", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "yyz1o6fi0w409", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:03.787230386Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "yyz1o6fi0w40a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:03.787277604Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "yyz1o6fi0w40b", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:03.813976612Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "yyz1o6fi0w40c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:03.814007683Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Executing on 2023-09-13 09:00:00+00:00", - "insertId": "yyz1o6fi0w40d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:05.132963461Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T09:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Started process 1991 to run task", - "insertId": "yyz1o6fi0w40e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:05.164599814Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "try-number": "1", - "execution-date": "2023-09-13T09:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:00:00+00:00', '--job-id', '959', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp514wnb_s']", - "insertId": "yyz1o6fi0w40f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:05.166912534Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T09:00:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Job 959: Subtask echo", - "insertId": "yyz1o6fi0w40g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:05.167881859Z", - "severity": "INFO", 
- "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "process": "standard_task_runner.py:83", - "try-number": "1", - "execution-date": "2023-09-13T09:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "yyz1o6fi0w40h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:05.517006437Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T09:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T09:00:00+00:00", - "insertId": "yyz1o6fi0w40i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:05.705446139Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-13T09:00:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "yyz1o6fi0w40j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:05.708584513Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:63", - "task-id": "echo", - "execution-date": "2023-09-13T09:00:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "yyz1o6fi0w40k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:05.710489788Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:75", - "execution-date": "2023-09-13T09:00:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:06.835699703Z" - }, - { - "textPayload": "Output:", - "insertId": "1thtml5fap009v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": 
"acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:05.850900523Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "subprocess.py:86", - "execution-date": "2023-09-13T09:00:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:11.886815326Z" - }, - { - "textPayload": "test", - "insertId": "1thtml5fap009w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:05.859551753Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "subprocess.py:93", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:11.886815326Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1thtml5fap009x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:05.860200953Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T09:00:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:97" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:11.886815326Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T090000, start_date=20230913T091003, end_date=20230913T091005", - "insertId": "1thtml5fap009y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:05.912620999Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T09:00:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1328", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:11.886815326Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1thtml5fap009z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:06.630888566Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "local_task_job.py:212", - "execution-date": "2023-09-13T09:00:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:11.886815326Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1thtml5fap00a0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:06.713248168Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:00:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:11.886815326Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[d5527f0b-52a2-44ca-b699-fbf8d2a2171c] succeeded in 5.703336701000808s: None", - "insertId": "1thtml5fap00a1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:06.865705017Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:11.886815326Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[9feed948-981d-472a-bcf3-0d672a7ea635] received", - "insertId": "322ktsfp6ymfj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:46.083429679Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:51.832748435Z" - }, - { - "textPayload": "[9feed948-981d-472a-bcf3-0d672a7ea635] Executing command in Celery: ['airflow', 'tasks', 'run', 
'data_analytics_dag', 'create_batch', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "322ktsfp6ymfk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:46.088956154Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:51.832748435Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "322ktsfp6ymfl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:46.411305376Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:51.832748435Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "322ktsfp6ymfm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:46.413634178Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:51.832748435Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "322ktsfp6ymfn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:46.534913661Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:51.832748435Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "322ktsfp6ymfo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:47.417336665Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:51.832748435Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1huby86fj0b1nf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:51.011078575Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1huby86fj0b1ng", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:51.133746262Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1huby86fj0b1nh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:51.153359240Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "2", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1huby86fj0b1ni", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:51.153766578Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1289", - "task-id": "create_batch", - "try-number": "2", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "1huby86fj0b1nj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:51.154318740Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1290", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1huby86fj0b1nk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:51.154810441Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "2", - "task-id": "create_batch", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "taskinstance.py:1291" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to 
mount point /home/airflow)", - "insertId": "1huby86fj0b1nl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:51.609919008Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1huby86fj0b1nm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:51.609972180Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1huby86fj0b1nn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:51.644059734Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1huby86fj0b1no", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:51.644134499Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "1huby86fj0b1np", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:52.715497276Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch", - "try-number": "2", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Started process 2021 to run task", - "insertId": "1huby86fj0b1nq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:52.751351848Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "try-number": "2", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '960', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpunnd9y7i']", - "insertId": "1huby86fj0b1nr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:52.752656510Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Job 960: Subtask create_batch", - "insertId": "1huby86fj0b1ns", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:52.753888385Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "try-number": "2", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1huby86fj0b1nt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:53.131311115Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=create_batch\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "1huby86fj0b1nu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:54.047113241Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "1huby86fj0b1nv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": 
"acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:54.299066473Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "base.py:73", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Creating batch data-processing-20230912t000000", - "insertId": "1huby86fj0b1nw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:54.300986762Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "dataproc.py:2349", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Once started, the batch job will be available at https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230912t000000/monitoring?project=acceldata-acm", - "insertId": "1huby86fj0b1nx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:54.301403612Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "create_batch", - "try-number": "2", - "process": "dataproc.py:2350" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "1huby86fj0b1ny", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:54.302311281Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "credentials_provider.py:353", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Batch with given id already exists", - "insertId": "1huby86fj0b1nz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:55.694173455Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "process": "dataproc.py:2394", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Attaching to the job 
data-processing-20230912t000000 if it is still running.", - "insertId": "1huby86fj0b1o0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:55.695264262Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "dataproc.py:2399", - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:10:56.838235878Z" - }, - { - "textPayload": "Task failed with exception\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2426, in execute\n self.handle_batch_status(context, result.state, batch_id)\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2454, in handle_batch_status\n raise AirflowException(\"Batch job %s failed. Driver Logs: %s\", batch_id, link)\nairflow.exceptions.AirflowException: ('Batch job %s failed. Driver Logs: %s', 'data-processing-20230912t000000', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230912t000000/monitoring?project=acceldata-acm')", - "insertId": "15mf2pyf8jtnhl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:55.863611811Z", - "severity": "ERROR", - "labels": { - "process": "taskinstance.py:1778", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:11:01.914580782Z" - }, - { - "textPayload": "Marking task as UP_FOR_RETRY. dag_id=data_analytics_dag, task_id=create_batch, execution_date=20230912T000000, start_date=20230913T091051, end_date=20230913T091055", - "insertId": "15mf2pyf8jtnhm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:55.873198903Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "2", - "task-id": "create_batch", - "map-index": "-1", - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:11:01.914580782Z" - }, - { - "textPayload": "Failed to execute job 960 for task create_batch (('Batch job %s failed. 
Driver Logs: %s', 'data-processing-20230912t000000', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230912t000000/monitoring?project=acceldata-acm'); 2021)", - "insertId": "15mf2pyf8jtnhn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:55.896762321Z", - "severity": "ERROR", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:100", - "try-number": "2", - "map-index": "-1", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:11:01.914580782Z" - }, - { - "textPayload": "Task exited with return code 1", - "insertId": "15mf2pyf8jtnho", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:10:56.062156726Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "local_task_job.py:212", - "try-number": "2", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:11:01.914580782Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "15mf2pyf8jtnhp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:10:56.131723403Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "2", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:11:01.914580782Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[9feed948-981d-472a-bcf3-0d672a7ea635] succeeded in 10.21041198898456s: None", - "insertId": "15mf2pyf8jtnhq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:10:56.297260831Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:11:01.914580782Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1mjy76tfiqaiw6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:11:21.829410527Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:11:27.904566374Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[6c07aaac-8b81-4872-9335-ab0c97e694ce] received", - "insertId": "95nhkhfiwqzct", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:15:56.461971044Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:01.827042643Z" - }, - { - "textPayload": "[6c07aaac-8b81-4872-9335-ab0c97e694ce] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'scheduled__2023-09-12T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "95nhkhfiwqzcu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:15:56.467447239Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:01.827042643Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "95nhkhfiwqzcv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:15:56.825804465Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:01.827042643Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "95nhkhfiwqzcw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:15:56.828142218Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:01.827042643Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "95nhkhfiwqzcx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:15:56.953891313Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:01.827042643Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "95nhkhfiwqzcy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:15:57.849634172Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:01.827042643Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "13oiz3nfllrit7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:01.480302363Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "13oiz3nfllrit8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:01.645799545Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "3", - "map-index": "-1", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "13oiz3nfllrit9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:16:01.667575037Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "process": "taskinstance.py:1091", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "3", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "13oiz3nfllrita", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:01.668030755Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "3", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Starting attempt 3 of 3", - "insertId": "13oiz3nfllritb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - 
"timestamp": "2023-09-13T09:16:01.668419451Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1290", - "try-number": "3", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "13oiz3nfllritc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:01.668824387Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "3", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "13oiz3nfllritd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:02.131429868Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "13oiz3nfllrite", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:02.131474140Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "13oiz3nfllritf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:16:02.155833141Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "13oiz3nfllritg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:02.155876846Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Executing on 2023-09-12 00:00:00+00:00", - "insertId": "13oiz3nfllrith", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:03.422847366Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "3", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-12T00:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'scheduled__2023-09-12T00:00:00+00:00', '--job-id', '961', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpv_3p5b7o']", - "insertId": "13oiz3nfllriti", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:03.455931128Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "3", - "execution-date": "2023-09-12T00:00:00+00:00", - "process": "standard_task_runner.py:82", - "task-id": "create_batch", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Started process 2155 to run task", - "insertId": "13oiz3nfllritj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:03.455971556Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "3", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:55", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Job 961: Subtask create_batch", - "insertId": "13oiz3nfllritk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:03.457999594Z", - "severity": "INFO", - "labels": { - "try-number": "3", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "task-id": "create_batch", - "process": "standard_task_runner.py:83", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "13oiz3nfllritl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:04.022378473Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "3", - "workflow": "data_analytics_dag", - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=create_batch\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-12T00:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=3\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-12T00:00:00+00:00", - "insertId": "13oiz3nfllritm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:04.420054116Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "try-number": "3", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "13oiz3nfllritn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:16:04.461046055Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "try-number": "3", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "process": "base.py:73", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Creating batch data-processing-20230912t000000", - "insertId": "13oiz3nfllrito", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:04.462639995Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "process": "dataproc.py:2349", - "workflow": "data_analytics_dag", - "try-number": "3", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Once started, the batch job will be available at https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230912t000000/monitoring?project=acceldata-acm", - "insertId": "13oiz3nfllritp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:04.463170743Z", - "severity": "INFO", - "labels": { - "try-number": "3", - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "process": "dataproc.py:2350", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "13oiz3nfllritq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": 
"us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:16:04.463970313Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "try-number": "3", - "workflow": "data_analytics_dag", - "process": "credentials_provider.py:353", - "execution-date": "2023-09-12T00:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:06.838035929Z" - }, - { - "textPayload": "Batch with given id already exists", - "insertId": "1vfgxrkf8ok3ml", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:05.889642295Z", - "severity": "INFO", - "labels": { - "try-number": "3", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "dataproc.py:2394" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "Attaching to the job data-processing-20230912t000000 if it is still running.", - "insertId": "1vfgxrkf8ok3mm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:05.891387298Z", - "severity": "INFO", - "labels": { - "try-number": "3", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "dataproc.py:2399" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "Task failed with exception\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2426, in execute\n self.handle_batch_status(context, result.state, batch_id)\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2454, in handle_batch_status\n raise AirflowException(\"Batch job %s failed. Driver Logs: %s\", batch_id, link)\nairflow.exceptions.AirflowException: ('Batch job %s failed. Driver Logs: %s', 'data-processing-20230912t000000', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230912t000000/monitoring?project=acceldata-acm')", - "insertId": "1vfgxrkf8ok3mn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:16:06.327011149Z", - "severity": "ERROR", - "labels": { - "try-number": "3", - "process": "taskinstance.py:1778", - "task-id": "create_batch", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-12T00:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "Marking task as FAILED. 
dag_id=data_analytics_dag, task_id=create_batch, execution_date=20230912T000000, start_date=20230913T091601, end_date=20230913T091606", - "insertId": "1vfgxrkf8ok3mo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:06.337645801Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag", - "try-number": "3" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "Failed to execute job 961 for task create_batch (('Batch job %s failed. Driver Logs: %s', 'data-processing-20230912t000000', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230912t000000/monitoring?project=acceldata-acm'); 2155)", - "insertId": "1vfgxrkf8ok3mp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:16:07.010901162Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-12T00:00:00+00:00", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:100", - "try-number": "3", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "Task exited with return code 1", - "insertId": "1vfgxrkf8ok3mq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:07.195382039Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "create_batch", - "try-number": "3", - "execution-date": "2023-09-12T00:00:00+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1vfgxrkf8ok3mr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:16:07.289402819Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-12T00:00:00+00:00", - "try-number": "3", - "process": "taskinstance.py:2599", - "map-index": "-1", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[6c07aaac-8b81-4872-9335-ab0c97e694ce] succeeded in 10.997413828998106s: None", - "insertId": "1vfgxrkf8ok3ms", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T09:16:07.463194964Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:11.893365042Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "au1qzhf6mcxy7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:16:24.210923566Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:16:29.011999026Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[086fd2b8-f935-40ad-ab61-e3d72193608b] received", - "insertId": "1j87f4wfi348ag", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:01.678458133Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "[086fd2b8-f935-40ad-ab61-e3d72193608b] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:10:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1j87f4wfi348ah", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:01.684747817Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1j87f4wfi348ai", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:02.039815093Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1j87f4wfi348aj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:02.042104705Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": 
"utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1j87f4wfi348ak", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:02.147507215Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1j87f4wfi348al", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:03.104288896Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1j87f4wfi348am", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:03.634172801Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1j87f4wfi348an", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:03.789023720Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:1091", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1j87f4wfi348ao", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:03.813753845Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:10:00+00:00", - "process": "taskinstance.py:1091", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1j87f4wfi348ap", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": 
"acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:03.814199787Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T09:10:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1j87f4wfi348aq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:20:03.814719338Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "execution-date": "2023-09-13T09:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1j87f4wfi348ar", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:03.815126340Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T09:10:00+00:00", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1291", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1j87f4wfi348as", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:04.112119597Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1j87f4wfi348at", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:04.112165506Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1j87f4wfi348au", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:04.160575625Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - 
"textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1j87f4wfi348av", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:04.160623584Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Executing on 2023-09-13 09:10:00+00:00", - "insertId": "1j87f4wfi348aw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:04.848550908Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T09:10:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Started process 2257 to run task", - "insertId": "1j87f4wfi348ax", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:20:04.891607969Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:55", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T09:10:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:10:00+00:00', '--job-id', '962', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp9f7zro83']", - "insertId": "1j87f4wfi348ay", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:04.894018763Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T09:10:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Job 962: Subtask echo", - "insertId": "1j87f4wfi348az", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:04.894963993Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "execution-date": "2023-09-13T09:10:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:83", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Running 
on host airflow-worker-n79fs", - "insertId": "1j87f4wfi348b0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:05.272204854Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:10:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T09:10:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T09:10:00+00:00", - "insertId": "1j87f4wfi348b1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:05.475416545Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T09:10:00+00:00", - "process": "taskinstance.py:1518", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1j87f4wfi348b2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:20:05.477171352Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:63", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:10:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1j87f4wfi348b3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:05.478503606Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T09:10:00+00:00", - "task-id": "echo", - "process": "subprocess.py:75", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Output:", - "insertId": "1j87f4wfi348b4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:20:05.623185935Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "process": "subprocess.py:86", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:10:00+00:00", - "workflow": "airflow_monitoring" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "test", - "insertId": "1j87f4wfi348b5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:20:05.630344479Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T09:10:00+00:00", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:93", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1j87f4wfi348b6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:05.630928691Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:10:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "process": "subprocess.py:97" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T091000, start_date=20230913T092003, end_date=20230913T092005", - "insertId": "1j87f4wfi348b7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:20:05.678520205Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T09:10:00+00:00", - "process": "taskinstance.py:1328", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:06.842328625Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1ulji46fiuaijr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:20:06.401041736Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T09:10:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:11.922103568Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1ulji46fiuaijs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:20:06.454048697Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T09:10:00+00:00", - "process": 
"taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:11.922103568Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[086fd2b8-f935-40ad-ab61-e3d72193608b] succeeded in 4.925387285009492s: None", - "insertId": "1ulji46fiuaijt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:20:06.608550207Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:20:11.922103568Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "ukf4wfozpk8m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:21:26.213615139Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:21:32.304360331Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "tp3t0xfltty47", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:26:28.125933922Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:26:33.926286632Z" - }, -
[… condensed: the identical log sequence repeats for the next two scheduled runs of airflow_monitoring, scheduled__2023-09-13T09:20:00+00:00 (Celery task 1900aa23-96dc-4ff2-83ad-83c58095aafa, job 963, process 2483, received 09:30:01) and scheduled__2023-09-13T09:30:00+00:00 (Celery task b1572d05-6dbc-49ba-b20b-fa1a88467a0b, job 964, process 2718, received 09:40:01): the "No module named 'boto3'/'botocore'/'airflow.providers.sftp'" warnings, the paired "fatal: not a git repository" and "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)" errors, and a successful 'echo test' marked SUCCESS, plus the MovedIn20Warning recurring at 09:31:30, 09:36:32, and 09:41:34; only timestamps, insertIds, process IDs, and job numbers differ …]
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1u1ifs3f6md9yf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:41:34.231368171Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:41:39.773388551Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1lq1qkcfids68i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:46:38.915040986Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:46:41.527204058Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f83c5606-d67b-4529-ba20-0b4706cff40d] received", - "insertId": "1iy9hgof6n59sv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:00.887839137Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "[f83c5606-d67b-4529-ba20-0b4706cff40d] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:40:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1iy9hgof6n59sw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:00.893937099Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1iy9hgof6n59sx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:01.222128025Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1iy9hgof6n59sy", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:01.224467185Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1iy9hgof6n59sz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:01.345087123Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1iy9hgof6n59t0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:02.237697353Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1iy9hgof6n59t1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:02.839552565Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1iy9hgof6n59t2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:02.966651500Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:40:00+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1iy9hgof6n59t3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:02.985522911Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:40:00+00:00", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "1iy9hgof6n59t4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:02.985860529Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1289", - "map-index": "-1", - "execution-date": "2023-09-13T09:40:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1iy9hgof6n59t5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:02.986333751Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T09:40:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1iy9hgof6n59t6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:02.986734495Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T09:40:00+00:00", - "process": "taskinstance.py:1291", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1iy9hgof6n59t7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:03.246402186Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1iy9hgof6n59t8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:03.246452389Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1iy9hgof6n59t9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": 
"acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:03.307350027Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1iy9hgof6n59ta", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:03.307418362Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Executing on 2023-09-13 09:40:00+00:00", - "insertId": "1iy9hgof6n59tb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:04.172506349Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:40:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1310", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Started process 2954 to run task", - "insertId": "1iy9hgof6n59tc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:04.248179486Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T09:40:00+00:00", - "process": "standard_task_runner.py:55", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:40:00+00:00', '--job-id', '965', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpu230az8e']", - "insertId": "1iy9hgof6n59td", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:04.248582362Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T09:40:00+00:00", - "process": "standard_task_runner.py:82", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Job 965: Subtask echo", - "insertId": "1iy9hgof6n59te", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:04.249919827Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "try-number": "1", - "task-id": "echo", - "workflow": 
"airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1iy9hgof6n59tf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:04.610386952Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T09:40:00+00:00", - "map-index": "-1", - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T09:40:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T09:40:00+00:00", - "insertId": "1iy9hgof6n59tg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:04.804879805Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-13T09:40:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1iy9hgof6n59th", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:04.807954004Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "subprocess.py:63", - "map-index": "-1", - "execution-date": "2023-09-13T09:40:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1iy9hgof6n59ti", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:04.810084714Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "subprocess.py:75", - "execution-date": "2023-09-13T09:40:00+00:00", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Output:", - "insertId": "1iy9hgof6n59tj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T09:50:04.984039541Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:86", - "execution-date": "2023-09-13T09:40:00+00:00", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "test", - "insertId": "1iy9hgof6n59tk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:04.991332722Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "subprocess.py:93", - "execution-date": "2023-09-13T09:40:00+00:00", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1iy9hgof6n59tl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:50:04.992766958Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T09:40:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:97" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T094000, start_date=20230913T095002, end_date=20230913T095005", - "insertId": "1iy9hgof6n59tm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:05.043591180Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-13T09:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1iy9hgof6n59tn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:50:05.791759107Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "local_task_job.py:212", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T09:40:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:06.834673843Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "z8sggxf4qs8px", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:05.843551644Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T09:40:00+00:00", - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:11.892781037Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f83c5606-d67b-4529-ba20-0b4706cff40d] succeeded in 5.113018922973424s: None", - "insertId": "z8sggxf4qs8py", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T09:50:06.013327566Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:50:11.892781037Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1b6jxnrfiuvvjb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T09:51:38.111739182Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:51:44.184476842Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "6o7zm5fisa6qe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T09:56:40.945871327Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T09:56:46.485381811Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[eef46875-af70-45bc-8c7e-edae481015e6] received", - "insertId": "10mylthflz3tsl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:00.751336674Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:02.545036615Z" - }, - { - "textPayload": "[eef46875-af70-45bc-8c7e-edae481015e6] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:50:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "10mylthflz3tsm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:00.757686550Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:02.545036615Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "10mylthflz3tsn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:01.123790535Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:02.545036615Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "10mylthflz3tso", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:01.125953268Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:02.545036615Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "10mylthflz3tsp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:01.263254634Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:02.545036615Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1sds7spfifjyas", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:02.465166115Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1sds7spfifjyat", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:03.151847130Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1sds7spfifjyau", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:03.290935439Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T09:50:00+00:00", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1sds7spfifjyav", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:03.315748521Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "1sds7spfifjyaw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:03.316041132Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T09:50:00+00:00", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1sds7spfifjyax", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:03.316370743Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1290", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1sds7spfifjyay", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:03.317482195Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T09:50:00+00:00", - "process": "taskinstance.py:1291", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1sds7spfifjyaz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:03.651617964Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1sds7spfifjyb0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:03.651649356Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1sds7spfifjyb1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": 
"acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:03.708570383Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1sds7spfifjyb2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:03.708614111Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Executing on 2023-09-13 09:50:00+00:00", - "insertId": "1sds7spfifjyb3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:04.619711888Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:50:00+00:00", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Started process 3179 to run task", - "insertId": "1sds7spfifjyb4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:04.653866960Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:50:00+00:00", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:55", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T09:50:00+00:00', '--job-id', '966', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmphs1lb5fa']", - "insertId": "1sds7spfifjyb5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:04.656581289Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:50:00+00:00", - "task-id": "echo", - "process": "standard_task_runner.py:82", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Job 966: Subtask echo", - "insertId": "1sds7spfifjyb6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:04.657279631Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "standard_task_runner.py:83", - "task-id": "echo", 
- "execution-date": "2023-09-13T09:50:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1sds7spfifjyb7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:05.045328076Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393", - "task-id": "echo", - "execution-date": "2023-09-13T09:50:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T09:50:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T09:50:00+00:00", - "insertId": "1sds7spfifjyb8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:05.242831992Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T09:50:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1518", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1sds7spfifjyb9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:05.247269639Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:63", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T09:50:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1sds7spfifjyba", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:05.250884606Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "subprocess.py:75", - "execution-date": "2023-09-13T09:50:00+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Output:", - "insertId": "1sds7spfifjybb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T10:00:05.417871733Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T09:50:00+00:00", - "map-index": "-1", - "process": "subprocess.py:86" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "test", - "insertId": "1sds7spfifjybc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:05.433879175Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "map-index": "-1", - "process": "subprocess.py:93", - "execution-date": "2023-09-13T09:50:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1sds7spfifjybd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:05.435529077Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "process": "subprocess.py:97", - "execution-date": "2023-09-13T09:50:00+00:00", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T095000, start_date=20230913T100003, end_date=20230913T100005", - "insertId": "1sds7spfifjybe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:00:05.605132053Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1328", - "task-id": "echo", - "execution-date": "2023-09-13T09:50:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1sds7spfifjybf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:06.645236416Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "local_task_job.py:212", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:50:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:07.668526344Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "8m2vcafp27aqa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:00:06.826835297Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:2599", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T09:50:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:12.772800862Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[eef46875-af70-45bc-8c7e-edae481015e6] succeeded in 6.4660902399919s: None", - "insertId": "8m2vcafp27aqb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:00:07.222129523Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:00:12.772800862Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "i1cjg0fizali4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:01:52.832344381Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:01:56.117733621Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "jpfmtxfozup01", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:06:44.928722529Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:06:50.169583605Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[10a8d27d-5047-46a7-a50f-cf0744170735] received", - "insertId": "1no069nfiz86jo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:01.016408378Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:02.830313971Z" - }, - { - "textPayload": "[10a8d27d-5047-46a7-a50f-cf0744170735] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1no069nfiz86jp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:01.022176872Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:02.830313971Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1no069nfiz86jq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:01.332184203Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:02.830313971Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1no069nfiz86jr", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:01.334574981Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:02.830313971Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1no069nfiz86js", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:01.525766282Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-n79fs", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:02.830313971Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1cu2uvqf88klq0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:02.356852252Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1cu2uvqf88klq1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:02.995237982Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1cu2uvqf88klq2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:03.146689103Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T10:00:00+00:00", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1cu2uvqf88klq3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:03.165119453Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T10:00:00+00:00", - "process": "taskinstance.py:1091", - "try-number": "1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": 
"\n--------------------------------------------------------------------------------", - "insertId": "1cu2uvqf88klq4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:03.165628848Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1289", - "execution-date": "2023-09-13T10:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1cu2uvqf88klq5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:03.166242953Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:00:00+00:00", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1cu2uvqf88klq6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:03.166841698Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T10:00:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "process": "taskinstance.py:1291", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1cu2uvqf88klq7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:03.527978646Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1cu2uvqf88klq8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:03.528039140Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1cu2uvqf88klq9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": 
"openlineage" - } - }, - "timestamp": "2023-09-13T10:10:03.550716326Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1cu2uvqf88klqa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:03.550759974Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:00:00+00:00", - "insertId": "1cu2uvqf88klqb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:04.919043367Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1310", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T10:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Started process 3415 to run task", - "insertId": "1cu2uvqf88klqc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:04.956813337Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:00:00+00:00", - "try-number": "1", - "task-id": "echo", - "process": "standard_task_runner.py:55", - "worker_id": "airflow-worker-n79fs", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:00:00+00:00', '--job-id', '969', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp31zuw_44']", - "insertId": "1cu2uvqf88klqd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:04.958244406Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-13T10:00:00+00:00", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Job 969: Subtask echo", - "insertId": "1cu2uvqf88klqe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:04.959732616Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:00:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", 
- "process": "standard_task_runner.py:83", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Running on host airflow-worker-n79fs", - "insertId": "1cu2uvqf88klqf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:05.328896060Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:00:00+00:00", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T10:00:00+00:00", - "insertId": "1cu2uvqf88klqg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:05.527782596Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T10:00:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1518", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1cu2uvqf88klqh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:05.529598086Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T10:00:00+00:00", - "task-id": "echo", - "process": "subprocess.py:63", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1cu2uvqf88klqi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:05.531411190Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:75", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:00:00+00:00", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Output:", - "insertId": "1cu2uvqf88klqj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T10:10:05.676676234Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "subprocess.py:86", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T10:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "test", - "insertId": "1cu2uvqf88klqk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:05.684315088Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:93", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:00:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1cu2uvqf88klql", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:05.685767129Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-n79fs", - "execution-date": "2023-09-13T10:00:00+00:00", - "process": "subprocess.py:97", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T100000, start_date=20230913T101003, end_date=20230913T101005", - "insertId": "1cu2uvqf88klqm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:05.730683697Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:00:00+00:00", - "process": "taskinstance.py:1328", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1cu2uvqf88klqn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:06.460488792Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:00:00+00:00", - "map-index": "-1", - "process": "local_task_job.py:212", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-n79fs", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1cu2uvqf88klqo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:06.546841377Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:00:00+00:00", - "process": "taskinstance.py:2599", - "task-id": "echo", - "worker_id": "airflow-worker-n79fs", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[10a8d27d-5047-46a7-a50f-cf0744170735] succeeded in 5.696080741006881s: None", - "insertId": "1cu2uvqf88klqp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:06.716577090Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:07.903822971Z" - }, - { - "textPayload": "I0913 10:10:15.803382 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "7hp4dlflzf8fw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:15.803642523Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:21.509676970Z" - }, - { - "textPayload": "I0913 10:10:15.805023 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. 
Recreating.\" worker name=\"airflow-worker-n79fs\"", - "insertId": "7hp4dlflzf8fx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:15.805245448Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:21.509676970Z" - }, - { - "textPayload": "I0913 10:10:15.827133 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "7hp4dlflzf8fy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:15.827463373Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:21.509676970Z" - }, - { - "textPayload": "I0913 10:10:15.827287 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "7hp4dlflzf8fz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:15.827598115Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:21.509676970Z" - }, - { - "textPayload": "", - "insertId": "a58t7fesfwv3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:15.850409774Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:22.157666669Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "a58t7fesfwv4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:15.850448741Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:22.157666669Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "a58t7fesfwv5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:15.850470154Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:22.157666669Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "a58t7fesfwv6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:15.850477407Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:22.157666669Z" - }, - { - "textPayload": 
"I0913 10:10:15.881686 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "7hp4dlflzf8g0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:15.881880758Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:21.509676970Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "a58t7fesfwv7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:20.637622240Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-n79fs" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:10:22.157666669Z" - }, - { - "textPayload": "I0913 10:10:21.314507 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr3v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:21.314802216Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "I0913 10:10:21.315922 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "xuoq7hf6pnr3w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:21.316059900Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "I0913 10:10:21.550056 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr3x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:21.550350530Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "I0913 10:10:21.593795 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr3y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:21.593999211Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "I0913 10:10:21.618517 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr3z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T10:10:21.618738450Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "I0913 10:10:21.646619 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr40", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:21.646889025Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "I0913 10:10:21.652132 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr41", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:21.652246896Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "1e8f7zyfielfzh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:23.063078248Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "1e8f7zyfielfzi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:23.067740287Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "1e8f7zyfielfzj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:23.092633062Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "I0913 10:10:23.574049 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr42", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:23.575183591Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "I0913 10:10:23.613594 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "xuoq7hf6pnr43", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": 
"openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:23.613855664Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:10:27.585004660Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "1e8f7zyfielfzk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:30.408276962Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "1e8f7zyfielfzl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:30.408985084Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "1e8f7zyfielfzm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:30.423982681Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "1e8f7zyfielfzn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:37.624390926Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "1e8f7zyfielfzo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:38.060015496Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "1e8f7zyfielfzp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:43.489521675Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "1e8f7zyfielfzq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:43.490681995Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1e8f7zyfielfzr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:10:43.491253092Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "1e8f7zyfielfzs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:43.500510353Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1e8f7zyfielfzt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:48.509144501Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "1e8f7zyfielfzu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:50.765746704Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1e8f7zyfielfzv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:10:53.524178589Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1e8f7zyfielfzw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:10:58.529869906Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "I0913 10:11:00.699271 1 airflowworkerset_controller.go:61] 
\"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "gng4jffj30ont", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:11:00.700977606Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:11:06.949477759Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1e8f7zyfielfzx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:11:03.539656803Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:05.847657730Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "9g8l41fj3t2n9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:11:08.549389506Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:10.867631029Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1ie0r7hfj0rouj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:11:13.558101579Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:15.975279129Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "9pw4s0f6o5vog", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:11:18.565184400Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:23.825346430Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "gnn07ufp0j0gj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:11:23.572037931Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:28.824731346Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "nl9heofp363dh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:11:28.579292427Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:33.831608831Z" - }, - { - "textPayload": "Dags and 
plugins are not synced yet", - "insertId": "1ulgsojfp4qsrs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:11:33.586059457Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:38.828190669Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "3md8g2fj0rxtn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:11:38.592466162Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:43.836714658Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1212jhwf88dqgb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:11:43.598955530Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:48.828507484Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "2s2ivjfi9h84c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:11:48.614329005Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:53.833761138Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "n1k69afcm5p9e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:11:53.621376302Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:11:58.830937565Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "18yso1afic9roc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:11:58.625991116Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:03.841146477Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "198q49ffi9bajg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:03.634646433Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:08.825063877Z" - 
}, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "aassfj0mkcu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:12:08.641579518Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:13.884937334Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "170tysffjpetsd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:13.665902468Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:18.883406942Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "170tysffjpetse", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:13.667549949Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:18.883406942Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "18yr8ebfij3j58", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:36.124295459Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:37.990809593Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "18yr8ebfij3j59", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:36.211459856Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:37.990809593Z" - }, - { - "textPayload": " ", - "insertId": "1iduk6af6g5cvu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:12:51.807913332Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-8z65g v5.2.7 (dawn-chorus)", - "insertId": "1iduk6af6g5cvv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:51.808036918Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1iduk6af6g5cvw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:51.808068777Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 10:12:51", - "insertId": "1iduk6af6g5cvx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:12:51.808078909Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "1iduk6af6g5cvy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:51.808087472Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "1iduk6af6g5cvz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:51.808099575Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7df3513a2340", - "insertId": "1iduk6af6g5cw0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:51.808112169Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1iduk6af6g5cw1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:51.808159713Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1iduk6af6g5cw2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:51.808170297Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "1iduk6af6g5cw3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:51.808179184Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "1iduk6af6g5cw4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:51.808190589Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "1iduk6af6g5cw5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:12:51.808202493Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "1iduk6af6g5cw6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": 
"acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:51.808248626Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "1iduk6af6g5cw7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:12:51.808260156Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": " ", - "insertId": "1iduk6af6g5cw8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:51.808267298Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "", - "insertId": "1iduk6af6g5cw9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:12:51.808274227Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "[tasks]", - "insertId": "1iduk6af6g5cwa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:51.808281483Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "1iduk6af6g5cwb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:51.808288939Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "", - "insertId": "1iduk6af6g5cwc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:12:51.808296420Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:12:54.104967853Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "d1uasgflrjh8k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:12:59.931948703Z", - "severity": "INFO", - "labels": { - "process": "connection.py:22", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:13:04.177352976Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "d1uasgflrjh8l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:13:00.016312742Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:40", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:13:04.177352976Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "d1uasgflrjh8m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:13:01.050337663Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "mingle.py:49" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:13:04.177352976Z" - }, - { - "textPayload": "celery@airflow-worker-8z65g ready.", - "insertId": "d1uasgflrjh8n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:13:01.088945330Z", - "severity": "INFO", - "labels": { - "process": "worker.py:176", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:13:04.177352976Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "d1uasgflrjh8o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:13:01.124094245Z", - "severity": "INFO", - "labels": { - "worker_id": 
"airflow-worker-8z65g", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:13:04.177352976Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[2e71476c-c9a3-47ed-bfa5-90a0f6caeec0] received", - "insertId": "xl6jgmflt0sxq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:12.210733083Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:13.875721787Z" - }, - { - "textPayload": "[2e71476c-c9a3-47ed-bfa5-90a0f6caeec0] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'manual__2023-09-13T10:16:06.365428+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "xl6jgmflt0sxr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:12.250870078Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:13.875721787Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "xl6jgmflt0sxs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:12.769640914Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:13.875721787Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "xl6jgmflt0sxt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:12.772352612Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:13.875721787Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ci0rqeflvlosy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:12.883062353Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:18.863745184Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "ci0rqeflvlosz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:13.956677253Z", - "severity": "INFO", - "labels": { 
- "worker_id": "airflow-worker-8z65g", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:18.863745184Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "13pgdg5fox7hm2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:18.291223412Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "13pgdg5fox7hm3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.440185241Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1091", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "13pgdg5fox7hm4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.457247903Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-8z65g", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "13pgdg5fox7hm5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:18.458020460Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "task-id": "run_bq_external_ingestion", - "process": "taskinstance.py:1289", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "13pgdg5fox7hm6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:18.458576338Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-8z65g", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - 
"try-number": "1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "13pgdg5fox7hm7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.459019210Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "process": "taskinstance.py:1291", - "worker_id": "airflow-worker-8z65g", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "13pgdg5fox7hm8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.764268424Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "13pgdg5fox7hm9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:18.764315910Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "13pgdg5fox7hma", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.809980140Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "13pgdg5fox7hmb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.810031052Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Did not find openlineage.yml and OPENLINEAGE_URL is not set", - "insertId": "13pgdg5fox7hmc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:18.827503498Z", - "severity": "ERROR", - "labels": { - "task-id": 
"run_bq_external_ingestion", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "process": "factory.py:78" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Couldn't initialize transport; will print events to console.", - "insertId": "13pgdg5fox7hmd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.828325079Z", - "severity": "WARNING", - "labels": { - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "process": "factory.py:37", - "try-number": "1", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:16:18.440173Z\", \"eventType\": \"START\", \"inputs\": [], \"job\": {\"facets\": {\"ownership\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/OwnershipJobFacet\", \"owners\": [{\"name\": \"airflow\"}]}}, \"name\": \"data_analytics_dag.run_bq_external_ingestion\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {\"airflow\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 0:00:00\", \"tags\": \"[]\", \"timetable\": {\"delta\": 86400.0}}, \"dagRun\": {\"conf\": {}, \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"external_trigger\": true, \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\"}, \"task\": {\"allow_jagged_rows\": false, \"allow_quoted_newlines\": false, \"args\": {\"bucket\": \"openlineagedemo\", \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"email_on_failure\": false, \"email_on_retry\": false, \"schema_fields\": [{\"name\": \"Date\", \"type\": \"DATE\"}, {\"name\": \"Holiday\", \"type\": \"STRING\"}], \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": [], \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"run_bq_external_ingestion\", \"write_disposition\": \"WRITE_TRUNCATE\"}, \"autodetect\": true, \"bucket\": \"openlineagedemo\", \"cancel_on_kill\": true, \"compression\": \"NONE\", \"configuration\": {}, \"create_disposition\": \"CREATE_IF_NEEDED\", \"deferrable\": false, \"depends_on_past\": false, \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"do_xcom_push\": true, \"downstream_task_ids\": \"['join_bq_datasets.bq_join_holidays_weather_data_2020', 'join_bq_datasets.bq_join_holidays_weather_data_2021']\", \"email_on_failure\": false, 
\"email_on_retry\": false, \"encoding\": \"UTF-8\", \"executor_config\": {}, \"external_table\": false, \"field_delimiter\": \",\", \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"ignore_unknown_values\": false, \"inlets\": \"[]\", \"mapped\": false, \"max_bad_records\": 0, \"operator_class\": \"airflow.providers.google.cloud.transfers.gcs_to_bigquery.GCSToBigQueryOperator\", \"outlets\": \"[]\", \"owner\": \"airflow\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": \"[]\", \"result_retry\": \", initial=1.0, maximum=60.0, multiplier=2.0, timeout=600.0, on_error=None>\", \"retries\": 2, \"retry_exponential_backoff\": false, \"schema_fields\": \"[{'name': 'Date', 'type': 'DATE'}, {'name': 'Holiday', 'type': 'STRING'}]\", \"schema_object_bucket\": \"openlineagedemo\", \"schema_update_options\": \"[]\", \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": \"['holidays.csv']\", \"src_fmt_configs\": {}, \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"run_bq_external_ingestion\", \"time_partitioning\": {}, \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"[]\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\", \"write_disposition\": \"WRITE_TRUNCATE\"}, \"taskInstance\": {\"pool\": \"default_pool\", \"try_number\": 1}, \"taskUuid\": \"422ad112-b51b-396a-8a29-34a9f9ddd10c\"}, \"airflow_runArgs\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"externalTrigger\": true}, \"airflow_version\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/Op", - "insertId": "13pgdg5fox7hme", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.832073409Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "console.py:29" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "enLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"airflowVersion\": \"2.5.3+composer\", \"openlineageAirflowVersion\": \"1.1.0\", \"operator\": \"airflow.providers.google.cloud.transfers.gcs_to_bigquery.GCSToBigQueryOperator\", \"taskInfo\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"bucket\": \"openlineagedemo\", \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"email_on_failure\": false, \"email_on_retry\": false, \"schema_fields\": [{\"name\": \"Date\", \"type\": \"DATE\"}, {\"name\": \"Holiday\", \"type\": \"STRING\"}], \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": [], \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"run_bq_external_ingestion\", \"write_disposition\": \"WRITE_TRUNCATE\"}, \"_BaseOperator__instantiated\": true, \"_dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 
0:00:00\", \"tags\": []}, \"_log\": \"\", \"allow_jagged_rows\": false, \"allow_quoted_newlines\": false, \"autodetect\": true, \"bucket\": \"openlineagedemo\", \"cancel_on_kill\": true, \"compression\": \"NONE\", \"configuration\": {}, \"create_disposition\": \"CREATE_IF_NEEDED\", \"dag_run\": {\"_sa_instance_state\": \"\", \"_state\": \"running\", \"conf\": {}, \"dag_hash\": \"490322026de974a9493458f2b8e6ca00\", \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"execution_date\": \"2023-09-13T10:16:06.365428+00:00\", \"external_trigger\": true, \"id\": 741, \"last_scheduling_decision\": \"2023-09-13T10:16:17.325294+00:00\", \"log_template_id\": 2, \"queued_at\": \"2023-09-13T10:16:11.225422+00:00\", \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\", \"updated_at\": \"2023-09-13T10:16:17.337353+00:00\"}, \"deferrable\": false, \"depends_on_past\": false, \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"do_xcom_push\": true, \"downstream_task_ids\": \"{'join_bq_datasets.bq_join_holidays_weather_data_2020', 'join_bq_datasets.bq_join_holidays_weather_data_2021'}\", \"email_on_failure\": false, \"email_on_retry\": false, \"encoding\": \"UTF-8\", \"executor_config\": {}, \"external_table\": false, \"field_delimiter\": \",\", \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"ignore_unknown_values\": false, \"inlets\": [], \"max_bad_records\": 0, \"outlets\": [], \"owner\": \"airflow\", \"params\": \"{}\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": \"set()\", \"result_retry\": \", initial=1.0, maximum=60.0, multiplier=2.0, timeout=600.0, on_error=None>\", \"retries\": 2, \"retry_delay\": \"0:05:00\", \"retry_exponential_backoff\": false, \"schema_fields\": [{\"name\": \"Date\", \"type\": \"DATE\"}, {\"name\": \"Holiday\", \"type\": \"STRING\"}], \"schema_object_bucket\": \"openlineagedemo\", \"schema_update_options\": [], \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": [], \"src_fmt_configs\": {}, \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_group\": \"\", \"task_id\": \"run_bq_external_ingestion\", \"time_partitioning\": {}, \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"set()\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\", \"write_disposition\": \"WRITE_TRUNCATE\"}}, \"nominalTime\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/NominalTimeRunFacet\", \"nominalEndTime\": \"2023-09-13T10:16:06.365428Z\", \"nominalStartTime\": \"2023-09-12T10:16:06.365428Z\"}, \"parent\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}", - "insertId": "13pgdg5fox7hmf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:18.832127590Z", - 
"severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": ", \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, \"parentRun\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, \"processing_engine\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ProcessingEngineRunFacet\", \"name\": \"Airflow\", \"openlineageAdapterVersion\": \"1.1.0\", \"version\": \"2.5.3+composer\"}, \"unknownSourceAttribute\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"unknownItems\": [{\"name\": \"GCSToBigQueryOperator\", \"properties\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"bucket\": \"openlineagedemo\", \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"email_on_failure\": false, \"email_on_retry\": false, \"schema_fields\": [{\"name\": \"Date\", \"type\": \"DATE\"}, {\"name\": \"Holiday\", \"type\": \"STRING\"}], \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": [], \"start_date\": \"<>\", \"task_id\": \"run_bq_external_ingestion\", \"write_disposition\": \"WRITE_TRUNCATE\"}, \"_BaseOperator__instantiated\": true, \"_dag\": \"<>\", \"_log\": \"<>\", \"allow_jagged_rows\": false, \"allow_quoted_newlines\": false, \"autodetect\": true, \"bucket\": \"openlineagedemo\", \"cancel_on_kill\": true, \"compression\": \"NONE\", \"configuration\": {}, \"create_disposition\": \"CREATE_IF_NEEDED\", \"deferrable\": false, \"depends_on_past\": false, \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"do_xcom_push\": true, \"downstream_task_ids\": [], \"email_on_failure\": false, \"email_on_retry\": false, \"encoding\": \"UTF-8\", \"executor_config\": {}, \"external_table\": false, \"field_delimiter\": \",\", \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"ignore_unknown_values\": false, \"inlets\": [], \"max_bad_records\": 0, \"outlets\": [], \"owner\": \"airflow\", \"params\": \"<>\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": [], \"result_retry\": \"<>\", \"retries\": 2, \"retry_delay\": \"<>\", \"retry_exponential_backoff\": false, \"schema_fields\": [{\"name\": \"Date\", \"type\": \"DATE\"}, {\"name\": \"Holiday\", \"type\": \"STRING\"}], \"schema_object_bucket\": \"openlineagedemo\", \"schema_update_options\": [], \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": [], \"src_fmt_configs\": {}, \"start_date\": \"<>\", \"task_group\": \"<>\", \"task_id\": 
\"run_bq_external_ingestion\", \"time_partitioning\": {}, \"trigger_rule\": \"all_success\", \"upstream_task_ids\": [], \"wait_for_downstream\": false, \"weight_rule\": \"downstream\", \"write_disposition\": \"WRITE_TRUNCATE\"}, \"type\": \"operator\"}]}}, \"runId\": \"422ad112-b51b-396a-8a29-34a9f9ddd10c\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "13pgdg5fox7hmg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:18.832144636Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:16:06.365428+00:00", - "insertId": "13pgdg5fox7hmh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:18.839407839Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Started process 245 to run task", - "insertId": "13pgdg5fox7hmi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:18.902346170Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "standard_task_runner.py:55", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'manual__2023-09-13T10:16:06.365428+00:00', '--job-id', '971', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpmkj5vq6x']", - "insertId": "13pgdg5fox7hmj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:18.912614663Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "task-id": "run_bq_external_ingestion", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Job 971: Subtask run_bq_external_ingestion", - "insertId": "13pgdg5fox7hmk", - "resource": { - 
"type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:18.913335750Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "process": "standard_task_runner.py:83", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "13pgdg5fox7hml", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:19.272791835Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "task_command.py:393", - "try-number": "1", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=run_bq_external_ingestion\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:16:06.365428+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T10:16:06.365428+00:00", - "insertId": "13pgdg5fox7hmm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:19.536694962Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1518", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "13pgdg5fox7hmn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:19.576891072Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "process": "base.py:73", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Using existing BigQuery table for storing data...", - "insertId": "13pgdg5fox7hmo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:19.579456491Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "gcs_to_bigquery.py:375", - 
"task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "13pgdg5fox7hmp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:19.580224587Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "process": "credentials_provider.py:353", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Project is not included in destination_project_dataset_table: holiday_weather.holidays; using project \"acceldata-acm\"", - "insertId": "13pgdg5fox7hmq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:19.631652915Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "process": "bigquery.py:2314", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Executing: {'load': {'autodetect': True, 'createDisposition': 'CREATE_IF_NEEDED', 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays'}, 'sourceFormat': 'CSV', 'sourceUris': ['gs://openlineagedemo/holidays.csv'], 'writeDisposition': 'WRITE_TRUNCATE', 'ignoreUnknownValues': False, 'schema': {'fields': [{'name': 'Date', 'type': 'DATE'}, {'name': 'Holiday', 'type': 'STRING'}]}, 'skipLeadingRows': 1, 'fieldDelimiter': ',', 'quote': None, 'allowQuotedNewlines': False, 'encoding': 'UTF-8'}}", - "insertId": "13pgdg5fox7hmr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:19.632737396Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "process": "gcs_to_bigquery.py:379" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_run_bq_external_ingestion_2023_09_13T10_16_06_365428_00_00_547a7515133390b6dcb713ecf5d2c80b", - "insertId": "13pgdg5fox7hms", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:19.634121419Z", - "severity": "INFO", - "labels": 
{ - "workflow": "data_analytics_dag", - "process": "bigquery.py:1596", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=run_bq_external_ingestion, execution_date=20230913T101606, start_date=20230913T101618, end_date=20230913T101622", - "insertId": "13pgdg5fox7hmt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:22.217089870Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1328", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:16:22.215489Z\", \"eventType\": \"COMPLETE\", \"inputs\": [], \"job\": {\"facets\": {}, \"name\": \"data_analytics_dag.run_bq_external_ingestion\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {\"unknownSourceAttribute\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"unknownItems\": [{\"name\": \"GCSToBigQueryOperator\", \"properties\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"bucket\": \"openlineagedemo\", \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"email_on_failure\": false, \"email_on_retry\": false, \"schema_fields\": [{\"name\": \"Date\", \"type\": \"DATE\"}, {\"name\": \"Holiday\", \"type\": \"STRING\"}], \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": [], \"start_date\": \"<>\", \"task_id\": \"run_bq_external_ingestion\", \"write_disposition\": \"WRITE_TRUNCATE\"}, \"_BaseOperator__instantiated\": true, \"_dag\": \"<>\", \"_log\": \"<>\", \"allow_jagged_rows\": false, \"allow_quoted_newlines\": false, \"autodetect\": true, \"bucket\": \"openlineagedemo\", \"cancel_on_kill\": true, \"compression\": \"NONE\", \"configuration\": {}, \"create_disposition\": \"CREATE_IF_NEEDED\", \"deferrable\": false, \"depends_on_past\": false, \"destination_project_dataset_table\": \"holiday_weather.holidays\", \"do_xcom_push\": true, \"downstream_task_ids\": [], \"email_on_failure\": false, \"email_on_retry\": false, \"encoding\": \"UTF-8\", \"executor_config\": {}, \"external_table\": false, \"field_delimiter\": \",\", \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"ignore_unknown_values\": false, \"inlets\": [], \"max_bad_records\": 0, \"outlets\": [], \"owner\": \"airflow\", \"params\": \"<>\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": [], \"result_retry\": \"<>\", \"retries\": 2, \"retry_delay\": \"<>\", 
\"retry_exponential_backoff\": false, \"schema_fields\": [{\"name\": \"Date\", \"type\": \"DATE\"}, {\"name\": \"Holiday\", \"type\": \"STRING\"}], \"schema_object_bucket\": \"openlineagedemo\", \"schema_update_options\": [], \"skip_leading_rows\": 1, \"source_format\": \"CSV\", \"source_objects\": [], \"src_fmt_configs\": {}, \"start_date\": \"<>\", \"task_group\": \"<>\", \"task_id\": \"run_bq_external_ingestion\", \"time_partitioning\": {}, \"trigger_rule\": \"all_success\", \"upstream_task_ids\": [], \"wait_for_downstream\": false, \"weight_rule\": \"downstream\", \"write_disposition\": \"WRITE_TRUNCATE\"}, \"type\": \"operator\"}]}}, \"runId\": \"422ad112-b51b-396a-8a29-34a9f9ddd10c\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "13pgdg5fox7hmu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:22.312619676Z", - "severity": "INFO", - "labels": { - "process": "console.py:29", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[5353ffa2-6b4c-4159-8624-68025954ae01] received", - "insertId": "13pgdg5fox7hmv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:22.474601764Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[3aad797d-d8b4-4358-8759-11ea8f32d2a9] received", - "insertId": "13pgdg5fox7hmw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:22.480302358Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "13pgdg5fox7hmx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:22.607386632Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "process": "local_task_job.py:212", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "[5353ffa2-6b4c-4159-8624-68025954ae01] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 
'join_bq_datasets.bq_join_holidays_weather_data_2020', 'manual__2023-09-13T10:16:06.365428+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "13pgdg5fox7hmy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:22.622648062Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "[3aad797d-d8b4-4358-8759-11ea8f32d2a9] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2021', 'manual__2023-09-13T10:16:06.365428+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "13pgdg5fox7hmz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:22.630372746Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:23.868914585Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "2hzkj9f2i01de", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:23.005466390Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "taskinstance.py:2599", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[2e71476c-c9a3-47ed-bfa5-90a0f6caeec0] succeeded in 11.198478614998749s: None", - "insertId": "2hzkj9f2i01df", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:23.414858322Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "2hzkj9f2i01dg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:24.127678153Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "2hzkj9f2i01dh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - 
"location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:24.130816342Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "2hzkj9f2i01di", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:24.323434873Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "2hzkj9f2i01dj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:24.329372404Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "2hzkj9f2i01dk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:24.639911540Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "2hzkj9f2i01dl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:24.825821612Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:28.837378740Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "qnbw2ifi68i30", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:28.306698941Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:33.960736148Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "qnbw2ifi68i31", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:28.420051338Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-8z65g" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:33.960736148Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "9pw4s0f6oij2e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:41.117703047Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "9pw4s0f6oij2f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:41.615047307Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1091", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "9pw4s0f6oij2g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:41.716685031Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "9pw4s0f6oij2h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:41.719005402Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "9pw4s0f6oij2i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:41.720309581Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "taskinstance.py:1290", - "try-number": "1", - "workflow": "data_analytics_dag", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "9pw4s0f6oij2j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:41.722670390Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "9pw4s0f6oij2k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:42.468658268Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "9pw4s0f6oij2l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:42.752657659Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "9pw4s0f6oij2m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:42.752702226Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "9pw4s0f6oij2n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:42.919459017Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "9pw4s0f6oij2o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - 
"timestamp": "2023-09-13T10:16:42.919531122Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Did not find openlineage.yml and OPENLINEAGE_URL is not set", - "insertId": "9pw4s0f6oij2p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:42.951636916Z", - "severity": "ERROR", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "factory.py:78", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Couldn't initialize transport; will print events to console.", - "insertId": "9pw4s0f6oij2q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:42.952322326Z", - "severity": "WARNING", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "factory.py:37", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:16:41.614916Z\", \"eventType\": \"START\", \"inputs\": [], \"job\": {\"facets\": {\"ownership\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/OwnershipJobFacet\", \"owners\": [{\"name\": \"airflow\"}]}}, \"name\": \"data_analytics_dag.join_bq_datasets.bq_join_holidays_weather_data_2020\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {\"airflow\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 0:00:00\", \"tags\": \"[]\", \"timetable\": {\"delta\": 86400.0}}, \"dagRun\": {\"conf\": {}, \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"external_trigger\": true, \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\"}, \"task\": {\"args\": {\"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value 
FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\\\"TMAX\\\" AND Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"email_on_failure\": false, \"email_on_retry\": false, \"location\": \"US\", \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2020\"}, \"cancel_on_kill\": true, \"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\\\"TMAX\\\" AND Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"deferrable\": false, \"depends_on_past\": false, \"do_xcom_push\": true, \"downstream_task_ids\": \"['create_batch']\", \"email_on_failure\": false, \"email_on_retry\": false, \"executor_config\": {}, \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"inlets\": \"[]\", \"location\": \"US\", \"mapped\": false, \"operator_class\": \"airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator\", \"outlets\": \"[]\", \"owner\": \"airflow\", \"poll_interval\": 4.0, \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": \"[]\", \"result_retry\": \", initial=1.0, maximum=60.0, multiplier=2.0, timeout=600.0, on_error=None>\", \"retries\": 2, \"retry_exponential_backoff\": false, \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_group\": {\"downstream_group_ids\": \"[]\", \"downstream_task_ids\": \"['create_batch']\", \"group_id\": \"join_bq_datasets\", \"prefix_group_id\": true, \"tooltip\": \"\", \"upstream_group_ids\": \"[]\", \"upstream_task_ids\": \"['run_bq_external_ingestion']\"}, \"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2020\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"['run_bq_external_ingestion']\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}, \"taskInstance\": {\"pool\": \"default_pool\", \"try_number\": 1}, \"t", - "insertId": "9pw4s0f6oij2r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.021515847Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "console.py:29" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "askUuid\": \"aca24a7f-3904-3bd0-ac70-6a0367b77c87\"}, \"airflow_runArgs\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"externalTrigger\": true}, \"airflow_version\": {\"_producer\": 
\"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"airflowVersion\": \"2.5.3+composer\", \"openlineageAirflowVersion\": \"1.1.0\", \"operator\": \"airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator\", \"taskInfo\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\\\"TMAX\\\" AND Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"email_on_failure\": false, \"email_on_retry\": false, \"location\": \"US\", \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2020\"}, \"_BaseOperator__instantiated\": true, \"_dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 0:00:00\", \"tags\": []}, \"_log\": \"\", \"cancel_on_kill\": true, \"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\\\"TMAX\\\" AND Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"dag_run\": {\"_sa_instance_state\": \"\", \"_state\": \"running\", \"conf\": {}, \"dag_hash\": \"490322026de974a9493458f2b8e6ca00\", \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"execution_date\": \"2023-09-13T10:16:06.365428+00:00\", \"external_trigger\": true, \"id\": 741, \"last_scheduling_decision\": \"2023-09-13T10:16:39.115374+00:00\", \"log_template_id\": 2, \"queued_at\": \"2023-09-13T10:16:11.225422+00:00\", \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\", \"updated_at\": \"2023-09-13T10:16:39.140301+00:00\"}, \"deferrable\": false, \"depends_on_past\": false, \"do_xcom_push\": true, \"downstream_task_ids\": \"{'create_batch'}\", \"email_on_failure\": false, \"email_on_retry\": false, \"executor_config\": {}, \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"inlets\": [], \"location\": \"US\", \"outlets\": [], \"owner\": \"airflow\", \"params\": \"{}\", \"poll_interval\": 4.0, \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": \"set()\", \"result_retry\": \", initial=1.0, maximum=60.0, multiplier=2.0, timeout=600.0, on_error=None>\", \"retries\": 2, \"retry_delay\": \"0:05:00\", \"retry_exponential_backoff\": false, \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_group\": \"\", 
\"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2020\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"{'run_bq_external_ingestion'}\", \"", - "insertId": "9pw4s0f6oij2s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.021618908Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "wait_for_downstream\": false, \"weight_rule\": \"downstream\"}}, \"nominalTime\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/NominalTimeRunFacet\", \"nominalEndTime\": \"2023-09-13T10:16:06.365428Z\", \"nominalStartTime\": \"2023-09-12T10:16:06.365428Z\"}, \"parent\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, \"parentRun\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, \"processing_engine\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ProcessingEngineRunFacet\", \"name\": \"Airflow\", \"openlineageAdapterVersion\": \"1.1.0\", \"version\": \"2.5.3+composer\"}}, \"runId\": \"aca24a7f-3904-3bd0-ac70-6a0367b77c87\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "9pw4s0f6oij2t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:43.021638641Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:16:06.365428+00:00", - "insertId": "9pw4s0f6oij2u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.032853220Z", - "severity": "INFO", - "labels": { 
- "worker_id": "airflow-worker-8z65g", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "9pw4s0f6oij2v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:43.107848779Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "process": "taskinstance.py:1091", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Started process 262 to run task", - "insertId": "9pw4s0f6oij2w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:43.137449990Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "9pw4s0f6oij2x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.233823815Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "taskinstance.py:1091", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "9pw4s0f6oij2y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.234480778Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - 
"insertId": "9pw4s0f6oij2z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.235248254Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "taskinstance.py:1290", - "map-index": "-1", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "9pw4s0f6oij30", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.237413275Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1291", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'manual__2023-09-13T10:16:06.365428+00:00', '--job-id', '972', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmps5ayiaw0']", - "insertId": "9pw4s0f6oij31", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.317156710Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "Job 972: Subtask join_bq_datasets.bq_join_holidays_weather_data_2020", - "insertId": "9pw4s0f6oij32", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:43.322727210Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "standard_task_runner.py:83", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:44.864942856Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "vcy0ouflx5twa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": 
"acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.503982273Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "vcy0ouflx5twb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.504043306Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "vcy0ouflx5twc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.614902840Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "vcy0ouflx5twd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:44.614934019Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Did not find openlineage.yml and OPENLINEAGE_URL is not set", - "insertId": "vcy0ouflx5twe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:44.640431337Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "factory.py:78", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Couldn't initialize transport; will print events to console.", - "insertId": "vcy0ouflx5twf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:44.642239844Z", - "severity": "WARNING", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "process": "factory.py:37", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:16:43.107907Z\", 
\"eventType\": \"START\", \"inputs\": [], \"job\": {\"facets\": {\"ownership\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/OwnershipJobFacet\", \"owners\": [{\"name\": \"airflow\"}]}}, \"name\": \"data_analytics_dag.join_bq_datasets.bq_join_holidays_weather_data_2021\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {\"airflow\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 0:00:00\", \"tags\": \"[]\", \"timetable\": {\"delta\": 86400.0}}, \"dagRun\": {\"conf\": {}, \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"external_trigger\": true, \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\"}, \"task\": {\"args\": {\"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2021 AS Table WHERE Table.element=\\\"TMAX\\\" AND Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"email_on_failure\": false, \"email_on_retry\": false, \"location\": \"US\", \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2021\"}, \"cancel_on_kill\": true, \"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2021 AS Table WHERE Table.element=\\\"TMAX\\\" AND Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"deferrable\": false, \"depends_on_past\": false, \"do_xcom_push\": true, \"downstream_task_ids\": \"['create_batch']\", \"email_on_failure\": false, \"email_on_retry\": false, \"executor_config\": {}, \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"inlets\": \"[]\", \"location\": \"US\", \"mapped\": false, \"operator_class\": \"airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator\", \"outlets\": \"[]\", \"owner\": \"airflow\", \"poll_interval\": 4.0, \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": \"[]\", \"result_retry\": \", initial=1.0, maximum=60.0, multiplier=2.0, timeout=600.0, on_error=None>\", \"retries\": 2, \"retry_exponential_backoff\": false, 
\"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_group\": {\"downstream_group_ids\": \"[]\", \"downstream_task_ids\": \"['create_batch']\", \"group_id\": \"join_bq_datasets\", \"prefix_group_id\": true, \"tooltip\": \"\", \"upstream_group_ids\": \"[]\", \"upstream_task_ids\": \"['run_bq_external_ingestion']\"}, \"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2021\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"['run_bq_external_ingestion']\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}, \"taskInstance\": {\"pool\": \"default_pool\", \"try_number\": 1}, \"t", - "insertId": "vcy0ouflx5twg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:44.710287380Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "console.py:29", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "askUuid\": \"946663c3-02ef-3dd2-89b4-aeb1b6c76bbe\"}, \"airflow_runArgs\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"externalTrigger\": true}, \"airflow_version\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"airflowVersion\": \"2.5.3+composer\", \"openlineageAirflowVersion\": \"1.1.0\", \"operator\": \"airflow.providers.google.cloud.operators.bigquery.BigQueryInsertJobOperator\", \"taskInfo\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2021 AS Table WHERE Table.element=\\\"TMAX\\\" AND Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"email_on_failure\": false, \"email_on_retry\": false, \"location\": \"US\", \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2021\"}, \"_BaseOperator__instantiated\": true, \"_dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 0:00:00\", \"tags\": []}, \"_log\": \"\", \"cancel_on_kill\": true, \"configuration\": {\"query\": {\"destinationTable\": {\"datasetId\": \"holiday_weather\", \"projectId\": \"acceldata-acm\", \"tableId\": \"holidays_weather_joined\"}, \"query\": \"\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2021 AS Table WHERE Table.element=\\\"TMAX\\\" AND 
Table.id=\\\"USW00094846\\\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n \", \"useLegacySql\": false, \"writeDisposition\": \"WRITE_APPEND\"}}, \"dag_run\": {\"_sa_instance_state\": \"\", \"_state\": \"running\", \"conf\": {}, \"dag_hash\": \"490322026de974a9493458f2b8e6ca00\", \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"execution_date\": \"2023-09-13T10:16:06.365428+00:00\", \"external_trigger\": true, \"id\": 741, \"last_scheduling_decision\": \"2023-09-13T10:16:40.917732+00:00\", \"log_template_id\": 2, \"queued_at\": \"2023-09-13T10:16:11.225422+00:00\", \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\", \"updated_at\": \"2023-09-13T10:16:40.937176+00:00\"}, \"deferrable\": false, \"depends_on_past\": false, \"do_xcom_push\": true, \"downstream_task_ids\": \"{'create_batch'}\", \"email_on_failure\": false, \"email_on_retry\": false, \"executor_config\": {}, \"force_rerun\": true, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"inlets\": [], \"location\": \"US\", \"outlets\": [], \"owner\": \"airflow\", \"params\": \"{}\", \"poll_interval\": 4.0, \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"queue\": \"default\", \"reattach_states\": \"set()\", \"result_retry\": \", initial=1.0, maximum=60.0, multiplier=2.0, timeout=600.0, on_error=None>\", \"retries\": 2, \"retry_delay\": \"0:05:00\", \"retry_exponential_backoff\": false, \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_group\": \"\", \"task_id\": \"join_bq_datasets.bq_join_holidays_weather_data_2021\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"{'run_bq_external_ingestion'}\", \"", - "insertId": "vcy0ouflx5twh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.710365847Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "wait_for_downstream\": false, \"weight_rule\": \"downstream\"}}, \"nominalTime\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/NominalTimeRunFacet\", \"nominalEndTime\": \"2023-09-13T10:16:06.365428Z\", \"nominalStartTime\": \"2023-09-12T10:16:06.365428Z\"}, \"parent\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, \"parentRun\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": 
\"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, \"processing_engine\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ProcessingEngineRunFacet\", \"name\": \"Airflow\", \"openlineageAdapterVersion\": \"1.1.0\", \"version\": \"2.5.3+composer\"}}, \"runId\": \"946663c3-02ef-3dd2-89b4-aeb1b6c76bbe\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "vcy0ouflx5twi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.710383954Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:16:06.365428+00:00", - "insertId": "vcy0ouflx5twj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:44.720206595Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1310", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Started process 267 to run task", - "insertId": "vcy0ouflx5twk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.804131572Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2021', 'manual__2023-09-13T10:16:06.365428+00:00', '--job-id', '973', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpbmylfz_9']", - "insertId": "vcy0ouflx5twl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.836236316Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", 
- "map-index": "-1", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Job 973: Subtask join_bq_datasets.bq_join_holidays_weather_data_2021", - "insertId": "vcy0ouflx5twm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:44.838992287Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "standard_task_runner.py:83", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "vcy0ouflx5twn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:44.902309346Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "task_command.py:393", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "vcy0ouflx5two", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:46.002345806Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2020\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:16:06.365428+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T10:16:06.365428+00:00", - "insertId": "vcy0ouflx5twp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:46.234457595Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - 
"process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "vcy0ouflx5twq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:46.408533032Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "base.py:73", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "vcy0ouflx5twr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:46.416756904Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "process": "bigquery.py:2710" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "vcy0ouflx5tws", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:46.420005591Z", - "severity": "INFO", - "labels": { - "process": "credentials_provider.py:353", - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2020_2023_09_13T10_16_06_365428_00_00_551731e37ff51512987c934c95342fe0", - "insertId": "vcy0ouflx5twt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:46.517407767Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": 
"join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "process": "bigquery.py:1596", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2021\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:16:06.365428+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T10:16:06.365428+00:00", - "insertId": "vcy0ouflx5twu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:47.191061189Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1518", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "vcy0ouflx5twv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:47.304020941Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "base.py:73", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2021 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "vcy0ouflx5tww", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:47.314722518Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "process": "bigquery.py:2710", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "vcy0ouflx5twx", - "resource": { - 
"type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:47.315773915Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "process": "credentials_provider.py:353", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2021_2023_09_13T10_16_06_365428_00_00_5cc0714ea3782c8100f8aabd1eaa6259", - "insertId": "vcy0ouflx5twy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:47.403040478Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "process": "bigquery.py:1596", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:49.850267694Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2020, execution_date=20230913T101606, start_date=20230913T101641, end_date=20230913T101649", - "insertId": "1q63gwwfozbzk7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:49.720591901Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:16:49.719679Z\", \"eventType\": \"COMPLETE\", \"inputs\": [], \"job\": {\"facets\": {}, \"name\": \"data_analytics_dag.join_bq_datasets.bq_join_holidays_weather_data_2020\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {}, \"runId\": \"aca24a7f-3904-3bd0-ac70-6a0367b77c87\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "1q63gwwfozbzk8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:49.829342054Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "console.py:29", - "worker_id": "airflow-worker-8z65g", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "I0913 10:16:49.973415 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "nlili1foy3glt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:49.973688379Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:16:55.500188765Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1q63gwwfozbzk9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:50.108236035Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "local_task_job.py:212", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1q63gwwfozbzka", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:50.328174415Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2021, execution_date=20230913T101606, start_date=20230913T101643, end_date=20230913T101650", - "insertId": "1q63gwwfozbzkb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:50.609634750Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1328", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[5353ffa2-6b4c-4159-8624-68025954ae01] succeeded in 28.254463076009415s: None", - "insertId": "1q63gwwfozbzkc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:50.737209121Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:16:50.608785Z\", \"eventType\": \"COMPLETE\", \"inputs\": [], \"job\": {\"facets\": {}, \"name\": \"data_analytics_dag.join_bq_datasets.bq_join_holidays_weather_data_2021\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {}, \"runId\": \"946663c3-02ef-3dd2-89b4-aeb1b6c76bbe\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "1q63gwwfozbzkd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:50.811276541Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "process": "console.py:29", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1q63gwwfozbzke", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:16:50.945468577Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "local_task_job.py:212", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "1 downstream tasks scheduled from follow-on schedule 
check", - "insertId": "1q63gwwfozbzkf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:51.220348346Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "taskinstance.py:2599", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7616318c-7a02-4510-8ae0-e602dacbaafc] received", - "insertId": "1q63gwwfozbzkg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:51.510084023Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "[7616318c-7a02-4510-8ae0-e602dacbaafc] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T10:16:06.365428+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "1q63gwwfozbzkh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:51.518978456Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[3aad797d-d8b4-4358-8759-11ea8f32d2a9] succeeded in 29.10347548898426s: None", - "insertId": "1q63gwwfozbzki", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:51.607737379Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1q63gwwfozbzkj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:52.440175684Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1q63gwwfozbzkk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:16:52.443665Z", - "severity": 
"WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1q63gwwfozbzkl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:52.732150742Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:54.849661693Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "pt1ygufp6w04s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:16:55.241178931Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:16:59.906083033Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "1yr8ptufizvx9c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:03.406330608Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1yr8ptufizvx9d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:03.811337668Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "task-id": "create_batch", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1yr8ptufizvx9e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:03.910271973Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1yr8ptufizvx9f", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:03.912048826Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "task-id": "create_batch", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1yr8ptufizvx9g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:03.913481530Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "task-id": "create_batch", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1yr8ptufizvx9h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:03.914521584Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1291", - "try-number": "1", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1yr8ptufizvx9i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.007598502Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1yr8ptufizvx9j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.007706801Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1yr8ptufizvx9k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.123051213Z", - "severity": "ERROR", - "labels": { 
- "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1yr8ptufizvx9l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:05.123121888Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Did not find openlineage.yml and OPENLINEAGE_URL is not set", - "insertId": "1yr8ptufizvx9m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:05.207098814Z", - "severity": "ERROR", - "labels": { - "process": "factory.py:78", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Couldn't initialize transport; will print events to console.", - "insertId": "1yr8ptufizvx9n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.208717132Z", - "severity": "WARNING", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "create_batch", - "process": "factory.py:37", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:17:03.812164Z\", \"eventType\": \"START\", \"inputs\": [], \"job\": {\"facets\": {\"ownership\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/OwnershipJobFacet\", \"owners\": [{\"name\": \"airflow\"}]}}, \"name\": \"data_analytics_dag.create_batch\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {\"airflow\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 0:00:00\", \"tags\": \"[]\", \"timetable\": {\"delta\": 86400.0}}, \"dagRun\": {\"conf\": {}, \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"external_trigger\": true, \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\"}, 
\"task\": {\"args\": {\"batch\": {\"environment_config\": {\"execution_config\": {\"service_account\": \"***\"}}, \"pyspark_batch\": {\"args\": [], \"main_python_file_uri\": \"gs://openlineagedemo/data_analytics_process.py\"}, \"runtime_config\": {\"version\": \"1.1\"}}, \"batch_id\": \"data-processing-20230913t101606\", \"email_on_failure\": false, \"email_on_retry\": false, \"project_id\": \"acceldata-acm\", \"region\": \"us-west1\", \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"create_batch\"}, \"asynchronous\": false, \"batch\": {\"environment_config\": {\"execution_config\": {\"service_account\": \"***\"}}, \"pyspark_batch\": {\"args\": [], \"main_python_file_uri\": \"gs://openlineagedemo/data_analytics_process.py\"}, \"runtime_config\": {\"version\": \"1.1\"}}, \"batch_id\": \"data-processing-20230913t101606\", \"deferrable\": false, \"depends_on_past\": false, \"do_xcom_push\": true, \"downstream_task_ids\": \"[]\", \"email_on_failure\": false, \"email_on_retry\": false, \"executor_config\": {}, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"inlets\": \"[]\", \"mapped\": false, \"metadata\": \"[]\", \"operator_class\": \"airflow.providers.google.cloud.operators.dataproc.DataprocCreateBatchOperator\", \"outlets\": \"[]\", \"owner\": \"airflow\", \"polling_interval_seconds\": 5, \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"project_id\": \"acceldata-acm\", \"queue\": \"default\", \"region\": \"us-west1\", \"result_retry\": \"_MethodDefault._DEFAULT_VALUE\", \"retries\": 2, \"retry\": \"_MethodDefault._DEFAULT_VALUE\", \"retry_exponential_backoff\": false, \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"create_batch\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"['join_bq_datasets.bq_join_holidays_weather_data_2020', 'join_bq_datasets.bq_join_holidays_weather_data_2021']\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}, \"taskInstance\": {\"pool\": \"default_pool\", \"try_number\": 1}, \"taskUuid\": \"cd1553f6-3bdb-3cae-acb8-06865df3f186\"}, \"airflow_runArgs\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"externalTrigger\": true}, \"airflow_version\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"airflowVersion\": \"2.5.3+composer\", \"openlineageAirflowVersion\": \"1.1.0\", \"operator\": \"airflow.providers.google.cloud.operators.dataproc.DataprocCreateBatchOperator\", \"taskInfo\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"batch\": {\"environment_config\": {\"execution_config\": {\"service_account\": \"***\"}}, \"pyspark_batch\": {\"args\": [], \"main_python_file_uri\": \"gs://open", - "insertId": "1yr8ptufizvx9o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.215913057Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "create_batch", - "process": "console.py:29", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "try-number": "1", - 
"workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "lineagedemo/data_analytics_process.py\"}, \"runtime_config\": {\"version\": \"1.1\"}}, \"batch_id\": \"data-processing-20230913t101606\", \"email_on_failure\": false, \"email_on_retry\": false, \"project_id\": \"acceldata-acm\", \"region\": \"us-west1\", \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_id\": \"create_batch\"}, \"_BaseOperator__instantiated\": true, \"_dag\": {\"dag_id\": \"data_analytics_dag\", \"schedule_interval\": \"1 day, 0:00:00\", \"tags\": []}, \"_log\": \"\", \"asynchronous\": false, \"batch\": {\"environment_config\": {\"execution_config\": {\"service_account\": \"***\"}}, \"pyspark_batch\": {\"args\": [], \"main_python_file_uri\": \"gs://openlineagedemo/data_analytics_process.py\"}, \"runtime_config\": {\"version\": \"1.1\"}}, \"batch_id\": \"data-processing-20230913t101606\", \"dag_run\": {\"_sa_instance_state\": \"\", \"_state\": \"running\", \"conf\": {}, \"dag_hash\": \"490322026de974a9493458f2b8e6ca00\", \"dag_id\": \"data_analytics_dag\", \"data_interval_end\": \"2023-09-13T10:16:06.365428+00:00\", \"data_interval_start\": \"2023-09-12T10:16:06.365428+00:00\", \"execution_date\": \"2023-09-13T10:16:06.365428+00:00\", \"external_trigger\": true, \"id\": 741, \"last_scheduling_decision\": \"2023-09-13T10:17:01.755210+00:00\", \"log_template_id\": 2, \"queued_at\": \"2023-09-13T10:16:11.225422+00:00\", \"run_id\": \"manual__2023-09-13T10:16:06.365428+00:00\", \"run_type\": \"manual\", \"start_date\": \"2023-09-13T10:16:11.615234+00:00\", \"updated_at\": \"2023-09-13T10:17:01.762285+00:00\"}, \"deferrable\": false, \"depends_on_past\": false, \"do_xcom_push\": true, \"downstream_task_ids\": \"set()\", \"email_on_failure\": false, \"email_on_retry\": false, \"executor_config\": {}, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"inlets\": [], \"metadata\": [], \"outlets\": [], \"owner\": \"airflow\", \"params\": \"{}\", \"polling_interval_seconds\": 5, \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"project_id\": \"acceldata-acm\", \"queue\": \"default\", \"region\": \"us-west1\", \"result_retry\": \"_MethodDefault._DEFAULT_VALUE\", \"retries\": 2, \"retry\": \"_MethodDefault._DEFAULT_VALUE\", \"retry_delay\": \"0:05:00\", \"retry_exponential_backoff\": false, \"start_date\": \"2023-09-12T00:00:00+00:00\", \"task_group\": \"\", \"task_id\": \"create_batch\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"{'join_bq_datasets.bq_join_holidays_weather_data_2020', 'join_bq_datasets.bq_join_holidays_weather_data_2021'}\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}}, \"nominalTime\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/NominalTimeRunFacet\", \"nominalEndTime\": \"2023-09-13T10:16:06.365428Z\", \"nominalStartTime\": \"2023-09-12T10:16:06.365428Z\"}, \"parent\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, 
\"parentRun\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"data_analytics_dag\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"7fbd4e08-8435-3747-a708-0fa4943e905a\"}}, \"processing_engine\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ProcessingEngineRunFacet\", \"name\": \"Airflow\", \"openlineageAdapterVersion\": \"1.1.0\", \"version\": \"2.5.3+composer\"}, \"unknownSourceAttribute\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubus", - "insertId": "1yr8ptufizvx9p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:05.215979062Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "map-index": "-1", - "task-id": "create_batch", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "ercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"unknownItems\": [{\"name\": \"DataprocCreateBatchOperator\", \"properties\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"batch\": {\"environment_config\": {\"execution_config\": {\"service_account\": \"***\"}}, \"pyspark_batch\": {\"args\": [], \"main_python_file_uri\": \"gs://openlineagedemo/data_analytics_process.py\"}, \"runtime_config\": {\"version\": \"1.1\"}}, \"batch_id\": \"data-processing-20230913t101606\", \"email_on_failure\": false, \"email_on_retry\": false, \"project_id\": \"acceldata-acm\", \"region\": \"us-west1\", \"start_date\": \"<>\", \"task_id\": \"create_batch\"}, \"_BaseOperator__instantiated\": true, \"_dag\": \"<>\", \"_log\": \"<>\", \"asynchronous\": false, \"batch\": {\"environment_config\": {\"execution_config\": {\"service_account\": \"***\"}}, \"pyspark_batch\": {\"args\": [], \"main_python_file_uri\": \"gs://openlineagedemo/data_analytics_process.py\"}, \"runtime_config\": {\"version\": \"1.1\"}}, \"batch_id\": \"data-processing-20230913t101606\", \"deferrable\": false, \"depends_on_past\": false, \"do_xcom_push\": true, \"downstream_task_ids\": [], \"email_on_failure\": false, \"email_on_retry\": false, \"executor_config\": {}, \"gcp_conn_id\": \"google_cloud_default\", \"ignore_first_depends_on_past\": true, \"inlets\": [], \"metadata\": [], \"outlets\": [], \"owner\": \"airflow\", \"params\": \"<>\", \"polling_interval_seconds\": 5, \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 1, \"project_id\": \"acceldata-acm\", \"queue\": \"default\", \"region\": \"us-west1\", \"result_retry\": \"<>\", \"retries\": 2, \"retry\": \"<>\", \"retry_delay\": \"<>\", \"retry_exponential_backoff\": false, \"start_date\": \"<>\", \"task_group\": \"<>\", \"task_id\": \"create_batch\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": [], \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}, 
\"type\": \"operator\"}]}}, \"runId\": \"cd1553f6-3bdb-3cae-acb8-06865df3f186\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "1yr8ptufizvx9q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.216001055Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:16:06.365428+00:00", - "insertId": "1yr8ptufizvx9r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.224804656Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "map-index": "-1", - "task-id": "create_batch", - "process": "taskinstance.py:1310", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T10:16:06.365428+00:00', '--job-id', '974', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpuaun2uht']", - "insertId": "1yr8ptufizvx9s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.317014714Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "create_batch", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Job 974: Subtask create_batch", - "insertId": "1yr8ptufizvx9t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:05.318148672Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "process": "standard_task_runner.py:83", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Started process 295 to run task", - "insertId": "1yr8ptufizvx9u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:05.321565126Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "create_batch", - 
"execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "1yr8ptufizvx9v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:06.059408491Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "process": "task_command.py:393", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=create_batch\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:16:06.365428+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T10:16:06.365428+00:00", - "insertId": "1yr8ptufizvx9w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:07.223246379Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "create_batch", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "1yr8ptufizvx9x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:07.318441800Z", - "severity": "INFO", - "labels": { - "process": "base.py:73", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Creating batch data-processing-20230913t101606", - "insertId": "1yr8ptufizvx9y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:07.321044550Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "create_batch", - "map-index": "-1", - "process": "dataproc.py:2349", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Once started, the batch job will be available at 
https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t101606/monitoring?project=acceldata-acm", - "insertId": "1yr8ptufizvx9z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:07.321908654Z", - "severity": "INFO", - "labels": { - "process": "dataproc.py:2350", - "map-index": "-1", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "workflow": "data_analytics_dag", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "1yr8ptufizvxa0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:07.323122187Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "credentials_provider.py:353", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "create_batch", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:08.985449611Z" - }, - { - "textPayload": "Task failed with exception\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/dataproc.py\", line 261, in wait_for_operation\n return operation.result(timeout=timeout, retry=result_retry)\n File \"/opt/python3.8/lib/python3.8/site-packages/google/api_core/future/polling.py\", line 261, in result\n raise self._exception\ngoogle.api_core.exceptions.Aborted: 409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2371, in execute\n result = hook.wait_for_operation(\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/dataproc.py\", line 264, in wait_for_operation\n raise AirflowException(error)\nairflow.exceptions.AirflowException: 409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.", - "insertId": "1plm3bsfeojm8s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:21.011158030Z", - "severity": "ERROR", - "labels": { - "try-number": "1", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1778", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:27.108445457Z" - }, - { - "textPayload": "Marking task as UP_FOR_RETRY. 
dag_id=data_analytics_dag, task_id=create_batch, execution_date=20230913T101606, start_date=20230913T101703, end_date=20230913T101721", - "insertId": "1plm3bsfeojm8t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:21.024110112Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1328", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:27.108445457Z" - }, - { - "textPayload": "Failed to execute job 974 for task create_batch (409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.; 295)", - "insertId": "1plm3bsfeojm8u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:21.116322493Z", - "severity": "ERROR", - "labels": { - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "create_batch", - "process": "standard_task_runner.py:100", - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:27.108445457Z" - }, - { - "textPayload": "Task exited with return code 1", - "insertId": "1plm3bsfeojm8v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:21.439977610Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "local_task_job.py:212", - "task-id": "create_batch", - "execution-date": "2023-09-13T10:16:06.365428+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:27.108445457Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1plm3bsfeojm8w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:17:21.623065740Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:16:06.365428+00:00", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:27.108445457Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[7616318c-7a02-4510-8ae0-e602dacbaafc] succeeded in 30.395540201017866s: None", - "insertId": "1plm3bsfeojm8x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:17:21.912667322Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": 
"airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:27.108445457Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1plm3bsfeojm8y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:23.242794274Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:17:27.108445457Z" - }, - { - "textPayload": "I0913 10:17:47.214003 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "yerjkdf1ildxv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:17:47.214248606Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:17:52.787251252Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b4ca9bd4-ae8f-4e22-bd93-252349ae1755] received", - "insertId": "1v5cwcrfiwsiz5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:00.455854943Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:01.831736777Z" - }, - { - "textPayload": "[b4ca9bd4-ae8f-4e22-bd93-252349ae1755] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:10:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1v5cwcrfiwsiz6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:00.461893220Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:01.831736777Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "148tgddfp0cwt1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:00.814506151Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "148tgddfp0cwt2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:00.816903948Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "148tgddfp0cwt3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:00.936413208Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "148tgddfp0cwt4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:01.815058981Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "148tgddfp0cwt5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:02.336818455Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "148tgddfp0cwt6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:02.474614440Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:10:00+00:00", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1091", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "148tgddfp0cwt7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:02.514610484Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:10:00+00:00", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1091", - 
"map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "148tgddfp0cwt8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.515887484Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "execution-date": "2023-09-13T10:10:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-8z65g", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "148tgddfp0cwt9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.517024148Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T10:10:00+00:00", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1290", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "148tgddfp0cwta", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.518172527Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:10:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:1291", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "148tgddfp0cwtb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:02.809266115Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "148tgddfp0cwtc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:02.809313844Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": 
"148tgddfp0cwtd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.834659058Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "148tgddfp0cwte", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.834704501Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Did not find openlineage.yml and OPENLINEAGE_URL is not set", - "insertId": "148tgddfp0cwtf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.846714452Z", - "severity": "ERROR", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T10:10:00+00:00", - "worker_id": "airflow-worker-8z65g", - "task-id": "echo", - "process": "factory.py:78", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Couldn't initialize transport; will print events to console.", - "insertId": "148tgddfp0cwtg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:02.847985537Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T10:10:00+00:00", - "process": "factory.py:37", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:20:02.473968Z\", \"eventType\": \"START\", \"inputs\": [], \"job\": {\"facets\": {\"documentation\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/DocumentationJobFacet\", \"description\": \"liveness monitoring dag\"}, \"ownership\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/OwnershipJobFacet\", \"owners\": [{\"name\": \"airflow\"}]}}, \"name\": \"airflow_monitoring.echo\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {\"airflow\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": 
\"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"dag\": {\"dag_id\": \"airflow_monitoring\", \"schedule_interval\": \"*/10 * * * *\", \"tags\": \"[]\", \"timetable\": {\"expression\": \"*/10 * * * *\", \"timezone\": \"UTC\"}}, \"dagRun\": {\"conf\": {}, \"dag_id\": \"airflow_monitoring\", \"data_interval_end\": \"2023-09-13T10:20:00+00:00\", \"data_interval_start\": \"2023-09-13T10:10:00+00:00\", \"external_trigger\": false, \"run_id\": \"scheduled__2023-09-13T10:10:00+00:00\", \"run_type\": \"scheduled\", \"start_date\": \"2023-09-13T10:20:00.328512+00:00\"}, \"task\": {\"append_env\": false, \"args\": {\"bash_command\": \"echo test\", \"dag\": \"\", \"depends_on_past\": false, \"do_xcom_push\": false, \"priority_weight\": 2147483647, \"retries\": 1, \"retry_delay\": \"0:05:00\", \"start_date\": \"2023-09-13T00:00:00+00:00\", \"task_id\": \"echo\"}, \"bash_command\": \"echo test\", \"depends_on_past\": false, \"do_xcom_push\": false, \"downstream_task_ids\": \"[]\", \"email_on_failure\": true, \"email_on_retry\": true, \"executor_config\": {}, \"ignore_first_depends_on_past\": true, \"inlets\": \"[]\", \"mapped\": false, \"operator_class\": \"airflow.operators.bash.BashOperator\", \"outlets\": \"[]\", \"output_encoding\": \"utf-8\", \"owner\": \"airflow\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 2147483647, \"queue\": \"default\", \"retries\": 1, \"retry_exponential_backoff\": false, \"skip_exit_code\": 99, \"start_date\": \"2023-09-13T00:00:00+00:00\", \"task_id\": \"echo\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"[]\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}, \"taskInstance\": {\"pool\": \"default_pool\", \"try_number\": 1}, \"taskUuid\": \"7930aaee-6d65-3f4c-80b0-9c9e490cc8fc\"}, \"airflow_runArgs\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"externalTrigger\": false}, \"airflow_version\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"airflowVersion\": \"2.5.3+composer\", \"openlineageAirflowVersion\": \"1.1.0\", \"operator\": \"airflow.operators.bash.BashOperator\", \"taskInfo\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"bash_command\": \"echo test\", \"dag\": {\"dag_id\": \"airflow_monitoring\", \"schedule_interval\": \"*/10 * * * *\", \"tags\": []}, \"depends_on_past\": false, \"do_xcom_push\": false, \"priority_weight\": 2147483647, \"retries\": 1, \"retry_delay\": \"0:05:00\", \"start_date\": \"2023-09-13T00:00:00+00:00\", \"task_id\": \"echo\"}, \"_BaseOperator__instantiated\": true, \"_dag\": {\"dag_id\": \"airflow_monitoring\", \"schedule_interval\": \"*/10 * * * *\", \"tags\": []}, \"_log\": \"\", \"append_env\": false, \"bash_command\": \"echo test\", \"dag_run\": {\"_sa_instance_state\": \"\", \"_state\": \"running\", \"conf\": {}, \"creating_job_id\": 970, \"dag_h", - "insertId": "148tgddfp0cwth", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:02.852189656Z", - "severity": "INFO", - "labels": 
{ - "process": "console.py:29", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:10:00+00:00", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "ash\": \"e27cd9350a0188d627f0bef9212e8c88\", \"dag_id\": \"airflow_monitoring\", \"data_interval_end\": \"2023-09-13T10:20:00+00:00\", \"data_interval_start\": \"2023-09-13T10:10:00+00:00\", \"execution_date\": \"2023-09-13T10:10:00+00:00\", \"external_trigger\": false, \"id\": 742, \"last_scheduling_decision\": \"2023-09-13T10:20:01.676453+00:00\", \"log_template_id\": 2, \"queued_at\": \"2023-09-13T10:20:00.265711+00:00\", \"run_id\": \"scheduled__2023-09-13T10:10:00+00:00\", \"run_type\": \"scheduled\", \"start_date\": \"2023-09-13T10:20:00.328512+00:00\", \"updated_at\": \"2023-09-13T10:20:01.682513+00:00\"}, \"depends_on_past\": false, \"do_xcom_push\": false, \"downstream_task_ids\": \"set()\", \"email_on_failure\": true, \"email_on_retry\": true, \"executor_config\": {}, \"ignore_first_depends_on_past\": true, \"inlets\": [], \"outlets\": [], \"output_encoding\": \"utf-8\", \"owner\": \"airflow\", \"params\": \"{}\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 2147483647, \"queue\": \"default\", \"retries\": 1, \"retry_delay\": \"0:05:00\", \"retry_exponential_backoff\": false, \"skip_exit_code\": 99, \"start_date\": \"2023-09-13T00:00:00+00:00\", \"task_group\": \"\", \"task_id\": \"echo\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": \"set()\", \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}}, \"nominalTime\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/NominalTimeRunFacet\", \"nominalEndTime\": \"2023-09-13T10:20:00.000000Z\", \"nominalStartTime\": \"2023-09-13T10:10:00.000000Z\"}, \"parent\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"airflow_monitoring\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"964e15e8-00d5-3176-a19c-b7e86f7e7cc3\"}}, \"parentRun\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ParentRunFacet\", \"job\": {\"name\": \"airflow_monitoring\", \"namespace\": \"default\"}, \"run\": {\"runId\": \"964e15e8-00d5-3176-a19c-b7e86f7e7cc3\"}}, \"processing_engine\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/ProcessingEngineRunFacet\", \"name\": \"Airflow\", \"openlineageAdapterVersion\": \"1.1.0\", \"version\": \"2.5.3+composer\"}, \"unknownSourceAttribute\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"unknownItems\": [{\"name\": \"BashOperator\", \"properties\": 
{\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"bash_command\": \"echo test\", \"dag\": \"<>\", \"depends_on_past\": false, \"do_xcom_push\": false, \"priority_weight\": 2147483647, \"retries\": 1, \"retry_delay\": \"<>\", \"start_date\": \"<>\", \"task_id\": \"echo\"}, \"_BaseOperator__instantiated\": true, \"_dag\": \"<>\", \"_log\": \"<>\", \"append_env\": false, \"bash_command\": \"echo test\", \"depends_on_past\": false, \"do_xcom_push\": false, \"downstream_task_ids\": [], \"email_on_failure\": true, \"email_on_retry\": true, \"executor_config\": {}, \"ignore_first_depends_on_past\": true, \"inlets\": [], \"outlets\": [], \"output_encoding\": \"utf-8\", \"owner\": \"airflow\", \"params\": \"<>\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 2147483647, \"queue\": \"default\", \"retries\": 1, \"retry_delay\": \"<>\", \"retry_exponential_backoff\": false, \"skip_exit_code\": 99, \"start_date\": \"<>\", \"task_group\": \"<", - "insertId": "148tgddfp0cwti", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:02.852294610Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:10:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-8z65g", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": ">\", \"task_id\": \"echo\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": [], \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}, \"type\": \"operator\"}]}}, \"runId\": \"7930aaee-6d65-3f4c-80b0-9c9e490cc8fc\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "148tgddfp0cwtj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:02.852317435Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T10:10:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:10:00+00:00", - "insertId": "148tgddfp0cwtk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:02.859569267Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "task-id": "echo", - "execution-date": "2023-09-13T10:10:00+00:00", - "process": "taskinstance.py:1310", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Started process 386 to run task", - "insertId": "148tgddfp0cwtl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T10:20:02.909854197Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:10:00+00:00", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:10:00+00:00', '--job-id', '975', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp4xb9cmjn']", - "insertId": "148tgddfp0cwtm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.914396655Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:10:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Job 975: Subtask echo", - "insertId": "148tgddfp0cwtn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:02.914812291Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:83", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T10:10:00+00:00", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Running on host airflow-worker-8z65g", - "insertId": "148tgddfp0cwto", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:03.280098489Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "task_command.py:393", - "execution-date": "2023-09-13T10:10:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:10:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T10:10:00+00:00", - "insertId": "148tgddfp0cwtp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:03.727146605Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:10:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "taskinstance.py:1518", - "map-index": "-1", - "task-id": "echo" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "148tgddfp0cwtq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:03.730452629Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "process": "subprocess.py:63", - "execution-date": "2023-09-13T10:10:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "148tgddfp0cwtr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:03.732714169Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:10:00+00:00", - "process": "subprocess.py:75", - "task-id": "echo", - "worker_id": "airflow-worker-8z65g", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Output:", - "insertId": "148tgddfp0cwts", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:03.873718471Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:10:00+00:00", - "task-id": "echo", - "process": "subprocess.py:86" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "test", - "insertId": "148tgddfp0cwtt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:03.880746144Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T10:10:00+00:00", - "process": "subprocess.py:93", - "worker_id": "airflow-worker-8z65g", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "148tgddfp0cwtu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:20:03.881678694Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "subprocess.py:97", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:10:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": 
"Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T101000, start_date=20230913T102002, end_date=20230913T102003", - "insertId": "148tgddfp0cwtv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:03.924665667Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8z65g", - "execution-date": "2023-09-13T10:10:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "taskinstance.py:1328" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "{\"eventTime\": \"2023-09-13T10:20:03.923929Z\", \"eventType\": \"COMPLETE\", \"inputs\": [], \"job\": {\"facets\": {}, \"name\": \"airflow_monitoring.echo\", \"namespace\": \"default\"}, \"outputs\": [], \"producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"run\": {\"facets\": {\"unknownSourceAttribute\": {\"_producer\": \"https://github.com/OpenLineage/OpenLineage/tree/1.1.0/integration/airflow\", \"_schemaURL\": \"https://raw.githubusercontent.com/OpenLineage/OpenLineage/main/spec/OpenLineage.json#/definitions/BaseFacet\", \"unknownItems\": [{\"name\": \"BashOperator\", \"properties\": {\"_BaseOperator__from_mapped\": false, \"_BaseOperator__init_kwargs\": {\"bash_command\": \"echo test\", \"dag\": \"<>\", \"depends_on_past\": false, \"do_xcom_push\": false, \"priority_weight\": 2147483647, \"retries\": 1, \"retry_delay\": \"<>\", \"start_date\": \"<>\", \"task_id\": \"echo\"}, \"_BaseOperator__instantiated\": true, \"_dag\": \"<>\", \"_log\": \"<>\", \"append_env\": false, \"bash_command\": \"echo test\", \"depends_on_past\": false, \"do_xcom_push\": false, \"downstream_task_ids\": [], \"email_on_failure\": true, \"email_on_retry\": true, \"executor_config\": {}, \"ignore_first_depends_on_past\": true, \"inlets\": [], \"outlets\": [], \"output_encoding\": \"utf-8\", \"owner\": \"airflow\", \"params\": \"<>\", \"pool\": \"default_pool\", \"pool_slots\": 1, \"priority_weight\": 2147483647, \"queue\": \"default\", \"retries\": 1, \"retry_delay\": \"<>\", \"retry_exponential_backoff\": false, \"skip_exit_code\": 99, \"start_date\": \"<>\", \"task_group\": \"<>\", \"task_id\": \"echo\", \"trigger_rule\": \"all_success\", \"upstream_task_ids\": [], \"wait_for_downstream\": false, \"weight_rule\": \"downstream\"}, \"type\": \"operator\"}]}}, \"runId\": \"7930aaee-6d65-3f4c-80b0-9c9e490cc8fc\"}, \"schemaURL\": \"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent\"}", - "insertId": "148tgddfp0cwtw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:03.949449Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:10:00+00:00", - "process": "console.py:29", - "task-id": "echo", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "148tgddfp0cwtx", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:04.130854957Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-8z65g", - "task-id": "echo", - "execution-date": "2023-09-13T10:10:00+00:00", - "process": "local_task_job.py:212", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "148tgddfp0cwty", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:20:04.200650475Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8z65g", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T10:10:00+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b4ca9bd4-ae8f-4e22-bd93-252349ae1755] succeeded in 3.899365258985199s: None", - "insertId": "148tgddfp0cwtz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:20:04.359121322Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:20:06.849619947Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "hheh34flvd821", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:22:30.610472980Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:22:35.929937115Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "byy986f9kmyal", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:27:26.620165949Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:27:28.142288201Z" - }, - { - "textPayload": "I0913 10:29:00.040406 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven7x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:00.040691763Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:00.041978 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. Recreating.\" worker name=\"airflow-worker-8z65g\"", - "insertId": "1d4gjkaflven7y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:00.042264226Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:00.075244 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "1d4gjkaflven7z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:00.075541444Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:00.075410 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven80", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:00.075640142Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "", - "insertId": "1pbw1vjfiasntr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:00.095763273Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:29:06.687568056Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "1pbw1vjfiasnts", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:00.095833245Z", - "severity": "ERROR", - "labels": { - 
"worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:29:06.687568056Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "1pbw1vjfiasntt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:00.095858751Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:29:06.687568056Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "1pbw1vjfiasntu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:00.095865870Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:29:06.687568056Z" - }, - { - "textPayload": "I0913 10:29:00.121276 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven81", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:00.121512828Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "1pbw1vjfiasntv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:03.731506890Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8z65g" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:29:06.687568056Z" - }, - { - "textPayload": "I0913 10:29:04.457522 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven82", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:04.457809816Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:04.459314 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "1d4gjkaflven83", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:04.459534307Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:04.849953 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven84", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:04.850234809Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:04.892603 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven85", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:04.892805928Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:04.910820 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven86", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:04.911090260Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:04.960884 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven87", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:04.962511610Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "I0913 10:29:04.982397 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1d4gjkaflven88", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:04.983836861Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:06.187773795Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "1c01xt4f6mo63r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:06.144013383Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "1c01xt4f6mo63s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:06.149306210Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "1c01xt4f6mo63t", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:06.161628236Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "I0913 10:29:06.918121 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "sl63n5fis94j2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:06.918349381Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:12.323421340Z" - }, - { - "textPayload": "I0913 10:29:06.954298 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "sl63n5fis94j3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:06.954526413Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:12.323421340Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "1c01xt4f6mo63u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:13.843835229Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "1c01xt4f6mo63v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:13.844625183Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "1c01xt4f6mo63w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:13.859827717Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "I0913 10:29:17.515839 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1rjug2rfic1n37", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:17.516127858Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:29:24.382767125Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", 
- "insertId": "1c01xt4f6mo63x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:21.007911862Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "1c01xt4f6mo63y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:21.216911734Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "1c01xt4f6mo63z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:26.444472548Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "1c01xt4f6mo640", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:26.445563394Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo641", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:26.446940619Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "1c01xt4f6mo642", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:26.454328929Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo643", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:31.505530778Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "1c01xt4f6mo644", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:33.154260636Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo645", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:29:36.517445031Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo646", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:41.523934889Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo647", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:46.530484644Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo648", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:29:51.537424805Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo649", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:29:56.543844538Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1c01xt4f6mo64a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:30:01.549081447Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:06.238900539Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "15wv3f1fiysxrv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:06.555199330Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:11.366843608Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "134msadflw8lig", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:11.562602697Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:16.469019119Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "uj4u1jfizegl0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:16.569297722Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:21.605187143Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1t7u2cgfp7ia6a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:21.576260093Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:26.684834736Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1lzxolqfigh5mw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:26.584493071Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:31.764379826Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1mu6wydfp5x0gp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:30:31.590842788Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:36.831374962Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1h9zaycfixift0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:36.597495030Z", - "severity": "INFO", - "labels": { - "worker_id": 
"airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:41.834003954Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1thyen4fp6sz67", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:41.607544187Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:46.832825144Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1h0cq1jfic86dg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:30:46.620456383Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:51.856219411Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "l3b1vmf4u4ws2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:30:51.626695938Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:56.958589896Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "l3b1vmf4u4ws3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:30:51.628051286Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:30:56.958589896Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "6xx100fizkq1n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:13.217435266Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:18.336579864Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". 
10:31:28.118 INFO  Celery startup banner:
                   celery@airflow-worker-8gz7j v5.2.7 (dawn-chorus)
                   Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 10:31:27
                   [config]
                   .> app:         airflow.executors.celery_executor:0x7d4b367a3370
                   .> transport:   redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0
                   .> results:     redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0
                   .> concurrency: 6 (prefork)
                   .> task events: OFF (enable -E to monitor tasks in this worker)
                   [queues]
                   .> default  exchange=default(direct) key=default
                   [tasks]
                   . airflow.executors.celery_executor.execute_command
10:31:33.626 INFO  Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0
10:31:33.723 INFO  mingle: searching for neighbors
10:31:34.745 INFO  mingle: all alone
10:31:34.797 INFO  celery@airflow-worker-8gz7j ready.
10:31:34.803 INFO  Task airflow.executors.celery_executor.execute_command[b091b9bf-5f08-4ea1-a211-b7ac34d41b04] received
"2023-09-13T10:31:34.803705820Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "[b091b9bf-5f08-4ea1-a211-b7ac34d41b04] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:20:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "jpflkkf8kf8a6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:34.819714978Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "jpflkkf8kf8a7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:35.145631362Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "jpflkkf8kf8a8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:35.147442448Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "jpflkkf8kf8a9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:31:35.330752428Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "jpflkkf8kf8aa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:36.238134130Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Running on host airflow-worker-8gz7j", - "insertId": "jpflkkf8kf8ab", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:36.810936832Z", - "severity": "INFO", - "labels": { - 
"worker_id": "airflow-worker-8gz7j", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "jpflkkf8kf8ac", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:31:36.942114808Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1", - "execution-date": "2023-09-13T10:20:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "jpflkkf8kf8ad", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:36.959827276Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:20:00+00:00", - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "process": "taskinstance.py:1091", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "jpflkkf8kf8ae", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:31:36.960450121Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "try-number": "1", - "execution-date": "2023-09-13T10:20:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "jpflkkf8kf8af", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:36.960997937Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:20:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1290", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "jpflkkf8kf8ag", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:36.961520109Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "taskinstance.py:1291", - 
"execution-date": "2023-09-13T10:20:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "jpflkkf8kf8ah", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:37.235916549Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "jpflkkf8kf8ai", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:37.235963796Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "jpflkkf8kf8aj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:37.254896480Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "jpflkkf8kf8ak", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:31:37.254941009Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "jpflkkf8kf8al", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:31:37.528027452Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:38.585926381Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:20:00+00:00", - "insertId": "1u1ifs3f6q3e5e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:38.231794813Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T10:20:00+00:00", - "process": "taskinstance.py:1310", - 
"worker_id": "airflow-worker-8gz7j", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Started process 182 to run task", - "insertId": "1u1ifs3f6q3e5f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:38.307811927Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "try-number": "1", - "execution-date": "2023-09-13T10:20:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:20:00+00:00', '--job-id', '978', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp8oxufftm']", - "insertId": "1u1ifs3f6q3e5g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:38.307845304Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:20:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Job 978: Subtask echo", - "insertId": "1u1ifs3f6q3e5h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:31:38.307893380Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:20:00+00:00", - "process": "standard_task_runner.py:83", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Running on host airflow-worker-8gz7j", - "insertId": "1u1ifs3f6q3e5i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:38.699340378Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "process": "task_command.py:393", - "execution-date": "2023-09-13T10:20:00+00:00", - "map-index": "-1", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:20:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T10:20:00+00:00", - "insertId": "1u1ifs3f6q3e5j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:38.885689203Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1518", - "execution-date": "2023-09-13T10:20:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1u1ifs3f6q3e5k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:38.887582172Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "process": "subprocess.py:63", - "execution-date": "2023-09-13T10:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1u1ifs3f6q3e5l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:38.889354174Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:20:00+00:00", - "process": "subprocess.py:75", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Output:", - "insertId": "1u1ifs3f6q3e5m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:39.019595254Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:20:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "task-id": "echo", - "process": "subprocess.py:86", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "test", - "insertId": "1u1ifs3f6q3e5n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:39.026071378Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "subprocess.py:93", - "task-id": "echo", - "execution-date": "2023-09-13T10:20:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1u1ifs3f6q3e5o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T10:31:39.027497713Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "process": "subprocess.py:97", - "execution-date": "2023-09-13T10:20:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T102000, start_date=20230913T103136, end_date=20230913T103139", - "insertId": "1u1ifs3f6q3e5p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:31:39.068590273Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T10:20:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "task-id": "echo", - "process": "taskinstance.py:1328", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1u1ifs3f6q3e5q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:39.976977177Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "execution-date": "2023-09-13T10:20:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1u1ifs3f6q3e5r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:31:40.052621091Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T10:20:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:2599", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[b091b9bf-5f08-4ea1-a211-b7ac34d41b04] succeeded in 5.418022138997912s: None", - "insertId": "1u1ifs3f6q3e5s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:31:40.225766607Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:31:43.702373031Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. 
To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "h7t0k0fj380by", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:36:00.519267585Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:36:03.649772031Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[398edb52-3adf-4345-95d8-63435549d738] received", - "insertId": "oz3blsf6qed33", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:02.219664538Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "[398edb52-3adf-4345-95d8-63435549d738] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:30:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "oz3blsf6qed34", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:02.224927645Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "oz3blsf6qed35", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:02.722069546Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "oz3blsf6qed36", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:02.724395137Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "oz3blsf6qed37", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:02.838424569Z", - 
"severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "oz3blsf6qed38", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:03.716420202Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Running on host airflow-worker-8gz7j", - "insertId": "oz3blsf6qed39", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:04.240938329Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "oz3blsf6qed3a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:04.371126032Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T10:30:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "oz3blsf6qed3b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:04.390593468Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T10:30:00+00:00", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "oz3blsf6qed3c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:04.391265024Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "process": "taskinstance.py:1289", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:30:00+00:00", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - 
"textPayload": "Starting attempt 1 of 2", - "insertId": "oz3blsf6qed3d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:04.392016301Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1290", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "oz3blsf6qed3e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:04.392570691Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "taskinstance.py:1291", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T10:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "oz3blsf6qed3f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:04.728825118Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "oz3blsf6qed3g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:04.728874618Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "oz3blsf6qed3h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:04.750649500Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "oz3blsf6qed3i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:04.750698287Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:30:00+00:00", - "insertId": "oz3blsf6qed3j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:05.609808863Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T10:30:00+00:00", - "process": "taskinstance.py:1310", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Started process 359 to run task", - "insertId": "oz3blsf6qed3k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:05.647405611Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T10:30:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:30:00+00:00', '--job-id', '980', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp_ayk_8v3']", - "insertId": "oz3blsf6qed3l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:05.649941161Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-13T10:30:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Job 980: Subtask echo", - "insertId": "oz3blsf6qed3m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:05.650527313Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T10:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Running on host airflow-worker-8gz7j", - "insertId": "oz3blsf6qed3n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:06.015859843Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T10:30:00+00:00", - "process": 
"task_command.py:393", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:30:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T10:30:00+00:00", - "insertId": "oz3blsf6qed3o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:06.231046991Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:30:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "oz3blsf6qed3p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:06.233649658Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "execution-date": "2023-09-13T10:30:00+00:00", - "workflow": "airflow_monitoring", - "process": "subprocess.py:63" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "oz3blsf6qed3q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:06.235673050Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:30:00+00:00", - "process": "subprocess.py:75", - "try-number": "1", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Output:", - "insertId": "oz3blsf6qed3r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:06.395635787Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "try-number": "1", - "execution-date": "2023-09-13T10:30:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:86" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "test", - "insertId": "oz3blsf6qed3s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:06.402328998Z", - "severity": "INFO", - "labels": 
{ - "try-number": "1", - "task-id": "echo", - "process": "subprocess.py:93", - "execution-date": "2023-09-13T10:30:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "oz3blsf6qed3t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:06.405389750Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "subprocess.py:97", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T10:30:00+00:00", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T103000, start_date=20230913T104004, end_date=20230913T104006", - "insertId": "oz3blsf6qed3u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:40:06.455465698Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1328", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T10:30:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:07.521755432Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1q5vifcflzisxc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:07.154979877Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T10:30:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "local_task_job.py:212", - "try-number": "1", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:12.683115619Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1q5vifcflzisxd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:40:07.226283770Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T10:30:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8gz7j", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:12.683115619Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[398edb52-3adf-4345-95d8-63435549d738] succeeded in 5.166132819984341s: None", - "insertId": "1q5vifcflzisxe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - 
"environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:40:07.389226265Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:40:12.683115619Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "14e1u1f4n4sbh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:41:03.121987818Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:41:06.278781324Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "ibb1udfp3hwqa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:46:04.737274245Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:46:09.340408812Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[60ba5168-d829-4b7d-9d71-d8e9e2be03cb] received", - "insertId": "3md8g2fj3kddh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:01.066497428Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "[60ba5168-d829-4b7d-9d71-d8e9e2be03cb] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:40:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "3md8g2fj3kddi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:01.071270314Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "3md8g2fj3kddj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:01.421913254Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "3md8g2fj3kddk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:01.425032911Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "3md8g2fj3kddl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:01.554853670Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "3md8g2fj3kddm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:02.505148962Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Running on host airflow-worker-8gz7j", - "insertId": "3md8g2fj3kddn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:03.116100305Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "3md8g2fj3kddo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:03.241597205Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1091", - "map-index": "-1", - "execution-date": "2023-09-13T10:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "3md8g2fj3kddp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:03.259957313Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T10:40:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "3md8g2fj3kddq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:03.260379718Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "execution-date": "2023-09-13T10:40:00+00:00", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "3md8g2fj3kddr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": 
"2023-09-13T10:50:03.260987799Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T10:40:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-8gz7j", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "3md8g2fj3kdds", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:03.261444332Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:40:00+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "process": "taskinstance.py:1291", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "3md8g2fj3kddt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:03.559318359Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "3md8g2fj3kddu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:03.559361201Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "3md8g2fj3kddv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:03.579841432Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "3md8g2fj3kddw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:03.579920882Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:40:00+00:00", - "insertId": "3md8g2fj3kddx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:04.182320758Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T10:40:00+00:00", - "process": "taskinstance.py:1310", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Started process 586 to run task", - "insertId": "3md8g2fj3kddy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:04.224384139Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:40:00+00:00", - "task-id": "echo", - "map-index": "-1", - "process": "standard_task_runner.py:55", - "try-number": "1", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:40:00+00:00', '--job-id', '983', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpqbikl9li']", - "insertId": "3md8g2fj3kddz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:04.224428227Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T10:40:00+00:00", - "process": "standard_task_runner.py:82", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Job 983: Subtask echo", - "insertId": "3md8g2fj3kde0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:04.225258613Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T10:40:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Running on host airflow-worker-8gz7j", - "insertId": "3md8g2fj3kde1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:04.587706054Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:40:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "task_command.py:393", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:40:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T10:40:00+00:00", - "insertId": "3md8g2fj3kde2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:04.787092478Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T10:40:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "3md8g2fj3kde3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:04.789737337Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:40:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-8gz7j", - "process": "subprocess.py:63", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "3md8g2fj3kde4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:04.791911181Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "execution-date": "2023-09-13T10:40:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:75", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Output:", - "insertId": "3md8g2fj3kde5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:04.931612055Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:40:00+00:00", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "task-id": "echo", - "process": "subprocess.py:86", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "test", - "insertId": "3md8g2fj3kde6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:04.936873359Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T10:40:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-8gz7j", - "map-index": "-1", - "process": "subprocess.py:93" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "3md8g2fj3kde7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:04.938029108Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T10:40:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-8gz7j", - "process": "subprocess.py:97", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T104000, start_date=20230913T105003, end_date=20230913T105004", - "insertId": "3md8g2fj3kde8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:04.981922247Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-8gz7j", - "execution-date": "2023-09-13T10:40:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1328", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "3md8g2fj3kde9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:05.784567260Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:40:00+00:00", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:06.862236066Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "y4qkvef78tnxi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:05.930744285Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T10:40:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-8gz7j", - "try-number": "1", - "process": "taskinstance.py:2599", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:11.876120857Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[60ba5168-d829-4b7d-9d71-d8e9e2be03cb] succeeded in 5.24778766700183s: None", - "insertId": "y4qkvef78tnxj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:06.317432436Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T10:50:11.876120857Z" - }, - { - "textPayload": "I0913 10:50:08.712060 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "8vnkihf8ypv19", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:08.712326940Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:15.651568306Z" - }, - { - "textPayload": "I0913 10:50:08.714048 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. Recreating.\" worker name=\"airflow-worker-8gz7j\"", - "insertId": "8vnkihf8ypv1a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:08.714236583Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:15.651568306Z" - }, - { - "textPayload": "I0913 10:50:08.748013 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "8vnkihf8ypv1b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:08.748225495Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:15.651568306Z" - }, - { - "textPayload": "I0913 10:50:08.748946 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "8vnkihf8ypv1c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:08.749054021Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:15.651568306Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "y4qkvef78tnxk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:08.808320593Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:11.876120857Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "y4qkvef78tnxl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:08.808368270Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:11.876120857Z" - }, - { - "textPayload": "", - "insertId": "y4qkvef78tnxm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:08.808385606Z", - 
"severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:11.876120857Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "y4qkvef78tnxn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:08.808427620Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:11.876120857Z" - }, - { - "textPayload": "I0913 10:50:08.826454 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "8vnkihf8ypv1d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:08.826741223Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:15.651568306Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "xkonvff6nb7ku", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:15.105755907Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-8gz7j" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:50:17.003150726Z" - }, - { - "textPayload": "I0913 10:50:15.776793 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1km", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:15.777041265Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "I0913 10:50:15.778056 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "1dogrosfj0b1kn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:15.778259064Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "I0913 10:50:16.019821 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1ko", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:16.020031682Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "I0913 10:50:16.022677 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1kp", - 
"resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:16.022839589Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "I0913 10:50:16.067622 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1kq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:16.067851890Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "I0913 10:50:16.069713 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1kr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:16.069907596Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "1jrvr6tfib0upt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:17.449260692Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "1jrvr6tfib0upu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:17.452895149Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "1jrvr6tfib0upv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:17.465120633Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "I0913 10:50:17.923829 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1ks", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:17.924045617Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "I0913 10:50:17.974616 1 
airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1kt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:17.974764749Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "I0913 10:50:19.559303 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1dogrosfj0b1ku", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:19.559474601Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T10:50:22.724001289Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "1jrvr6tfib0upw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:24.705894825Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "1jrvr6tfib0upx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:24.706314019Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "1jrvr6tfib0upy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:24.723259210Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "1jrvr6tfib0upz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:31.875563495Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "1jrvr6tfib0uq0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:32.120236296Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "1jrvr6tfib0uq1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:37.455069311Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "1jrvr6tfib0uq2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:37.456425640Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1jrvr6tfib0uq3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:37.457570386Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "1jrvr6tfib0uq4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:50:37.469597408Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1jrvr6tfib0uq5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:42.471689286Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "1jrvr6tfib0uq6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:44.749023055Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1jrvr6tfib0uq7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:47.478958851Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, 
- "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1jrvr6tfib0uq8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:50:52.485584865Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1jrvr6tfib0uq9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:50:57.492880446Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1jrvr6tfib0uqa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:51:02.500085921Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:06.216233168Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1mjy76tfixx1i1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:51:07.525042598Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:12.877831890Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1bgjvxgfizvvky", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:51:12.533558718Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:17.828662989Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "12ai80yfjd2xuk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:51:17.543777999Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:22.824209987Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1q63gwwfp1vcjv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:51:22.550020710Z", - "severity": "INFO", - "labels": { - 
"worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:27.817787554Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "l3qx7jfj8ltgl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:51:27.557696360Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:32.818777692Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "fjij6qfm060tv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:51:32.565475568Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:37.820082919Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "184cnu8flzbprc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:51:37.570900190Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:42.827879956Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1yhfno8fj2sfwp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:51:42.578133263Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:47.820483312Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "sbgsuzfp45fbi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:51:47.585712941Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:52.824766428Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "epq13ofe7cobz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:51:52.592972276Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:51:57.820460038Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "kji2uqfh2emuf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:51:57.599420011Z", - 
"severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:02.820751738Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "q344tifiz8n9u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:02.616472025Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:07.831752633Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "1fwadnefp70j50", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:07.626554859Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:12.954623935Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "1fwadnefp70j51", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:07.627860665Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:12.954623935Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "b3m8atf8kvuml", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:52:29.624810541Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:35.093872210Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "b3m8atf8kvumm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:29.628549929Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:35.093872210Z" - }, - { - "textPayload": " ", - "insertId": "a9mb6f8c4bnz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:44.615151844Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-qlmrk v5.2.7 (dawn-chorus)", - "insertId": "a9mb6f8c4bo0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:44.615293744Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "a9mb6f8c4bo1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:52:44.615308010Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 10:52:44", - "insertId": "a9mb6f8c4bo2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:52:44.615315541Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "a9mb6f8c4bo3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615320863Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "a9mb6f8c4bo4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:52:44.615327071Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x789da3dac3a0", - "insertId": "a9mb6f8c4bo5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:44.615332024Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "a9mb6f8c4bo6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615372947Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "a9mb6f8c4bo7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:44.615381036Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "a9mb6f8c4bo8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615386770Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "a9mb6f8c4bo9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615391706Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "a9mb6f8c4boa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615396658Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "a9mb6f8c4bob", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, 
- "timestamp": "2023-09-13T10:52:44.615401726Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "a9mb6f8c4boc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:44.615406576Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": " ", - "insertId": "a9mb6f8c4bod", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615411835Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "", - "insertId": "a9mb6f8c4boe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615417084Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "[tasks]", - "insertId": "a9mb6f8c4bof", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615422506Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "a9mb6f8c4bog", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:44.615428120Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "", - "insertId": "a9mb6f8c4boh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:44.615433229Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:47.151126558Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "15cwy51flvavtn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:52:50.132588322Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:54.208482912Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "15cwy51flvavto", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T10:52:50.205994268Z", - "severity": "INFO", - "labels": { - "process": "mingle.py:40", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:54.208482912Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "15cwy51flvavtp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T10:52:51.242855747Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "mingle.py:49" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:54.208482912Z" - }, - { - "textPayload": "celery@airflow-worker-qlmrk ready.", - "insertId": "15cwy51flvavtq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:51.283022177Z", - "severity": "INFO", - "labels": { - "process": "worker.py:176", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:52:54.208482912Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "16qvb51fj43srp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:52:53.811294352Z", - "severity": "INFO", - "labels": { - "worker_id": 
"airflow-worker-qlmrk", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:53:00.292789332Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "w8lfwtfp5xh9g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T10:57:16.725495225Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T10:57:17.570009703Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f9f4f601-839e-4ab3-b866-47709941f8cb] received", - "insertId": "rhgbdifj0e9me", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:01.148200162Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "[f9f4f601-839e-4ab3-b866-47709941f8cb] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:50:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "rhgbdifj0e9mf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:01.188041439Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "rhgbdifj0e9mg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:00:01.546316425Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "rhgbdifj0e9mh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:01.548616633Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "rhgbdifj0e9mi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:01.722888184Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "rhgbdifj0e9mj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:02.637517805Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Running on host airflow-worker-qlmrk", - "insertId": "rhgbdifj0e9mk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:03.217466127Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "rhgbdifj0e9ml", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:03.348190862Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T10:50:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-qlmrk", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "rhgbdifj0e9mm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:03.366774435Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qlmrk", - "process": "taskinstance.py:1091", - "task-id": "echo", - "execution-date": "2023-09-13T10:50:00+00:00", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "rhgbdifj0e9mn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:03.367348360Z", - "severity": 
"INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-qlmrk", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1289", - "execution-date": "2023-09-13T10:50:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "rhgbdifj0e9mo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:03.367848080Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T10:50:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1290", - "worker_id": "airflow-worker-qlmrk", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "rhgbdifj0e9mp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:00:03.368420444Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-13T10:50:00+00:00", - "worker_id": "airflow-worker-qlmrk", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "rhgbdifj0e9mq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:03.859267117Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "rhgbdifj0e9mr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:03.859313913Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "rhgbdifj0e9ms", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:03.917059795Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", 
- "insertId": "rhgbdifj0e9mt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:03.917124672Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Executing on 2023-09-13 10:50:00+00:00", - "insertId": "rhgbdifj0e9mu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:00:04.823830556Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:50:00+00:00", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-qlmrk", - "process": "taskinstance.py:1310", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Started process 325 to run task", - "insertId": "rhgbdifj0e9mv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:04.893429846Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "standard_task_runner.py:55", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T10:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T10:50:00+00:00', '--job-id', '985', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpdxt6zt8s']", - "insertId": "rhgbdifj0e9mw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:04.894355653Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T10:50:00+00:00", - "worker_id": "airflow-worker-qlmrk", - "process": "standard_task_runner.py:82", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Job 985: Subtask echo", - "insertId": "rhgbdifj0e9mx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:04.895203012Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:50:00+00:00", - "map-index": "-1", - "process": "standard_task_runner.py:83", - "task-id": "echo", - "worker_id": "airflow-worker-qlmrk", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Running on host airflow-worker-qlmrk", - "insertId": "rhgbdifj0e9my", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:05.258069891Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "process": "task_command.py:393", - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T10:50:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T10:50:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T10:50:00+00:00", - "insertId": "rhgbdifj0e9mz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:00:05.449289160Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:50:00+00:00", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-qlmrk", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "rhgbdifj0e9n0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:05.452157001Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T10:50:00+00:00", - "process": "subprocess.py:63", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-qlmrk", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "rhgbdifj0e9n1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:05.454373264Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T10:50:00+00:00", - "process": "subprocess.py:75", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Output:", - "insertId": "rhgbdifj0e9n2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:05.610028107Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-qlmrk", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T10:50:00+00:00", - "process": "subprocess.py:86", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "test", - "insertId": "rhgbdifj0e9n3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:00:05.619921283Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T10:50:00+00:00", - "workflow": "airflow_monitoring", - "process": "subprocess.py:93", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "rhgbdifj0e9n4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:05.621730275Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "try-number": "1", - "process": "subprocess.py:97", - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T10:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T105000, start_date=20230913T110003, end_date=20230913T110005", - "insertId": "rhgbdifj0e9n5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:00:05.666812205Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-qlmrk", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-13T10:50:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:06.844919069Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "kji2uqfh2z1o9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:06.399499524Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T10:50:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:11.845488891Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "kji2uqfh2z1oa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:00:06.485182369Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T10:50:00+00:00", - "task-id": "echo", - "process": "taskinstance.py:2599", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:11.845488891Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f9f4f601-839e-4ab3-b866-47709941f8cb] succeeded in 5.493013041996164s: None", - "insertId": "kji2uqfh2z1ob", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:00:06.646740040Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:00:11.845488891Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "11gorr5fp9w19q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:02:29.017669616Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:02:34.845040360Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "12knff2fidzl8d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:07:20.835254054Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:07:27.292124717Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[9736a748-577b-4863-936b-1ca0a10dcbe9] received", - "insertId": "12ai80yfjehrqj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:02.725511519Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:03.821496665Z" - }, - { - "textPayload": "[9736a748-577b-4863-936b-1ca0a10dcbe9] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "12ai80yfjehrqk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:02.731266108Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:03.821496665Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1mjy76tfizbilo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:03.061007752Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1mjy76tfizbilp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:03.063492235Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1mjy76tfizbilq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:03.339371005Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1mjy76tfizbilr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:04.203888251Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Running on host airflow-worker-qlmrk", - "insertId": "1mjy76tfizbils", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:04.750909470Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1mjy76tfizbilt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:04.885564287Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qlmrk", - "map-index": "-1", - "execution-date": "2023-09-13T11:00:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1mjy76tfizbilu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:04.912791940Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T11:00:00+00:00", - "task-id": "echo", - "try-number": "1", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1mjy76tfizbilv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:04.913398712Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "taskinstance.py:1289", - "try-number": "1", - "execution-date": "2023-09-13T11:00:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1mjy76tfizbilw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T11:10:04.913909635Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1290", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-qlmrk", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1mjy76tfizbilx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:04.916557610Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T11:00:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1291", - "map-index": "-1", - "worker_id": "airflow-worker-qlmrk", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1mjy76tfizbily", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:05.141779303Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1mjy76tfizbilz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:05.141865964Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1mjy76tfizbim0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:05.161895606Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1mjy76tfizbim1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:05.161962459Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Executing on 2023-09-13 11:00:00+00:00", - "insertId": "1mjy76tfizbim2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:05.972851722Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T11:00:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Started process 559 to run task", - "insertId": "1mjy76tfizbim3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:06.015093505Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T11:00:00+00:00", - "process": "standard_task_runner.py:55", - "map-index": "-1", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:00:00+00:00', '--job-id', '988', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp1m11t36n']", - "insertId": "1mjy76tfizbim4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:06.015756601Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:00:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Job 988: Subtask echo", - "insertId": "1mjy76tfizbim5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:06.017753266Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T11:00:00+00:00", - "worker_id": "airflow-worker-qlmrk", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:83", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Running on host airflow-worker-qlmrk", - "insertId": "1mjy76tfizbim6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:06.405704638Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "task_command.py:393", - "execution-date": "2023-09-13T11:00:00+00:00", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Exporting the following 
env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T11:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T11:00:00+00:00", - "insertId": "1mjy76tfizbim7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:06.609537461Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1518", - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T11:00:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1mjy76tfizbim8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:06.615006451Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "process": "subprocess.py:63", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T11:00:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1mjy76tfizbim9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:06.616871180Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:00:00+00:00", - "worker_id": "airflow-worker-qlmrk", - "try-number": "1", - "process": "subprocess.py:75", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Output:", - "insertId": "1mjy76tfizbima", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:06.774478093Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:86", - "execution-date": "2023-09-13T11:00:00+00:00", - "worker_id": "airflow-worker-qlmrk", - "task-id": "echo", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "test", - "insertId": "1mjy76tfizbimb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:06.782152132Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:00:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qlmrk", - "process": "subprocess.py:93", - "map-index": "-1", - "try-number": "1", - "task-id": "echo" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1mjy76tfizbimc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:10:06.783005310Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qlmrk", - "try-number": "1", - "execution-date": "2023-09-13T11:00:00+00:00", - "task-id": "echo", - "process": "subprocess.py:97", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T110000, start_date=20230913T111004, end_date=20230913T111006", - "insertId": "1mjy76tfizbimd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:06.828967953Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "execution-date": "2023-09-13T11:00:00+00:00", - "process": "taskinstance.py:1328", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1mjy76tfizbime", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:07.639356431Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "local_task_job.py:212", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T11:00:00+00:00", - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1mjy76tfizbimf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:10:07.698195638Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk", - "try-number": "1", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T11:00:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:08.834929183Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[9736a748-577b-4863-936b-1ca0a10dcbe9] succeeded in 5.125397779978812s: None", - "insertId": "1uvljhdflxh5mi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:10:07.854976910Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-qlmrk" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:10:13.990675469Z" - }, - { - "textPayload": "I0913 11:11:17.311270 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "14ssg84fdaxwjg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:17.311499713Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:24.278744646Z" - }, - { - "textPayload": "I0913 11:11:17.313117 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. Recreating.\" worker name=\"airflow-worker-qlmrk\"", - "insertId": "14ssg84fdaxwjh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:17.313338645Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:24.278744646Z" - }, - { - "textPayload": "I0913 11:11:17.349303 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "14ssg84fdaxwji", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:17.349585165Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:24.278744646Z" - }, - { - "textPayload": "I0913 11:11:17.350830 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "14ssg84fdaxwjj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:17.351025391Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:24.278744646Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "d1xhuyfm2xgxi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:17.371343049Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:11:23.208592542Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "d1xhuyfm2xgxj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:17.371388271Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:11:23.208592542Z" - }, - { - "textPayload": "", - "insertId": "d1xhuyfm2xgxk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } 
- }, - "timestamp": "2023-09-13T11:11:17.371415128Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:11:23.208592542Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "d1xhuyfm2xgxl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:17.371423195Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:11:23.208592542Z" - }, - { - "textPayload": "I0913 11:11:17.416054 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "14ssg84fdaxwjk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:17.416331538Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:24.278744646Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "1iybm61fihcd3i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:23.125304552Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qlmrk" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:11:29.257314592Z" - }, - { - "textPayload": "I0913 11:11:23.684008 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "nbaj2ifp1uv5w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:23.686707485Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:29.411773301Z" - }, - { - "textPayload": "I0913 11:11:23.685169 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "nbaj2ifp1uv5x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:23.686819819Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:29.411773301Z" - }, - { - "textPayload": "I0913 11:11:24.867046 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "nbaj2ifp1uv5y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:24.867246851Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:29.411773301Z" - }, - { - "textPayload": "I0913 11:11:24.954416 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: 
Reconcile\"", - "insertId": "nbaj2ifp1uv5z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:24.955198153Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:29.411773301Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "28bgxpfp8eczy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:26.270283920Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "28bgxpfp8eczz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:26.273767336Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "28bgxpfp8ed00", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:26.288438227Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "I0913 11:11:26.911961 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "nbaj2ifp1uv60", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:26.913853340Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:29.411773301Z" - }, - { - "textPayload": "I0913 11:11:27.157713 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "nbaj2ifp1uv61", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:27.160029576Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:11:29.411773301Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "28bgxpfp8ed01", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:33.338664529Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "gcsfuse mount seems ready, 
proceeding.", - "insertId": "28bgxpfp8ed02", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:33.339007779Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "28bgxpfp8ed03", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:33.405972474Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "28bgxpfp8ed04", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:40.558844271Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "28bgxpfp8ed05", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:40.798155172Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "28bgxpfp8ed06", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:46.240439637Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "28bgxpfp8ed07", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:46.241007867Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "28bgxpfp8ed08", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:46.241392828Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "28bgxpfp8ed09", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:11:46.252834292Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "28bgxpfp8ed0a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:51.303333065Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "28bgxpfp8ed0b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:11:53.470104625Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "28bgxpfp8ed0c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:11:56.314053484Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "28bgxpfp8ed0d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:12:01.319083319Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:06.459641227Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "y55jdnfi8o55b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:12:06.325537801Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:11.859794893Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "n18v79fj3fr4u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:12:11.331838039Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:16.839068629Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "xb3bxzfj11uh3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:12:16.339373254Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:21.837862213Z" - }, - { - "textPayload": "I0913 11:12:18.072306 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1h9zaycfj0pngm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:12:18.072585314Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:12:23.643231638Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1no1646fpaciy8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:12:21.346690728Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:26.835942744Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1aw5z45fj5t8kz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:12:26.353842427Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:31.842308794Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "fjs92bfpab6qp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:12:31.367517335Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:36.837210519Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "r701dhfizvinr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:12:36.374521981Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:41.839127485Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1nds5mcflxyb0g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:12:41.382029853Z", - "severity": "INFO", - "labels": { - 
"worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:46.835496160Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1p1njdvfm2g92h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:12:46.390292421Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:51.837059846Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "4q7ifdf6wd3b4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:12:51.397649879Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:12:56.839433874Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "4qezw1fj18ixj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:12:56.407274629Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:01.834562042Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1trn0fdf6iq4nv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:01.416049814Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:06.837136035Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "170v286f42l6g3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:13:06.431144588Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:11.863090613Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "15wv3f1fj25aj8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:11.438505967Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:16.826051158Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1k1vn5pfj0qhyp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:13:16.445880373Z", - 
"severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:21.851367602Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "17ka3lff8f8ezm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:13:21.452867202Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:26.833489040Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "17ka3lff8f8ezn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:21.454058649Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:26.833489040Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1mjrh8kflywrl2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:13:42.926745365Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:44.952873581Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1mjrh8kflywrl3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:13:42.928640660Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:13:44.952873581Z" - }, - { - "textPayload": " ", - "insertId": "s1cteafiy49uf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:13:57.974405214Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-pc6sj v5.2.7 (dawn-chorus)", - "insertId": "s1cteafiy49ug", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:13:57.974466474Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "s1cteafiy49uh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:13:57.974475055Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 11:13:57", - "insertId": "s1cteafiy49ui", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974481558Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "s1cteafiy49uj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974487038Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "s1cteafiy49uk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:13:57.974493360Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7966e60ef370", - "insertId": "s1cteafiy49ul", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974520427Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "s1cteafiy49um", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:13:57.974535864Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "s1cteafiy49un", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974544865Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "s1cteafiy49uo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974550872Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "s1cteafiy49up", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:13:57.974555877Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "s1cteafiy49uq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974561137Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "s1cteafiy49ur", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - 
} - }, - "timestamp": "2023-09-13T11:13:57.974566119Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "s1cteafiy49us", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974571181Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": " ", - "insertId": "s1cteafiy49ut", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974576578Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "", - "insertId": "s1cteafiy49uu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974581534Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "[tasks]", - "insertId": "s1cteafiy49uv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:13:57.974586681Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "s1cteafiy49uw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:13:57.974591756Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "", - "insertId": "s1cteafiy49ux", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:13:57.974596987Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "s1cteafiy49uy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:14:03.530965289Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "s1cteafiy49uz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:14:03.545264663Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:04.076170913Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "mhnv7kf3jre2u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:14:04.630451301Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "mingle.py:49" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:09.885745298Z" - }, - { - "textPayload": "celery@airflow-worker-pc6sj ready.", - "insertId": "mhnv7kf3jre2v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:14:04.670614802Z", - "severity": "INFO", - "labels": { - "process": "worker.py:176", - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:09.885745298Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "mhnv7kf3jre2w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:14:06.932661009Z", - "severity": "INFO", - "labels": { - "worker_id": 
"airflow-worker-pc6sj", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:14:09.885745298Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "13ytouqf6u2mjg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:18:30.426721122Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:18:36.075254274Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[c5d4b0c9-8f89-4920-b8fb-444cb7d33d69] received", - "insertId": "2c1mfifjxzn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:47.726239150Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "[c5d4b0c9-8f89-4920-b8fb-444cb7d33d69] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:10:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "2c1mfifjxzo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:47.767646495Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "2c1mfifjxzp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:20:48.127037839Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "2c1mfifjxzq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:20:48.129844178Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "2c1mfifjxzr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:48.254114969Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "2c1mfifjxzs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:49.204396851Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Running on host airflow-worker-pc6sj", - "insertId": "2c1mfifjxzt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:49.726258770Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "2c1mfifjxzu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:49.864023916Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-pc6sj", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T11:10:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "2c1mfifjxzv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:49.889080563Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T11:10:00+00:00", - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "2c1mfifjxzw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:49.889976156Z", - "severity": "INFO", - 
"labels": { - "process": "taskinstance.py:1289", - "try-number": "1", - "execution-date": "2023-09-13T11:10:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-pc6sj", - "map-index": "-1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "2c1mfifjxzx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:20:49.890745377Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:10:00+00:00", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-pc6sj", - "process": "taskinstance.py:1290", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "2c1mfifjxzy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:49.891371805Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1291", - "worker_id": "airflow-worker-pc6sj", - "task-id": "echo", - "execution-date": "2023-09-13T11:10:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "2c1mfifjxzz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:20:50.096840571Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "2c1mfifjy00", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:50.096884925Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "2c1mfifjy01", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:50.119139566Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": 
"2c1mfifjy02", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:50.119179218Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Executing on 2023-09-13 11:10:00+00:00", - "insertId": "2c1mfifjy03", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:20:51.069697115Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-pc6sj", - "execution-date": "2023-09-13T11:10:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Started process 318 to run task", - "insertId": "2c1mfifjy04", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:20:51.107741295Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:10:00+00:00", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:55", - "map-index": "-1", - "worker_id": "airflow-worker-pc6sj", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:10:00+00:00', '--job-id', '992', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp5ih3k25v']", - "insertId": "2c1mfifjy05", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:51.109985549Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "execution-date": "2023-09-13T11:10:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "process": "standard_task_runner.py:82", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Job 992: Subtask echo", - "insertId": "2c1mfifjy06", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:51.111386325Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T11:10:00+00:00", - "process": "standard_task_runner.py:83", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Running on host airflow-worker-pc6sj", - "insertId": "2c1mfifjy07", - "resource": { - "type": "cloud_composer_environment", - 
"labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:51.475499187Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "try-number": "1", - "process": "task_command.py:393", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T11:10:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T11:10:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T11:10:00+00:00", - "insertId": "2c1mfifjy08", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:51.683798707Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-pc6sj", - "map-index": "-1", - "process": "taskinstance.py:1518", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "2c1mfifjy09", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:51.686032324Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-pc6sj", - "execution-date": "2023-09-13T11:10:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "subprocess.py:63", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "2c1mfifjy0a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:51.688891692Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T11:10:00+00:00", - "process": "subprocess.py:75", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Output:", - "insertId": "2c1mfifjy0b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:51.815417190Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:86", - "worker_id": "airflow-worker-pc6sj", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T11:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - 
"textPayload": "test", - "insertId": "2c1mfifjy0c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:20:51.823324206Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "task-id": "echo", - "map-index": "-1", - "process": "subprocess.py:93", - "try-number": "1", - "execution-date": "2023-09-13T11:10:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "2c1mfifjy0d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:51.824798974Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:97", - "map-index": "-1", - "execution-date": "2023-09-13T11:10:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-pc6sj", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T111000, start_date=20230913T112049, end_date=20230913T112051", - "insertId": "2c1mfifjy0e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:51.884771182Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-pc6sj", - "workflow": "airflow_monitoring", - "try-number": "1", - "process": "taskinstance.py:1328", - "execution-date": "2023-09-13T11:10:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:53.253293019Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "18e9ifjfj5qgbd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:53.133349934Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-pc6sj", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "local_task_job.py:212", - "execution-date": "2023-09-13T11:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:59.328345381Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "18e9ifjfj5qgbe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:20:53.212970847Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:10:00+00:00", - "try-number": "1", - "process": "taskinstance.py:2599", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-pc6sj", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T11:20:59.328345381Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[c5d4b0c9-8f89-4920-b8fb-444cb7d33d69] succeeded in 5.637812717992347s: None", - "insertId": "18e9ifjfj5qgbf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:20:53.368737689Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:20:59.328345381Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1ndsfnlfif4t19", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:23:43.110140680Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:23:48.937473519Z" - }, - { - "textPayload": "I0913 11:25:06.273257 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:06.273439182Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:06.274683 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. 
Recreating.\" worker name=\"airflow-worker-pc6sj\"", - "insertId": "1nnwcz7fpb8ytl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:06.274862014Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:06.293176 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "1nnwcz7fpb8ytm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:06.293343696Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:06.293698 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:06.293837311Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "b44on9figqq3y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:06.323953554Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:25:11.577773991Z" - }, - { - "textPayload": "", - "insertId": "b44on9figqq40", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:06.324964609Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:25:11.577773991Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "b44on9figqq3z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:06.325054490Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:25:11.577773991Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "b44on9figqq41", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:06.325091016Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:25:11.577773991Z" - }, - { - 
"textPayload": "I0913 11:25:06.348771 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8yto", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:06.349035042Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "b44on9figqq42", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:09.924380595Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-pc6sj" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:25:11.577773991Z" - }, - { - "textPayload": "I0913 11:25:11.040876 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:11.041089801Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:11.042095 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "1nnwcz7fpb8ytq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:11.042233490Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:11.310789 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:11.311035660Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:11.395601 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8yts", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:11.395827649Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:11.397118 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T11:25:11.397279463Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:11.946241 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:11.946460065Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:11.989793 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:11.989963998Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "I0913 11:25:11.997130 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1nnwcz7fpb8ytw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:11.997289947Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:13.313114136Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "17uiq7cfp5g6a2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:13.104107177Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "17uiq7cfp5g6a3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:13.141078068Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "17uiq7cfp5g6a4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:13.156552792Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "I0913 11:25:14.037725 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "10316sef8rly28", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - 
"environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:14.037924311Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:20.407679247Z" - }, - { - "textPayload": "I0913 11:25:14.076376 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "10316sef8rly29", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:14.076643571Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:20.407679247Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "17uiq7cfp5g6a5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:21.040939714Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "17uiq7cfp5g6a6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:21.041511495Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "17uiq7cfp5g6a7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:21.054546342Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "17uiq7cfp5g6a8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:28.308293095Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "17uiq7cfp5g6a9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:28.507521324Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "I0913 11:25:32.474668 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "qxdjmifp7d5ut", - "resource": { - 
"type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:32.474898388Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:25:39.222739078Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "17uiq7cfp5g6aa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:33.684934802Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "17uiq7cfp5g6ab", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:33.685421480Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17uiq7cfp5g6ac", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:33.685482216Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "17uiq7cfp5g6ad", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:25:33.714045412Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17uiq7cfp5g6ae", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:38.712140774Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "17uiq7cfp5g6af", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:40.460733867Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": 
"17uiq7cfp5g6ag", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:43.724090442Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17uiq7cfp5g6ah", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:48.730718190Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17uiq7cfp5g6ai", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:25:53.737263270Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17uiq7cfp5g6aj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:25:58.746923013Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "17uiq7cfp5g6ak", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:26:03.753122067Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:05.912826916Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "12aqei4fpah2ss", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:26:08.763207676Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:11.850351462Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "qnbw2ifibb9sn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:26:13.770574640Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:16.959410885Z" - }, - { - "textPayload": "Dags and plugins 
are not synced yet", - "insertId": "14igfb5f8dajoq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:26:18.780620560Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:23.823570784Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "ilm0bifp1eht2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:26:23.787519748Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:28.823572144Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "789zgqfifmhkx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:26:28.795535266Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:33.826801075Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1rjug2rfiggki7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:26:33.802510Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:38.892966848Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "12knff2fifh4dm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:26:38.813007253Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:44.827893194Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1iduk6af6luyl2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:26:43.821021039Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:49.824687461Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "19snrd5fj8lq9o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:26:48.833030826Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:54.828271958Z" - }, - { - 
"textPayload": "Dags and plugins are not synced yet", - "insertId": "15wndgafidc9of", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:26:53.839386672Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:26:59.824449707Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "qnir2zfpa7e7i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:26:58.850239634Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:04.835423077Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "1rjfw4rfj2zsuc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:03.858199171Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:09.887193760Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "1rjfw4rfj2zsud", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:03.859333597Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:09.887193760Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1no5ztff6m6033", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:27:25.714414972Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:31.040698283Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1no5ztff6m6034", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:25.730767313Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:31.040698283Z" - }, - { - "textPayload": " ", - "insertId": "whaw58fih8ux9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543808008Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-qj84r v5.2.7 (dawn-chorus)", - "insertId": "whaw58fih8uxa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:27:40.543859138Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "whaw58fih8uxb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543866719Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 11:27:40", - "insertId": "whaw58fih8uxc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:27:40.543873009Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "whaw58fih8uxd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:40.543878862Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "whaw58fih8uxe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543885834Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7d61de21f5b0", - "insertId": "whaw58fih8uxf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543891745Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "whaw58fih8uxg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543920863Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "whaw58fih8uxh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543930156Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "whaw58fih8uxi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:27:40.543936832Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "whaw58fih8uxj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:40.543942863Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "whaw58fih8uxk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:40.543948485Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "whaw58fih8uxl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - 
} - }, - "timestamp": "2023-09-13T11:27:40.543954509Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "whaw58fih8uxm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:27:40.543959778Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": " ", - "insertId": "whaw58fih8uxn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543964978Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "", - "insertId": "whaw58fih8uxo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543970431Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "[tasks]", - "insertId": "whaw58fih8uxp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:40.543976560Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "whaw58fih8uxq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:27:40.543986417Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "", - "insertId": "whaw58fih8uxr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:40.543993795Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:45.183018633Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "6xvra3fm31epc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:46.216918368Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:51.288631477Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "6xvra3fm31epd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:46.240908143Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:51.288631477Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "6xvra3fm31epe", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:27:47.326261344Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "mingle.py:49" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:51.288631477Z" - }, - { - "textPayload": "celery@airflow-worker-qj84r ready.", - "insertId": "6xvra3fm31epf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:47.357812632Z", - "severity": "INFO", - "labels": { - "process": "worker.py:176", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:51.288631477Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "6xvra3fm31epg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:27:50.027428963Z", - "severity": "INFO", - "labels": { - "process": 
"control.py:277", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:27:51.288631477Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[95b8466c-e6a5-4729-8389-bcbcd2ff399a] received", - "insertId": "17ksjmifp7x0vc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:30:03.311130307Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "[95b8466c-e6a5-4729-8389-bcbcd2ff399a] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:20:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "17ksjmifp7x0vd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:03.357480433Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "17ksjmifp7x0ve", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:03.914478670Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "17ksjmifp7x0vf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:03.917805804Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "17ksjmifp7x0vg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:04.050087902Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "17ksjmifp7x0vh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:05.058657443Z", - "severity": "INFO", - "labels": { - "process": 
"dagbag.py:532", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Running on host airflow-worker-qj84r", - "insertId": "17ksjmifp7x0vi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:05.661507097Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "17ksjmifp7x0vj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:05.787084383Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:20:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "17ksjmifp7x0vk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:30:05.809187974Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:20:00+00:00", - "worker_id": "airflow-worker-qj84r", - "map-index": "-1", - "task-id": "echo", - "process": "taskinstance.py:1091", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "17ksjmifp7x0vl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:05.810250479Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1289", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:20:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "17ksjmifp7x0vm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:05.810653598Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1290", - "task-id": "echo", - "worker_id": "airflow-worker-qj84r", - "map-index": "-1", - "execution-date": "2023-09-13T11:20:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "17ksjmifp7x0vn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:05.811483326Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-13T11:20:00+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "17ksjmifp7x0vo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:06.152179303Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "17ksjmifp7x0vp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:30:06.152247074Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "17ksjmifp7x0vq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:30:06.173876763Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "17ksjmifp7x0vr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:06.173945562Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Executing on 2023-09-13 11:20:00+00:00", - "insertId": "17ksjmifp7x0vs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:07.487942875Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-qj84r", - 
"execution-date": "2023-09-13T11:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:20:00+00:00', '--job-id', '994', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp_ivd0fo6']", - "insertId": "17ksjmifp7x0vt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:07.528450981Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "task-id": "echo", - "execution-date": "2023-09-13T11:20:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Started process 249 to run task", - "insertId": "17ksjmifp7x0vu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:07.529073878Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T11:20:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-qj84r", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Job 994: Subtask echo", - "insertId": "17ksjmifp7x0vv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:07.530330829Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:20:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "process": "standard_task_runner.py:83", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:08.783716944Z" - }, - { - "textPayload": "Running on host airflow-worker-qj84r", - "insertId": "ezy19pfm19s5i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:07.916502867Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "task_command.py:393", - "execution-date": "2023-09-13T11:20:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T11:20:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T11:20:00+00:00", - "insertId": "ezy19pfm19s5j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": 
"acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:30:08.094890343Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:20:00+00:00", - "process": "taskinstance.py:1518", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "ezy19pfm19s5k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:08.097159541Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:20:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "process": "subprocess.py:63" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "ezy19pfm19s5l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:30:08.099083541Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:20:00+00:00", - "map-index": "-1", - "process": "subprocess.py:75", - "worker_id": "airflow-worker-qj84r", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Output:", - "insertId": "ezy19pfm19s5m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:08.256230752Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T11:20:00+00:00", - "process": "subprocess.py:86", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "test", - "insertId": "ezy19pfm19s5n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:08.265532655Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-qj84r", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T11:20:00+00:00", - "process": "subprocess.py:93", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "ezy19pfm19s5o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T11:30:08.266591011Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:20:00+00:00", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-qj84r", - "process": "subprocess.py:97", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T112000, start_date=20230913T113005, end_date=20230913T113008", - "insertId": "ezy19pfm19s5p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:30:08.335528438Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:20:00+00:00", - "map-index": "-1", - "process": "taskinstance.py:1328", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "ezy19pfm19s5q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:09.130619559Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "local_task_job.py:212", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-qj84r", - "execution-date": "2023-09-13T11:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "ezy19pfm19s5r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:30:09.212556628Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:20:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[95b8466c-e6a5-4729-8389-bcbcd2ff399a] succeeded in 6.0669647970062215s: None", - "insertId": "ezy19pfm19s5s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:30:09.383373929Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:30:13.882135800Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. 
To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1l5pnbvf8d4ee7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:32:13.544899206Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:32:16.009577339Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1xn8s00fpefpeq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:37:21.737352454Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:37:26.979848918Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[db2beb03-a300-46a2-9a11-4683fa3ddd5d] received", - "insertId": "12uuhqxflyx821", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:02.008681638Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "[db2beb03-a300-46a2-9a11-4683fa3ddd5d] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:30:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "12uuhqxflyx822", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:02.016446359Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "12uuhqxflyx823", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:02.352648389Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r", - 
"process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "12uuhqxflyx824", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:02.411793842Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "12uuhqxflyx825", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:02.567502914Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "12uuhqxflyx826", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:03.505653106Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Running on host airflow-worker-qj84r", - "insertId": "12uuhqxflyx827", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:04.069114934Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "12uuhqxflyx828", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:04.192659410Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "try-number": "1", - "worker_id": "airflow-worker-qj84r", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:30:00+00:00", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "12uuhqxflyx829", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:04.212639679Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:30:00+00:00", - "workflow": 
"airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-qj84r", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "12uuhqxflyx82a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:04.213028768Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "12uuhqxflyx82b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:04.213493354Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T11:30:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-qj84r", - "process": "taskinstance.py:1290", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "12uuhqxflyx82c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:04.214456642Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:30:00+00:00", - "worker_id": "airflow-worker-qj84r", - "process": "taskinstance.py:1291", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "12uuhqxflyx82d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:04.467431265Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "12uuhqxflyx82e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:04.467476691Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - 
"textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "12uuhqxflyx82f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:04.487031160Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "12uuhqxflyx82g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:04.487070056Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Executing on 2023-09-13 11:30:00+00:00", - "insertId": "12uuhqxflyx82h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:05.319721980Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T11:30:00+00:00", - "process": "taskinstance.py:1310", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Started process 502 to run task", - "insertId": "12uuhqxflyx82i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:05.367949926Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:30:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-qj84r", - "process": "standard_task_runner.py:55", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:30:00+00:00', '--job-id', '997', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmprisre19q']", - "insertId": "12uuhqxflyx82j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:05.369136002Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-qj84r", - "execution-date": "2023-09-13T11:30:00+00:00", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Job 997: Subtask echo", - "insertId": "12uuhqxflyx82k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": 
"acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:05.371182683Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T11:30:00+00:00", - "worker_id": "airflow-worker-qj84r", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "standard_task_runner.py:83", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Running on host airflow-worker-qj84r", - "insertId": "12uuhqxflyx82l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:05.740891990Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo", - "try-number": "1", - "process": "task_command.py:393", - "map-index": "-1", - "execution-date": "2023-09-13T11:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T11:30:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T11:30:00+00:00", - "insertId": "12uuhqxflyx82m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:05.939457047Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-qj84r", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:30:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "12uuhqxflyx82n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:05.941550389Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:30:00+00:00", - "map-index": "-1", - "try-number": "1", - "process": "subprocess.py:63", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "12uuhqxflyx82o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:05.943884123Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "execution-date": "2023-09-13T11:30:00+00:00", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "process": "subprocess.py:75" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - 
}, - { - "textPayload": "Output:", - "insertId": "12uuhqxflyx82p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:06.092460909Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T11:30:00+00:00", - "map-index": "-1", - "process": "subprocess.py:86", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "test", - "insertId": "12uuhqxflyx82q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:06.099347822Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-qj84r", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:30:00+00:00", - "process": "subprocess.py:93", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "12uuhqxflyx82r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:06.099385370Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:97", - "map-index": "-1", - "worker_id": "airflow-worker-qj84r", - "execution-date": "2023-09-13T11:30:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T113000, start_date=20230913T114004, end_date=20230913T114006", - "insertId": "12uuhqxflyx82s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:06.156603608Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:30:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-qj84r", - "process": "taskinstance.py:1328" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:07.459611817Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "dc28y7f8qx892", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:06.833192690Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "process": "local_task_job.py:212", - "worker_id": "airflow-worker-qj84r", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T11:30:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:12.937279024Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "dc28y7f8qx893", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:06.883236422Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T11:30:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:12.937279024Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[db2beb03-a300-46a2-9a11-4683fa3ddd5d] succeeded in 5.029245264013298s: None", - "insertId": "dc28y7f8qx894", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:07.042730629Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:12.937279024Z" - }, - { - "textPayload": "I0913 11:40:40.275235 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "18elui2fcls2zf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:40.275483675Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:44.935099680Z" - }, - { - "textPayload": "I0913 11:40:40.277119 1 airflowworkerset_controller.go:268] \"controllers/AirflowWorkerSet: Worker uses old template. 
Recreating.\" worker name=\"airflow-worker-qj84r\"", - "insertId": "18elui2fcls2zg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:40.277306249Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:44.935099680Z" - }, - { - "textPayload": "I0913 11:40:40.301212 1 airflowworkerset_controller.go:77] \"controllers/AirflowWorkerSet: Template changed, workers recreated.\"", - "insertId": "18elui2fcls2zh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:40.301434366Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:44.935099680Z" - }, - { - "textPayload": "I0913 11:40:40.301684 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "18elui2fcls2zi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:40.301773804Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:44.935099680Z" - }, - { - "textPayload": "Caught SIGTERM signal!", - "insertId": "qxdjmifp8jrny", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:40.326421650Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:45.387969681Z" - }, - { - "textPayload": "Passing SIGTERM to Airflow process.", - "insertId": "qxdjmifp8jrnz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:40.326487719Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:45.387969681Z" - }, - { - "textPayload": "", - "insertId": "qxdjmifp8jro0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:40.326784704Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:45.387969681Z" - }, - { - "textPayload": "worker: Warm shutdown (MainProcess)", - "insertId": "qxdjmifp8jro1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:40.326809239Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:45.387969681Z" - }, - { - 
"textPayload": "I0913 11:40:40.341460 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "18elui2fcls2zj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:40.341706937Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:44.935099680Z" - }, - { - "textPayload": "Exiting due to SIGTERM.", - "insertId": "qxdjmifp8jro2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:44.122104399Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-qj84r" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:40:45.387969681Z" - }, - { - "textPayload": "I0913 11:40:44.719672 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:44.719902023Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:44.720936 1 airflowworkerset_controller.go:97] \"controllers/AirflowWorkerSet: Workers scale up needed.\" current number of workers=0 desired=1 scaling up by=1", - "insertId": "1ckbkllf6tuq6d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:44.721106708Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:44.895777 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:44.896136186Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:44.938789 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:44.939032106Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:44.962845 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6g", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T11:40:44.963106295Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:45.622172 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:45.622462594Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:45.638196 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:45.638460119Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:45.651953 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:45.653344561Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "Starting the process, got command: worker", - "insertId": "ftte42fp9vnvr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:46.245782106Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "Initializing airflow.cfg.", - "insertId": "ftte42fp9vnvs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:46.247778226Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "airflow.cfg initialization is done.", - "insertId": "ftte42fp9vnvt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:46.275055287Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "I0913 11:40:46.627722 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - 
"project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:46.628001779Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:46.672524 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1ckbkllf6tuq6l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:46.672751970Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:51.031272527Z" - }, - { - "textPayload": "I0913 11:40:52.011257 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "k4w1kfj9oksd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:52.011787711Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T11:40:58.211811628Z" - }, - { - "textPayload": "Setupping GCS Fuse.", - "insertId": "ftte42fp9vnvu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:40:53.534812413Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "gcsfuse mount seems ready, proceeding.", - "insertId": "ftte42fp9vnvv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:40:53.537119010Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "Initializing kube_config.", - "insertId": "ftte42fp9vnvw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:40:53.550010853Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "Fetching cluster endpoint and auth data.", - "insertId": "ftte42fp9vnvx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:41:00.738266589Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "kubeconfig entry generated for us-west1-openlineage-1614b57c-gke.", - "insertId": "ftte42fp9vnvy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:41:00.914637287Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:05.621997552Z" - }, - { - "textPayload": "/home/airflow/composer_kube_config is initialized", - "insertId": "rhgbdifj3rl1b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:41:06.334794268Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:11.729802152Z" - }, - { - "textPayload": "Waiting for dags and plugins synchronization.", - "insertId": "rhgbdifj3rl1c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:41:06.335567069Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:11.729802152Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "rhgbdifj3rl1d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:41:06.335783131Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:11.729802152Z" - }, - { - "textPayload": "Searching for recent worker pod evictions", - "insertId": "rhgbdifj3rl1e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:41:06.345043377Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:11.729802152Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "5a7l6of6wqduv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:41:11.353333796Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:16.850112322Z" - }, - { - "textPayload": "Finished searching for recent worker pod evictions", - "insertId": "5a7l6of6wqduw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:41:13.411732517Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:16.850112322Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": 
"1kvuve0fpe84gs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:41:16.369347276Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:21.825414991Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "mr5tjaf6v31yx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:41:21.375513106Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:26.826191090Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "f9jqxcfifng43", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:41:26.382666484Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:31.832079091Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1q5tu8efpg8cm4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:41:31.389619760Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:36.829438735Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1yr8ptufj671uc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:41:36.394784377Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:41.824187232Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "152rm1ofm0s7t5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:41:41.402172692Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:46.827610773Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1gzzi96fdm75pb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:41:46.411114949Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:51.825868077Z" - }, - { - "textPayload": "Dags and plugins 
are not synced yet", - "insertId": "1yr82yyfj2y1ul", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:41:51.416083281Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:41:56.828493945Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "289vddfm5eyuj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:41:56.422522845Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:01.825194165Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "bnlixrfj56g7k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:42:01.429085326Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:06.824407118Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "sb49dzf6tfhpm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:42:06.437849484Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:11.841095640Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "atkwylfielkdm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:42:11.447645106Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:16.848157816Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1vfatttffeyu1x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:42:16.454995797Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:21.823322833Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "4qezw1fj3k9fl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:42:21.461324264Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:26.824600096Z" - }, - { - 
"textPayload": "Dags and plugins are not synced yet", - "insertId": "1vflbw9f6sv969", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:42:26.467992519Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:31.830054608Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "1h9zaycfj31vfd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:42:31.475519833Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:36.826914011Z" - }, - { - "textPayload": "Dags and plugins are not synced yet", - "insertId": "gdexkjfihu6ze", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:42:36.481673491Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:41.826110878Z" - }, - { - "textPayload": "Dags and plugins are synced", - "insertId": "1rjfw4rfj49rqf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:42:41.489461507Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:46.831805689Z" - }, - { - "textPayload": "Starting Airflow Celery Flower API.", - "insertId": "1rjfw4rfj49rqg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:42:41.491187577Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:42:46.831805689Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "15cwy51flz5r4y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:04.017118550Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:10.017347171Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "15cwy51flz5r4z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:43:04.033930368Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:10.017347171Z" - }, - { - "textPayload": " ", - "insertId": "xuoq7hf6wv3mq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825297487Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": " -------------- celery@airflow-worker-r72xf v5.2.7 (dawn-chorus)", - "insertId": "xuoq7hf6wv3mr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825394011Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "xuoq7hf6wv3ms", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825404512Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "-- ******* ---- Linux-5.15.109+-x86_64-with-glibc2.27 2023-09-13 11:43:19", - "insertId": "xuoq7hf6wv3mt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:19.825411055Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "- *** --- * --- ", - "insertId": "xuoq7hf6wv3mu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:43:19.825416877Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "- ** ---------- [config]", - "insertId": "xuoq7hf6wv3mv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:19.825423022Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "- ** ---------- .> app: airflow.executors.celery_executor:0x7b39a36be610", - "insertId": "xuoq7hf6wv3mw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825467440Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "- ** ---------- .> transport: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "xuoq7hf6wv3mx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:19.825480368Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "- ** ---------- .> results: redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "xuoq7hf6wv3my", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:19.825487491Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "- *** --- * --- .> concurrency: 6 (prefork)", - "insertId": "xuoq7hf6wv3mz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:19.825493799Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)", - "insertId": "xuoq7hf6wv3n0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - 
} - }, - "timestamp": "2023-09-13T11:43:19.825499363Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "--- ***** ----- ", - "insertId": "xuoq7hf6wv3n1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825504839Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": " -------------- [queues]", - "insertId": "xuoq7hf6wv3n2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825510836Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": " .> default exchange=default(direct) key=default", - "insertId": "xuoq7hf6wv3n3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:19.825515989Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": " ", - "insertId": "xuoq7hf6wv3n4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:43:19.825521699Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "", - "insertId": "xuoq7hf6wv3n5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825527319Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "[tasks]", - "insertId": "xuoq7hf6wv3n6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:19.825533437Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": " . 
airflow.executors.celery_executor.execute_command", - "insertId": "xuoq7hf6wv3n7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:43:19.825539215Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "", - "insertId": "xuoq7hf6wv3n8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:19.825555618Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:24.141002921Z" - }, - { - "textPayload": "Connected to redis://airflow-redis-service.composer-system.svc.cluster.local:6379/0", - "insertId": "1ha7idsfilr4g0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:25.421833954Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "connection.py:22" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:31.197407141Z" - }, - { - "textPayload": "mingle: searching for neighbors", - "insertId": "1ha7idsfilr4g1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:25.432026390Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "mingle.py:40" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:31.197407141Z" - }, - { - "textPayload": "mingle: all alone", - "insertId": "1ha7idsfilr4g2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:43:26.529413454Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "mingle.py:49" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:31.197407141Z" - }, - { - "textPayload": "celery@airflow-worker-r72xf ready.", - "insertId": "1ha7idsfilr4g3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:43:26.566544340Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "worker.py:176" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:31.197407141Z" - }, - { - "textPayload": "Events of group {task} enabled by remote.", - "insertId": "1ha7idsfilr4g4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:43:29.213813870Z", - "severity": "INFO", - "labels": { - "worker_id": 
"airflow-worker-r72xf", - "process": "control.py:277" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:43:31.197407141Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "5ud9bhfm1u8me", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:47:50.245134520Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:47:55.979571712Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[90d68e21-59e8-4e9c-8027-eab5631924ea] received", - "insertId": "1v5o2rcf7ih1kf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:01.257576493Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:03.833841703Z" - }, - { - "textPayload": "[90d68e21-59e8-4e9c-8027-eab5631924ea] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:40:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1v5o2rcf7ih1kg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:01.307358562Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:03.833841703Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1v5o2rcf7ih1kh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:01.735143092Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:03.833841703Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1v5o2rcf7ih1ki", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:50:01.737695917Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T11:50:03.833841703Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1v5o2rcf7ih1kj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:01.858263147Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:03.833841703Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "yyz1o6ficly0q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:02.853041783Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "yyz1o6ficly0r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:03.432761474Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "yyz1o6ficly0s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:50:03.711208787Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T11:40:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "yyz1o6ficly0t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:03.732122063Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T11:40:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "yyz1o6ficly0u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:03.732665329Z", - "severity": 
"INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1289", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:40:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "yyz1o6ficly0v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:03.733159607Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T11:40:00+00:00", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1290", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "yyz1o6ficly0w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:03.733617493Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:40:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "process": "taskinstance.py:1291", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "yyz1o6ficly0x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:03.993487676Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "yyz1o6ficly0y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:03.993537573Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "yyz1o6ficly0z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:04.044011104Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", 
- "insertId": "yyz1o6ficly10", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:04.044049227Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Executing on 2023-09-13 11:40:00+00:00", - "insertId": "yyz1o6ficly11", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:04.924325972Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T11:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Started process 318 to run task", - "insertId": "yyz1o6ficly12", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:04.957934830Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T11:40:00+00:00", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:55" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:40:00+00:00', '--job-id', '999', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpe0pnes2g']", - "insertId": "yyz1o6ficly13", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:04.959229690Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T11:40:00+00:00", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:82", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Job 999: Subtask echo", - "insertId": "yyz1o6ficly14", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:04.960626818Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:40:00+00:00", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "yyz1o6ficly15", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:50:05.319658691Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T11:40:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "process": "task_command.py:393", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T11:40:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T11:40:00+00:00", - "insertId": "yyz1o6ficly16", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:50:05.516364931Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1518", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T11:40:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "yyz1o6ficly17", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:50:05.518191553Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "subprocess.py:63", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T11:40:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "yyz1o6ficly18", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:50:05.519774488Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T11:40:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "process": "subprocess.py:75", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Output:", - "insertId": "yyz1o6ficly19", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T11:50:05.670413698Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:86", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:40:00+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "test", - "insertId": "yyz1o6ficly1a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:05.679489104Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "subprocess.py:93", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T11:40:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "yyz1o6ficly1b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:05.680769526Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:40:00+00:00", - "task-id": "echo", - "process": "subprocess.py:97", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T114000, start_date=20230913T115003, end_date=20230913T115005", - "insertId": "yyz1o6ficly1c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:05.722628754Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T11:40:00+00:00", - "process": "taskinstance.py:1328", - "map-index": "-1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "yyz1o6ficly1d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:50:06.462631571Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "process": "local_task_job.py:212", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T11:40:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "yyz1o6ficly1e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:06.544646658Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T11:40:00+00:00", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[90d68e21-59e8-4e9c-8027-eab5631924ea] succeeded in 5.450612828019075s: None", - "insertId": "yyz1o6ficly1f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T11:50:06.712560515Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:50:08.956951246Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1yhp8opfj4a444", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:53:03.330157002Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:53:09.073027124Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "9629lifp7jw7n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T11:57:55.236740514Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T11:58:00.029533955Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[ed5a4ead-3129-4da5-8e0c-bc464e3d4e7a] received", - "insertId": "14sqdezf6tluls", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:01.173114232Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "[ed5a4ead-3129-4da5-8e0c-bc464e3d4e7a] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:50:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "14sqdezf6tlult", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:01.178935906Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "14sqdezf6tlulu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:01.615709944Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "14sqdezf6tlulv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:01.618903780Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "14sqdezf6tlulw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:01.739509947Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "14sqdezf6tlulx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:02.708432061Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "14sqdezf6tluly", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:03.274813092Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "14sqdezf6tlulz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:03.513438906Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T11:50:00+00:00", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "14sqdezf6tlum0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:03.608365219Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "execution-date": "2023-09-13T11:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14sqdezf6tlum1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:03.608731301Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T11:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "14sqdezf6tlum2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T12:00:03.609085425Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T11:50:00+00:00", - "process": "taskinstance.py:1290", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "14sqdezf6tlum3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:03.609614431Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "task-id": "echo", - "process": "taskinstance.py:1291", - "map-index": "-1", - "execution-date": "2023-09-13T11:50:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:04.801707771Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "18yhbjhf505hcc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:03.900709427Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "18yhbjhf505hcd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:03.900784090Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "18yhbjhf505hce", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:03.928873985Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "18yhbjhf505hcf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:03.928917488Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Executing on 2023-09-13 11:50:00+00:00", - "insertId": "18yhbjhf505hcg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": 
"us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:04.830601510Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T11:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Started process 541 to run task", - "insertId": "18yhbjhf505hch", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:04.868980014Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T11:50:00+00:00", - "worker_id": "airflow-worker-r72xf", - "process": "standard_task_runner.py:55", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T11:50:00+00:00', '--job-id', '1000', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp9n4k4g97']", - "insertId": "18yhbjhf505hci", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:04.870271284Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:82", - "try-number": "1", - "task-id": "echo", - "execution-date": "2023-09-13T11:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Job 1000: Subtask echo", - "insertId": "18yhbjhf505hcj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:04.871702698Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-13T11:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "18yhbjhf505hck", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:05.252000374Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T11:50:00+00:00", - "task-id": "echo", - "try-number": "1", - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T11:50:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T11:50:00+00:00", - "insertId": "18yhbjhf505hcl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:05.435451290Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "taskinstance.py:1518", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T11:50:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "18yhbjhf505hcm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:05.437684966Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:50:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "map-index": "-1", - "process": "subprocess.py:63" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "18yhbjhf505hcn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:05.439347741Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T11:50:00+00:00", - "process": "subprocess.py:75", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Output:", - "insertId": "18yhbjhf505hco", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:05.578342243Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "execution-date": "2023-09-13T11:50:00+00:00", - "process": "subprocess.py:86", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "test", - "insertId": "18yhbjhf505hcp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:05.586681917Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:50:00+00:00", - "worker_id": "airflow-worker-r72xf", - "process": "subprocess.py:93", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "18yhbjhf505hcq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:05.587268200Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:50:00+00:00", - "process": "subprocess.py:97", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T115000, start_date=20230913T120003, end_date=20230913T120005", - "insertId": "18yhbjhf505hcr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:05.635348913Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T11:50:00+00:00", - "worker_id": "airflow-worker-r72xf", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "18yhbjhf505hcs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:00:06.291564558Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "task-id": "echo", - "try-number": "1", - "process": "local_task_job.py:212", - "execution-date": "2023-09-13T11:50:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "18yhbjhf505hct", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:00:06.402895520Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T11:50:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:2599", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[ed5a4ead-3129-4da5-8e0c-bc464e3d4e7a] succeeded in 5.384852607996436s: None", - "insertId": "18yhbjhf505hcu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:00:06.562140681Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", 
- "receiveTimestamp": "2023-09-13T12:00:09.930104361Z" - }, - { - "textPayload": "E0913 12:02:55.805391063 613 thd.cc:157] pthread_create failed: Resource temporarily unavailable", - "insertId": "1gzzi96fdnrmps", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:02:55.805800383Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:02:59.813370549Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1gzzi96fdnrmpt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:02:57.440564524Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:02:59.813370549Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1v5m7ysf6uce2b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:08:07.921291425Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:08:14.044321694Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[092f3653-d0b2-4a89-905c-665b2168a1bc] received", - "insertId": "1x3nuypfj936gt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:00.468670886Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "[092f3653-d0b2-4a89-905c-665b2168a1bc] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T12:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1x3nuypfj936gu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:00.474817149Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1x3nuypfj936gv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:00.828348653Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1x3nuypfj936gw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:00.830744629Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1x3nuypfj936gx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:00.944468720Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1x3nuypfj936gy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:10:01.911818628Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1x3nuypfj936gz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:02.517479197Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1x3nuypfj936h0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:10:02.681369399Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "process": "taskinstance.py:1091", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:00:00+00:00", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1x3nuypfj936h1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:02.713255512Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:00:00+00:00", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1x3nuypfj936h2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:10:02.714146616Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "taskinstance.py:1289", - "task-id": "echo", - "execution-date": "2023-09-13T12:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1x3nuypfj936h3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": 
"2023-09-13T12:10:02.714877079Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "process": "taskinstance.py:1290", - "map-index": "-1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T12:00:00+00:00", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1x3nuypfj936h4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:02.715608022Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:00:00+00:00", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "taskinstance.py:1291", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:03.812729720Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1y7k36efbzgubv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:02.960843801Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1y7k36efbzgubw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:02.960887882Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1y7k36efbzgubx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:03.021307126Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1y7k36efbzguby", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:03.021360121Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:00:00+00:00", - "insertId": "1y7k36efbzgubz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:03.867913428Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1310", - "task-id": "echo", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T12:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Started process 777 to run task", - "insertId": "1y7k36efbzguc0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:03.901947511Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "process": "standard_task_runner.py:55", - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "execution-date": "2023-09-13T12:00:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T12:00:00+00:00', '--job-id', '1001', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmpm37rct5_']", - "insertId": "1y7k36efbzguc1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:03.905663233Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T12:00:00+00:00", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "process": "standard_task_runner.py:82", - "workflow": "airflow_monitoring", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Job 1001: Subtask echo", - "insertId": "1y7k36efbzguc2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:03.907087917Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T12:00:00+00:00", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "map-index": "-1", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1y7k36efbzguc3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:10:04.328793562Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:00:00+00:00", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Exporting the 
following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:00:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T12:00:00+00:00", - "insertId": "1y7k36efbzguc4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:04.552168600Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "execution-date": "2023-09-13T12:00:00+00:00", - "process": "taskinstance.py:1518", - "try-number": "1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1y7k36efbzguc5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:04.554095935Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "echo", - "process": "subprocess.py:63", - "try-number": "1", - "execution-date": "2023-09-13T12:00:00+00:00", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1y7k36efbzguc6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:04.555631774Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:75", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T12:00:00+00:00", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Output:", - "insertId": "1y7k36efbzguc7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:04.711494388Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:86", - "execution-date": "2023-09-13T12:00:00+00:00", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "test", - "insertId": "1y7k36efbzguc8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:04.719839677Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:00:00+00:00", - "process": "subprocess.py:93", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1y7k36efbzguc9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:04.720571689Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T12:00:00+00:00", - "process": "subprocess.py:97", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T120000, start_date=20230913T121002, end_date=20230913T121004", - "insertId": "1y7k36efbzguca", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:10:04.762517764Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:00:00+00:00", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1y7k36efbzgucb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:10:05.571999839Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:00:00+00:00", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1y7k36efbzgucc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:10:05.657425919Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:00:00+00:00", - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:2599", - "workflow": "airflow_monitoring", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[092f3653-d0b2-4a89-905c-665b2168a1bc] succeeded in 5.364713379996829s: None", - "insertId": "1y7k36efbzgucd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:10:05.837161720Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-r72xf" - }, - 
"logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:10:09.888456604Z" - }, - { - "textPayload": "E0913 12:12:56.218907341 848 thd.cc:157] pthread_create failed: Resource temporarily unavailable", - "insertId": "1o7uylsfpf8kub", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:12:56.220302302Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:02.894308674Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "ktpy6gfjbd8ma", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:03.623174839Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:08.989937430Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f2d2b96b-cb54-4a95-a76d-74a158ce38e0] received", - "insertId": "ktpy6gfjbd8mb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:07.148918229Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:08.989937430Z" - }, - { - "textPayload": "[f2d2b96b-cb54-4a95-a76d-74a158ce38e0] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'manual__2023-09-13T12:13:02.076777+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "ktpy6gfjbd8mc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:07.205011887Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:08.989937430Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "ktpy6gfjbd8md", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:08.022935952Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T12:13:08.989937430Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "ktpy6gfjbd8me", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:08.025569693Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:08.989937430Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "ktpy6gfjbd8mf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:08.433774474Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:08.989937430Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "1htx5o8fj8dw93", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:11.132748949Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:16.165128955Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1sdmhokfpdbytj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:17.220674121Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "1sdmhokfpdbytk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:17.351204940Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "1sdmhokfpdbytl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:17.369368586Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "execution-date": 
"2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1sdmhokfpdbytm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:17.369824085Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "1sdmhokfpdbytn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:17.370268416Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1290", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "run_bq_external_ingestion", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1sdmhokfpdbyto", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:17.370649889Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1291", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1sdmhokfpdbytp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:17.710130352Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1sdmhokfpdbytq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:17.710156964Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1sdmhokfpdbytr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:17.749521573Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1sdmhokfpdbyts", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:17.749593370Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:13:02.076777+00:00", - "insertId": "1sdmhokfpdbytt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:18.701606985Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "process": "taskinstance.py:1310", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Started process 878 to run task", - "insertId": "1sdmhokfpdbytu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:18.743237028Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'run_bq_external_ingestion', 'manual__2023-09-13T12:13:02.076777+00:00', '--job-id', '1002', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmp29e10h91']", - "insertId": "1sdmhokfpdbytv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:18.743283930Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "standard_task_runner.py:82", - "try-number": "1", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Job 1002: Subtask run_bq_external_ingestion", - "insertId": "1sdmhokfpdbytw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:18.745004626Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1sdmhokfpdbytx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:19.128076597Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "process": "task_command.py:393", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=run_bq_external_ingestion\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:13:02.076777+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T12:13:02.076777+00:00", - "insertId": "1sdmhokfpdbyty", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:19.448241690Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1518", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "1sdmhokfpdbytz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:19.489307489Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "process": "base.py:73" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Using existing BigQuery table for storing data...", - "insertId": "1sdmhokfpdbyu0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T12:13:19.491647435Z", - "severity": "INFO", - "labels": { - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "gcs_to_bigquery.py:375", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "1sdmhokfpdbyu1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:19.492759907Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "credentials_provider.py:353", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "task-id": "run_bq_external_ingestion", - "worker_id": "airflow-worker-r72xf", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Project is not included in destination_project_dataset_table: holiday_weather.holidays; using project \"acceldata-acm\"", - "insertId": "1sdmhokfpdbyu2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:19.510035197Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "bigquery.py:2314", - "try-number": "1", - "task-id": "run_bq_external_ingestion", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Executing: {'load': {'autodetect': True, 'createDisposition': 'CREATE_IF_NEEDED', 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays'}, 'sourceFormat': 'CSV', 'sourceUris': ['gs://openlineagedemo/holidays.csv'], 'writeDisposition': 'WRITE_TRUNCATE', 'ignoreUnknownValues': False, 'schema': {'fields': [{'name': 'Date', 'type': 'DATE'}, {'name': 'Holiday', 'type': 'STRING'}]}, 'skipLeadingRows': 1, 'fieldDelimiter': ',', 'quote': None, 'allowQuotedNewlines': False, 'encoding': 'UTF-8'}}", - "insertId": "1sdmhokfpdbyu3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:19.511343949Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "process": "gcs_to_bigquery.py:379", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "task-id": "run_bq_external_ingestion" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_run_bq_external_ingestion_2023_09_13T12_13_02_076777_00_00_a6068a911228fb61a865591d09efc303", - "insertId": "1sdmhokfpdbyu4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:19.513271195Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "process": "bigquery.py:1596", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:23.289743178Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=data_analytics_dag, task_id=run_bq_external_ingestion, execution_date=20230913T121302, start_date=20230913T121317, end_date=20230913T121323", - "insertId": "jpflkkf8s8yld", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:23.236554575Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "taskinstance.py:1328", - "task-id": "run_bq_external_ingestion", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[c913e647-a5b5-46b6-9382-aaf6b373e673] received", - "insertId": "jpflkkf8s8yle", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:24.223145136Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "strategy.py:161" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[e31cd580-fc70-4ec5-93e1-8277d5df4a6b] received", - "insertId": "jpflkkf8s8ylf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:24.227885158Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "jpflkkf8s8ylg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:24.248618824Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "task-id": "run_bq_external_ingestion", - "process": "local_task_job.py:212", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "[e31cd580-fc70-4ec5-93e1-8277d5df4a6b] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2021', 
'manual__2023-09-13T12:13:02.076777+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpflkkf8s8ylh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:24.265428803Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "[c913e647-a5b5-46b6-9382-aaf6b373e673] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'manual__2023-09-13T12:13:02.076777+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "jpflkkf8s8yli", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:24.265655658Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "jpflkkf8s8ylj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:24.525189683Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "taskinstance.py:2599", - "task-id": "run_bq_external_ingestion", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f2d2b96b-cb54-4a95-a76d-74a158ce38e0] succeeded in 17.771795433975058s: None", - "insertId": "jpflkkf8s8ylk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:24.925689277Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "jpflkkf8s8yll", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:25.310655798Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "jpflkkf8s8ylm", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } 
- }, - "timestamp": "2023-09-13T12:13:25.314605262Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "jpflkkf8s8yln", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:25.421149024Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "jpflkkf8s8ylo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:25.423949526Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "jpflkkf8s8ylp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:25.634916047Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "jpflkkf8s8ylq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:25.733187905Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "jpflkkf8s8ylr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:27.939716164Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "jpflkkf8s8yls", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:28.020321703Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T12:13:29.386251884Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "gnwiwyfpeo9xv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:36.984787444Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "gnwiwyfpeo9xw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:37.097525747Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "gnwiwyfpeo9xx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:37.168717473Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "gnwiwyfpeo9xy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:37.219002321Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "gnwiwyfpeo9xz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:37.219582395Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - 
"insertId": "gnwiwyfpeo9y0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:37.220056396Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1290", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "gnwiwyfpeo9y1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:37.220501378Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1291", - "try-number": "1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "gnwiwyfpeo9y2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:37.330869345Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "map-index": "-1", - "process": "taskinstance.py:1091", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "gnwiwyfpeo9y3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:37.413373712Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "gnwiwyfpeo9y4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:37.413805988Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1", - "execution-date": 
"2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1289", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "gnwiwyfpeo9y5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:37.414254975Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "taskinstance.py:1290", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "gnwiwyfpeo9y6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:37.417016128Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "gnwiwyfpeo9y7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:37.708114163Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "gnwiwyfpeo9y8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:37.708151179Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "gnwiwyfpeo9y9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:37.733576318Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Stopping at filesystem boundary 
(GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "gnwiwyfpeo9ya", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:37.733650328Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:13:02.076777+00:00", - "insertId": "gnwiwyfpeo9yb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:38.882011101Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Started process 888 to run task", - "insertId": "gnwiwyfpeo9yc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:38.894536480Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "standard_task_runner.py:55", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2020', 'manual__2023-09-13T12:13:02.076777+00:00', '--job-id', '1003', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpmb_8kpyq']", - "insertId": "gnwiwyfpeo9yd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:38.910474703Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "workflow": "data_analytics_dag", - "try-number": "1", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Job 1003: Subtask join_bq_datasets.bq_join_holidays_weather_data_2020", - "insertId": "gnwiwyfpeo9ye", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:38.911511951Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": 
"2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "process": "standard_task_runner.py:83", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "gnwiwyfpeo9yf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:39.308764821Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "gnwiwyfpeo9yg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:39.308839208Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "gnwiwyfpeo9yh", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:39.344433513Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "gnwiwyfpeo9yi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:39.344479747Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "gnwiwyfpeo9yj", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:39.442003392Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2020\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:13:02.076777+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T12:13:02.076777+00:00", - "insertId": "gnwiwyfpeo9yk", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:39.808851100Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1518", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "gnwiwyfpeo9yl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:39.851615057Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "base.py:73", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2020 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "gnwiwyfpeo9ym", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:39.854167457Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "bigquery.py:2710", - "workflow": "data_analytics_dag", - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "gnwiwyfpeo9yn", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:39.855068311Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": 
"credentials_provider.py:353", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "try-number": "1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2020_2023_09_13T12_13_02_076777_00_00_68f39afe6cf69f542bffd2222a8f344c", - "insertId": "gnwiwyfpeo9yo", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:39.870986945Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "bigquery.py:1596", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:13:02.076777+00:00", - "insertId": "gnwiwyfpeo9yp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:40.006843093Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Started process 893 to run task", - "insertId": "gnwiwyfpeo9yq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:40.019939137Z", - "severity": "INFO", - "labels": { - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'join_bq_datasets.bq_join_holidays_weather_data_2021', 'manual__2023-09-13T12:13:02.076777+00:00', '--job-id', '1004', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmphjph6pi4']", - "insertId": "gnwiwyfpeo9yr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:40.025287653Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Job 1004: Subtask join_bq_datasets.bq_join_holidays_weather_data_2021", - "insertId": "gnwiwyfpeo9ys", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:40.025789451Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:83", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "gnwiwyfpeo9yt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:40.442834800Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=join_bq_datasets.bq_join_holidays_weather_data_2021\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:13:02.076777+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T12:13:02.076777+00:00", - "insertId": "gnwiwyfpeo9yu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:40.718930836Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1", - "process": "taskinstance.py:1518" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "gnwiwyfpeo9yv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:40.771002667Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "base.py:73", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Executing: {'query': {'query': '\\n SELECT Holidays.Date, Holiday, id, 
element, value\\n FROM `acceldata-acm.holiday_weather.holidays` AS Holidays\\n JOIN (SELECT id, date, element, value FROM bigquery-public-data.ghcn_d.ghcnd_2021 AS Table WHERE Table.element=\"TMAX\" AND Table.id=\"USW00094846\") AS Weather\\n ON Holidays.Date = Weather.Date;\\n ', 'useLegacySql': False, 'destinationTable': {'projectId': 'acceldata-acm', 'datasetId': 'holiday_weather', 'tableId': 'holidays_weather_joined'}, 'writeDisposition': 'WRITE_APPEND'}}'", - "insertId": "gnwiwyfpeo9yw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:40.773357778Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "map-index": "-1", - "process": "bigquery.py:2710", - "worker_id": "airflow-worker-r72xf", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "gnwiwyfpeo9yx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:40.773973272Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "credentials_provider.py:353", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Inserting job airflow_data_analytics_dag_join_bq_datasets_bq_join_holidays_weather_data_2021_2023_09_13T12_13_02_076777_00_00_d60e722fe1802dffe9f3333e4c35c0cb", - "insertId": "gnwiwyfpeo9yy", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:40.790429287Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag", - "process": "bigquery.py:1596", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2020, execution_date=20230913T121302, start_date=20230913T121337, end_date=20230913T121342", - "insertId": "gnwiwyfpeo9yz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:42.232079593Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "1", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:43.483173506Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "148n48tfige4nu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:42.966292737Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "local_task_job.py:212", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "148n48tfige4nv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:43.044706642Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2020", - "worker_id": "airflow-worker-r72xf", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[c913e647-a5b5-46b6-9382-aaf6b373e673] succeeded in 19.08835605799686s: None", - "insertId": "148n48tfige4nw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:43.318372858Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=data_analytics_dag, task_id=join_bq_datasets.bq_join_holidays_weather_data_2021, execution_date=20230913T121302, start_date=20230913T121337, end_date=20230913T121343", - "insertId": "148n48tfige4nx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:43.407910468Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "process": "taskinstance.py:1328", - "map-index": "-1", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "148n48tfige4ny", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:44.173395110Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "process": "local_task_job.py:212", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "1 downstream tasks scheduled from follow-on schedule check", - "insertId": "148n48tfige4nz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:44.254113703Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:2599", - "task-id": "join_bq_datasets.bq_join_holidays_weather_data_2021", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[e31cd580-fc70-4ec5-93e1-8277d5df4a6b] succeeded in 20.240197184000863s: None", - "insertId": "148n48tfige4o0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:44.472822015Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[efd1675a-0aec-41f9-92b6-6b07b9b4641c] received", - "insertId": "148n48tfige4o1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:44.893746266Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "[efd1675a-0aec-41f9-92b6-6b07b9b4641c] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T12:13:02.076777+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "148n48tfige4o2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:44.899897058Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "148n48tfige4o3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:45.240057805Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "148n48tfige4o4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:45.240103617Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "148n48tfige4o5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:45.520895224Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "148n48tfige4o6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:48.022380051Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:13:49.639451447Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "dvqjw8fj75t5n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:56.116297967Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "dvqjw8fj75t5o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:56.403265412Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "dvqjw8fj75t5p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:56.439102195Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "dvqjw8fj75t5q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:56.439800907Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "process": "taskinstance.py:1289", - "try-number": "1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Starting attempt 1 of 3", - "insertId": "dvqjw8fj75t5r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:56.440422671Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1", - "task-id": "create_batch", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "dvqjw8fj75t5s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:56.440937841Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "1", - "map-index": "-1", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - 
"process": "taskinstance.py:1291", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "dvqjw8fj75t5t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:57.433973884Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "dvqjw8fj75t5u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:57.434040846Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "dvqjw8fj75t5v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:57.606802890Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "dvqjw8fj75t5w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:57.606871779Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:13:02.076777+00:00", - "insertId": "dvqjw8fj75t5x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:13:58.622239661Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1310", - "try-number": "1", - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Started process 915 to run task", - "insertId": "dvqjw8fj75t5y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:58.709024011Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "execution-date": 
"2023-09-13T12:13:02.076777+00:00", - "try-number": "1", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T12:13:02.076777+00:00', '--job-id', '1005', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpqygfr9nq']", - "insertId": "dvqjw8fj75t5z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:13:58.722369805Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "task-id": "create_batch", - "process": "standard_task_runner.py:82" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Job 1005: Subtask create_batch", - "insertId": "dvqjw8fj75t60", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:58.723455391Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "process": "standard_task_runner.py:83", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "dvqjw8fj75t61", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:13:59.634960802Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "task-id": "create_batch", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=create_batch\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:13:02.076777+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T12:13:02.076777+00:00", - "insertId": "dvqjw8fj75t62", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:14:00.635701027Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "taskinstance.py:1518", - "task-id": "create_batch", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "dvqjw8fj75t63", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:14:00.732563915Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "1", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "process": "base.py:73", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Creating batch data-processing-20230913t121302", - "insertId": "dvqjw8fj75t64", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:14:00.734682506Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "dataproc.py:2349", - "task-id": "create_batch", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Once started, the batch job will be available at https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t121302/monitoring?project=acceldata-acm", - "insertId": "dvqjw8fj75t65", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:14:00.735430823Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "dataproc.py:2350", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "dvqjw8fj75t66", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:14:00.736592535Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "credentials_provider.py:353" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:02.871867104Z" - }, - { - "textPayload": "I0913 12:14:10.517252 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "mhj7p8fm1fnhd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:14:10.517488735Z", - "severity": "INFO", - "logName": 
"projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T12:14:16.673571016Z" - }, - { - "textPayload": "Task failed with exception\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/dataproc.py\", line 261, in wait_for_operation\n return operation.result(timeout=timeout, retry=result_retry)\n File \"/opt/python3.8/lib/python3.8/site-packages/google/api_core/future/polling.py\", line 261, in result\n raise self._exception\ngoogle.api_core.exceptions.Aborted: 409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2371, in execute\n result = hook.wait_for_operation(\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/dataproc.py\", line 264, in wait_for_operation\n raise AirflowException(error)\nairflow.exceptions.AirflowException: 409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.", - "insertId": "1q5vifcfm6vyc2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:14:17.793039193Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1778" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:23.121804325Z" - }, - { - "textPayload": "Marking task as UP_FOR_RETRY. 
dag_id=data_analytics_dag, task_id=create_batch, execution_date=20230913T121302, start_date=20230913T121356, end_date=20230913T121417", - "insertId": "1q5vifcfm6vyc3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:14:17.801190676Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "taskinstance.py:1328" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:23.121804325Z" - }, - { - "textPayload": "Failed to execute job 1005 for task create_batch (409 Constraint constraints/compute.requireOsLogin violated for project 910708407740.; 915)", - "insertId": "1q5vifcfm6vyc4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:14:17.838197178Z", - "severity": "ERROR", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "process": "standard_task_runner.py:100", - "workflow": "data_analytics_dag", - "try-number": "1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:23.121804325Z" - }, - { - "textPayload": "Task exited with return code 1", - "insertId": "1q5vifcfm6vyc5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:14:18.025601615Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "process": "local_task_job.py:212", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:23.121804325Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1q5vifcfm6vyc6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:14:18.075148880Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:2599", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:23.121804325Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[efd1675a-0aec-41f9-92b6-6b07b9b4641c] succeeded in 33.42658004097757s: None", - "insertId": "1q5vifcfm6vyc7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:14:18.324312201Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - 
"process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:14:23.121804325Z" - }, - { - "textPayload": "I0913 12:15:08.447997 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "4fzudxf6weebl", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:15:08.448261501Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T12:15:15.126689950Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1fm24p7fiqejga", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:18:05.524917413Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:18:10.458008352Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[0283793c-458c-40be-9845-281507d7db41] received", - "insertId": "rh99o2f6uq532", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:18.982432927Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:24.248379026Z" - }, - { - "textPayload": "[0283793c-458c-40be-9845-281507d7db41] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T12:13:02.076777+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "rh99o2f6uq533", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:18.989494341Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:24.248379026Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "rh99o2f6uq534", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:19.411456311Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T12:19:24.248379026Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "rh99o2f6uq535", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:19:19.413793761Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:24.248379026Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "rh99o2f6uq536", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:19.525781775Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "utils.py:430" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:24.248379026Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "rh99o2f6uq537", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:20.341264036Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:24.248379026Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "u9dqtsfpaar7l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:24.441573680Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=", - "insertId": "u9dqtsfpaar7m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:24.560866533Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch", - "try-number": "2", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1091", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=", - "insertId": "u9dqtsfpaar7n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:24.578270425Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "process": "taskinstance.py:1091", - "try-number": "2", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": 
"data_analytics_dag", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "u9dqtsfpaar7o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:24.578864166Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "2", - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "create_batch", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Starting attempt 2 of 3", - "insertId": "u9dqtsfpaar7p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:19:24.579796378Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "process": "taskinstance.py:1290", - "try-number": "2", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "u9dqtsfpaar7q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:24.579955614Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "process": "taskinstance.py:1291", - "workflow": "data_analytics_dag", - "try-number": "2", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "u9dqtsfpaar7r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:25.080669894Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "u9dqtsfpaar7s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:25.080717366Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent 
up to mount point /home/airflow)", - "insertId": "u9dqtsfpaar7t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:25.128596755Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "u9dqtsfpaar7u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:25.128640297Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:13:02.076777+00:00", - "insertId": "u9dqtsfpaar7v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:19:26.276182833Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "2", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1310", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Started process 1046 to run task", - "insertId": "u9dqtsfpaar7w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:26.319357250Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "map-index": "-1", - "task-id": "create_batch", - "process": "standard_task_runner.py:55", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T12:13:02.076777+00:00', '--job-id', '1006', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpol7pazd8']", - "insertId": "u9dqtsfpaar7x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:26.320089419Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "try-number": "2", - "task-id": "create_batch", - "map-index": "-1", - "process": "standard_task_runner.py:82", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Job 1006: Subtask create_batch", - "insertId": "u9dqtsfpaar7y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - 
"environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:26.321106856Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "try-number": "2", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "map-index": "-1", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "u9dqtsfpaar7z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:26.660841462Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "task-id": "create_batch", - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=create_batch\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:13:02.076777+00:00\nAIRFLOW_CTX_TRY_NUMBER=2\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T12:13:02.076777+00:00", - "insertId": "u9dqtsfpaar80", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:27.041824351Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "process": "taskinstance.py:1518", - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "try-number": "2", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "u9dqtsfpaar81", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:27.080492932Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "process": "base.py:73" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Creating batch data-processing-20230913t121302", - "insertId": "u9dqtsfpaar82", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:19:27.082193217Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "try-number": "2", - "process": "dataproc.py:2349", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": 
"-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Once started, the batch job will be available at https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t121302/monitoring?project=acceldata-acm", - "insertId": "u9dqtsfpaar83", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:27.082530549Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "dataproc.py:2350", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "u9dqtsfpaar84", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:27.083372310Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "2", - "process": "credentials_provider.py:353", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Batch with given id already exists", - "insertId": "u9dqtsfpaar85", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:28.606159748Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "try-number": "2", - "process": "dataproc.py:2394" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Attaching to the job data-processing-20230913t121302 if it is still running.", - "insertId": "u9dqtsfpaar86", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:28.606718038Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "dataproc.py:2399", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Task failed with exception\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2426, in execute\n self.handle_batch_status(context, result.state, batch_id)\n File 
\"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2454, in handle_batch_status\n raise AirflowException(\"Batch job %s failed. Driver Logs: %s\", batch_id, link)\nairflow.exceptions.AirflowException: ('Batch job %s failed. Driver Logs: %s', 'data-processing-20230913t121302', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t121302/monitoring?project=acceldata-acm')", - "insertId": "u9dqtsfpaar87", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:19:28.767867750Z", - "severity": "ERROR", - "labels": { - "process": "taskinstance.py:1778", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "try-number": "2" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Marking task as UP_FOR_RETRY. dag_id=data_analytics_dag, task_id=create_batch, execution_date=20230913T121302, start_date=20230913T121924, end_date=20230913T121928", - "insertId": "u9dqtsfpaar88", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:28.776749204Z", - "severity": "INFO", - "labels": { - "try-number": "2", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch", - "process": "taskinstance.py:1328", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Failed to execute job 1006 for task create_batch (('Batch job %s failed. 
Driver Logs: %s', 'data-processing-20230913t121302', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t121302/monitoring?project=acceldata-acm'); 1046)", - "insertId": "u9dqtsfpaar89", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:19:28.808086631Z", - "severity": "ERROR", - "labels": { - "map-index": "-1", - "try-number": "2", - "task-id": "create_batch", - "process": "standard_task_runner.py:100", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:29.336713508Z" - }, - { - "textPayload": "Task exited with return code 1", - "insertId": "ukf4wfpdccf1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:19:28.986740070Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "try-number": "2", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "process": "local_task_job.py:212", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:34.422875854Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "ukf4wfpdccf2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:19:29.039530615Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "try-number": "2", - "map-index": "-1", - "process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:34.422875854Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[0283793c-458c-40be-9845-281507d7db41] succeeded in 10.232915115979267s: None", - "insertId": "ukf4wfpdccf3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:19:29.219482541Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:19:34.422875854Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f617a2cb-f97b-4664-a85d-0c5fa8dc65ef] received", - "insertId": "1mjt84efj7sobr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:01.179100096Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - 
"textPayload": "[f617a2cb-f97b-4664-a85d-0c5fa8dc65ef] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T12:10:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "1mjt84efj7sobs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:01.184320438Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "1mjt84efj7sobt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:01.605252368Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "1mjt84efj7sobu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:01.608503898Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "1mjt84efj7sobv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:01.734295139Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Filling up the DagBag from /home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1mjt84efj7sobw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:02.637578926Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1mjt84efj7sobx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:03.261890616Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Dependencies all met for 
dep_context=non-requeueable deps ti=<TaskInstance: airflow_monitoring.echo scheduled__2023-09-13T12:10:00+00:00 [queued]>", - "insertId": "1mjt84efj7soby", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:03.430967974Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T12:10:00+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: airflow_monitoring.echo scheduled__2023-09-13T12:10:00+00:00 [queued]>", - "insertId": "1mjt84efj7sobz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:03.516476284Z", - "severity": "INFO", - "labels": { - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "process": "taskinstance.py:1091", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T12:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1mjt84efj7soc0", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:03.517418964Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1289", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "execution-date": "2023-09-13T12:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1mjt84efj7soc1", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:03.518043346Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:10:00+00:00", - "try-number": "1", - "process": "taskinstance.py:1290", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1mjt84efj7soc2", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:03.518646844Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "map-index": "-1", - "execution-date": "2023-09-13T12:10:00+00:00", - "process": "taskinstance.py:1291", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": 
"2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1mjt84efj7soc3", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:03.903612909Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1mjt84efj7soc4", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:03.903687714Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1mjt84efj7soc5", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:03.930351094Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1mjt84efj7soc6", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:03.930422845Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:10:00+00:00", - "insertId": "1mjt84efj7soc7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:04.605501095Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "taskinstance.py:1310", - "task-id": "echo", - "execution-date": "2023-09-13T12:10:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T12:10:00+00:00', '--job-id', '1007', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmplg3g80np']", - "insertId": "1mjt84efj7soc8", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:04.650946984Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:82", - "try-number": "1", - 
"workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "execution-date": "2023-09-13T12:10:00+00:00", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Started process 1075 to run task", - "insertId": "1mjt84efj7soc9", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:04.652077872Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "try-number": "1", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Job 1007: Subtask echo", - "insertId": "1mjt84efj7soca", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:04.652516853Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:10:00+00:00", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "try-number": "1", - "map-index": "-1", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1mjt84efj7socb", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:05.009451765Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T12:10:00+00:00", - "workflow": "airflow_monitoring", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:10:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T12:10:00+00:00", - "insertId": "1mjt84efj7socc", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:05.188987304Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "taskinstance.py:1518", - "task-id": "echo", - "execution-date": "2023-09-13T12:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1mjt84efj7socd", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": 
"us-west1" - } - }, - "timestamp": "2023-09-13T12:20:05.191100716Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "process": "subprocess.py:63", - "execution-date": "2023-09-13T12:10:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1mjt84efj7soce", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:05.192802978Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "subprocess.py:75", - "execution-date": "2023-09-13T12:10:00+00:00", - "workflow": "airflow_monitoring", - "map-index": "-1", - "task-id": "echo", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Output:", - "insertId": "1mjt84efj7socf", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:05.416159338Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "execution-date": "2023-09-13T12:10:00+00:00", - "process": "subprocess.py:86", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "test", - "insertId": "1mjt84efj7socg", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:05.423313711Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "execution-date": "2023-09-13T12:10:00+00:00", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "subprocess.py:93", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1mjt84efj7soch", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:05.424468600Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:10:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1", - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "subprocess.py:97" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Marking task as SUCCESS. 
dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T121000, start_date=20230913T122003, end_date=20230913T122005", - "insertId": "1mjt84efj7soci", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:20:05.466691112Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1328", - "map-index": "-1", - "execution-date": "2023-09-13T12:10:00+00:00", - "worker_id": "airflow-worker-r72xf", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:06.843281127Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "ivplxcfm4kjfp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:20:06.235641326Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:10:00+00:00", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:11.860406731Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "ivplxcfm4kjfq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:06.316605974Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T12:10:00+00:00", - "process": "taskinstance.py:2599", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:11.860406731Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[f617a2cb-f97b-4664-a85d-0c5fa8dc65ef] succeeded in 5.289006081002299s: None", - "insertId": "ivplxcfm4kjfr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:20:06.471457483Z", - "severity": "INFO", - "labels": { - "process": "trace.py:131", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:20:11.860406731Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1trej3bfm5hxfi", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:23:07.654122311Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:23:12.660620903Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[9c759406-6f01-4ce0-aa15-3391a9aef45d] received", - "insertId": "198q49ffijo9bu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:29.060506303Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:34.191279974Z" - }, - { - "textPayload": "[9c759406-6f01-4ce0-aa15-3391a9aef45d] Executing command in Celery: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T12:13:02.076777+00:00', '--local', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py']", - "insertId": "198q49ffijo9bv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:29.068469445Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "celery_executor.py:90" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:34.191279974Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "198q49ffijo9bw", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:24:29.425076712Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:34.191279974Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "198q49ffijo9bx", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:24:29.427488556Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:34.191279974Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "198q49ffijo9by", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:29.548674240Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:34.191279974Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/data_analytics_dag.py", - "insertId": "198q49ffijo9bz", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:30.453350594Z", - "severity": "INFO", - "labels": { - "process": "dagbag.py:532", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:34.191279974Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "e5g909fj8r536", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:34.177317263Z", - "severity": "INFO", - "labels": { - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: data_analytics_dag.create_batch manual__2023-09-13T12:13:02.076777+00:00 [queued]>", - "insertId": "e5g909fj8r537", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:34.306621786Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "try-number": "3", - "process": "taskinstance.py:1091", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: data_analytics_dag.create_batch manual__2023-09-13T12:13:02.076777+00:00 [queued]>", - "insertId": "e5g909fj8r538", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:24:34.329062497Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "3", - "map-index": "-1", - "process": "taskinstance.py:1091", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "e5g909fj8r539", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:34.329902701Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "process": "taskinstance.py:1289", - "map-index": "-1", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "try-number": "3" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Starting attempt 3 of 3", - "insertId": "e5g909fj8r53a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1"
- } - }, - "timestamp": "2023-09-13T12:24:34.330477295Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "try-number": "3", - "process": "taskinstance.py:1290", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "e5g909fj8r53b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:34.330962071Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1291", - "try-number": "3", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "e5g909fj8r53c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:34.770176477Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "e5g909fj8r53d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:34.770283493Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "e5g909fj8r53e", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:34.816680979Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "e5g909fj8r53f", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:34.816713172Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:13:02.076777+00:00", - "insertId": "e5g909fj8r53g", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:35.827702412Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "try-number": "3", - "workflow": "data_analytics_dag", - "process": "taskinstance.py:1310", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Started process 1195 to run task", - "insertId": "e5g909fj8r53h", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:24:35.884981869Z", - "severity": "INFO", - "labels": { - "process": "standard_task_runner.py:55", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "try-number": "3", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'data_analytics_dag', 'create_batch', 'manual__2023-09-13T12:13:02.076777+00:00', '--job-id', '1008', '--raw', '--subdir', 'DAGS_FOLDER/data_analytics_dag.py', '--cfg-path', '/tmp/tmpl_fji2ni']", - "insertId": "e5g909fj8r53i", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:35.885361249Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "process": "standard_task_runner.py:82", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "3" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Job 1008: Subtask create_batch", - "insertId": "e5g909fj8r53j", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:35.886324265Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "try-number": "3", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "task-id": "create_batch", - "process": "standard_task_runner.py:83" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "e5g909fj8r53k", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:36.308257526Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "process": "task_command.py:393", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "try-number": "3" - }, - "logName": 
"projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=data_analytics_dag\nAIRFLOW_CTX_TASK_ID=create_batch\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:13:02.076777+00:00\nAIRFLOW_CTX_TRY_NUMBER=3\nAIRFLOW_CTX_DAG_RUN_ID=manual__2023-09-13T12:13:02.076777+00:00", - "insertId": "e5g909fj8r53l", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:36.734672850Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "map-index": "-1", - "process": "taskinstance.py:1518", - "try-number": "3", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Using connection ID 'google_cloud_default' for task execution.", - "insertId": "e5g909fj8r53m", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:36.773107313Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "3", - "process": "base.py:73", - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Creating batch data-processing-20230913t121302", - "insertId": "e5g909fj8r53n", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:36.774584415Z", - "severity": "INFO", - "labels": { - "task-id": "create_batch", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "workflow": "data_analytics_dag", - "try-number": "3", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "process": "dataproc.py:2349" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Once started, the batch job will be available at https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t121302/monitoring?project=acceldata-acm", - "insertId": "e5g909fj8r53o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:36.775113443Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "process": "dataproc.py:2350", - "task-id": "create_batch", - "try-number": "3", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Getting connection using `google.auth.default()` since no explicit credentials are provided.", - "insertId": "e5g909fj8r53p", - "resource": { - "type": 
"cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:36.775921272Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "try-number": "3", - "workflow": "data_analytics_dag", - "task-id": "create_batch", - "process": "credentials_provider.py:353" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Batch with given id already exists", - "insertId": "e5g909fj8r53q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:38.259701617Z", - "severity": "INFO", - "labels": { - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "try-number": "3", - "process": "dataproc.py:2394", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Attaching to the job data-processing-20230913t121302 if it is still running.", - "insertId": "e5g909fj8r53r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:38.260742226Z", - "severity": "INFO", - "labels": { - "process": "dataproc.py:2399", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "workflow": "data_analytics_dag", - "worker_id": "airflow-worker-r72xf", - "try-number": "3", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Task failed with exception\nTraceback (most recent call last):\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2426, in execute\n self.handle_batch_status(context, result.state, batch_id)\n File \"/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/dataproc.py\", line 2454, in handle_batch_status\n raise AirflowException(\"Batch job %s failed. Driver Logs: %s\", batch_id, link)\nairflow.exceptions.AirflowException: ('Batch job %s failed. Driver Logs: %s', 'data-processing-20230913t121302', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t121302/monitoring?project=acceldata-acm')", - "insertId": "e5g909fj8r53s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:38.426447611Z", - "severity": "ERROR", - "labels": { - "process": "taskinstance.py:1778", - "worker_id": "airflow-worker-r72xf", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "3", - "task-id": "create_batch" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "Marking task as FAILED. 
dag_id=data_analytics_dag, task_id=create_batch, execution_date=20230913T121302, start_date=20230913T122434, end_date=20230913T122438", - "insertId": "e5g909fj8r53t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:38.432107349Z", - "severity": "INFO", - "labels": { - "try-number": "3", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "process": "taskinstance.py:1328", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "map-index": "-1", - "workflow": "data_analytics_dag" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:39.297728403Z" - }, - { - "textPayload": "I0913 12:24:38.583231 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "1bgf4n0fj5ugr7", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:38.583450438Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T12:24:45.248922312Z" - }, - { - "textPayload": "Failed to execute job 1008 for task create_batch (('Batch job %s failed. Driver Logs: %s', 'data-processing-20230913t121302', 'https://console.cloud.google.com/dataproc/batches/us-west1/data-processing-20230913t121302/monitoring?project=acceldata-acm'); 1195)", - "insertId": "1t7u2cgfpgq7zq", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:24:38.955626331Z", - "severity": "ERROR", - "labels": { - "task-id": "create_batch", - "try-number": "3", - "workflow": "data_analytics_dag", - "map-index": "-1", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "worker_id": "airflow-worker-r72xf", - "process": "standard_task_runner.py:100" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:44.414729149Z" - }, - { - "textPayload": "Task exited with return code 1", - "insertId": "1t7u2cgfpgq7zr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:39.120311091Z", - "severity": "INFO", - "labels": { - "process": "local_task_job.py:212", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "task-id": "create_batch", - "workflow": "data_analytics_dag", - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "try-number": "3" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:44.414729149Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1t7u2cgfpgq7zs", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:24:39.198478437Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:13:02.076777+00:00", - "map-index": "-1", - "task-id": "create_batch", - "worker_id": "airflow-worker-r72xf", - "try-number": "3", - "workflow": "data_analytics_dag", - 
"process": "taskinstance.py:2599" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:44.414729149Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[9c759406-6f01-4ce0-aa15-3391a9aef45d] succeeded in 10.336296185007086s: None", - "insertId": "1t7u2cgfpgq7zt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:24:39.401346006Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:24:44.414729149Z" - }, - { - "textPayload": "I0913 12:25:34.504344 1 airflowworkerset_controller.go:61] \"controllers/AirflowWorkerSet: Reconcile\"", - "insertId": "14ii4dzf6y1qpa", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:25:34.504565605Z", - "severity": "INFO", - "logName": "projects/acceldata-acm/logs/airflow-worker-set", - "receiveTimestamp": "2023-09-13T12:25:41.577309181Z" - }, - { - "textPayload": "/opt/python3.8/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to \"sqlalchemy<2.0\". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)", - "insertId": "1p240vzf6wbfcp", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:28:09.815294650Z", - "severity": "WARNING", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:28:15.389203937Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[afefbd18-45cd-42b2-ae60-568077361565] received", - "insertId": "jzsz8ofj9g1yr", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:30:00.355004066Z", - "severity": "INFO", - "labels": { - "process": "strategy.py:161", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:02.261354629Z" - }, - { - "textPayload": "[afefbd18-45cd-42b2-ae60-568077361565] Executing command in Celery: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T12:20:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py']", - "insertId": "jzsz8ofj9g1ys", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:00.366240297Z", - "severity": "INFO", - "labels": { - "process": "celery_executor.py:90", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:02.261354629Z" - }, - { - "textPayload": "No module named 'boto3'", - "insertId": "jzsz8ofj9g1yt", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:00.761017355Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:02.261354629Z" - }, - { - "textPayload": "No module named 'botocore'", - "insertId": "jzsz8ofj9g1yu", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:00.761578811Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:02.261354629Z" - }, - { - "textPayload": "No module named 'airflow.providers.sftp'", - "insertId": "jzsz8ofj9g1yv", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:00.925306052Z", - "severity": "WARNING", - "labels": { - "process": "utils.py:430", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:02.261354629Z" - }, - { - "textPayload": "Filling up the DagBag from 
/home/airflow/gcs/dags/airflow_monitoring.py", - "insertId": "1sds7spfirc37o", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:01.844359092Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "dagbag.py:532" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1sds7spfirc37p", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:02.506331461Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "task_command.py:393" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: airflow_monitoring.echo scheduled__2023-09-13T12:20:00+00:00 [queued]>", - "insertId": "1sds7spfirc37q", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:02.696464707Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1091", - "try-number": "1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "map-index": "-1", - "execution-date": "2023-09-13T12:20:00+00:00", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: airflow_monitoring.echo scheduled__2023-09-13T12:20:00+00:00 [queued]>", - "insertId": "1sds7spfirc37r", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:02.722963759Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "execution-date": "2023-09-13T12:20:00+00:00", - "process": "taskinstance.py:1091" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1sds7spfirc37s", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:30:02.723531398Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:20:00+00:00", - "try-number": "1", - "map-index": "-1", - "workflow": "airflow_monitoring", - "task-id": "echo", - "process": "taskinstance.py:1289" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Starting attempt 1 of 2", - "insertId": "1sds7spfirc37t", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": 
"2023-09-13T12:30:02.724104631Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:20:00+00:00", - "map-index": "-1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1290" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "\n--------------------------------------------------------------------------------", - "insertId": "1sds7spfirc37u", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:02.724553265Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "taskinstance.py:1291", - "execution-date": "2023-09-13T12:20:00+00:00", - "workflow": "airflow_monitoring", - "try-number": "1" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1sds7spfirc37v", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:02.990782414Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1sds7spfirc37w", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:02.990850733Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "fatal: not a git repository (or any parent up to mount point /home/airflow)", - "insertId": "1sds7spfirc37x", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:30:03.016915817Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).", - "insertId": "1sds7spfirc37y", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:03.016964945Z", - "severity": "ERROR", - "labels": { - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Executing on 2023-09-13 12:20:00+00:00", - "insertId": "1sds7spfirc37z", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": 
"us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:30:03.828184872Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:20:00+00:00", - "try-number": "1", - "task-id": "echo", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "process": "taskinstance.py:1310" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Running: ['airflow', 'tasks', 'run', 'airflow_monitoring', 'echo', 'scheduled__2023-09-13T12:20:00+00:00', '--job-id', '1009', '--raw', '--subdir', 'DAGS_FOLDER/airflow_monitoring.py', '--cfg-path', '/tmp/tmp9h1fsha_']", - "insertId": "1sds7spfirc380", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:03.869546822Z", - "severity": "INFO", - "labels": { - "execution-date": "2023-09-13T12:20:00+00:00", - "process": "standard_task_runner.py:82", - "task-id": "echo", - "map-index": "-1", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Started process 1316 to run task", - "insertId": "1sds7spfirc381", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "project_id": "acceldata-acm", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:03.869624856Z", - "severity": "INFO", - "labels": { - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T12:20:00+00:00", - "try-number": "1", - "process": "standard_task_runner.py:55", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Job 1009: Subtask echo", - "insertId": "1sds7spfirc382", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:03.870196638Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1", - "execution-date": "2023-09-13T12:20:00+00:00", - "process": "standard_task_runner.py:83", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Running on host airflow-worker-r72xf", - "insertId": "1sds7spfirc383", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "environment_name": "openlineage", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:30:04.271133866Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T12:20:00+00:00", - "map-index": "-1", - "process": "task_command.py:393", - "workflow": "airflow_monitoring", - "try-number": "1", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=airflow_monitoring\nAIRFLOW_CTX_TASK_ID=echo\nAIRFLOW_CTX_EXECUTION_DATE=2023-09-13T12:20:00+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-09-13T12:20:00+00:00", - "insertId": "1sds7spfirc384", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:04.486781918Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1518", - "task-id": "echo", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "execution-date": "2023-09-13T12:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Tmp dir root location: \n /tmp", - "insertId": "1sds7spfirc385", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:30:04.489284466Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "try-number": "1", - "execution-date": "2023-09-13T12:20:00+00:00", - "map-index": "-1", - "process": "subprocess.py:63", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Running command: ['/usr/bin/bash', '-c', 'echo test']", - "insertId": "1sds7spfirc386", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:04.491786431Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "try-number": "1", - "task-id": "echo", - "process": "subprocess.py:75", - "execution-date": "2023-09-13T12:20:00+00:00", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Output:", - "insertId": "1sds7spfirc387", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:04.664947323Z", - "severity": "INFO", - "labels": { - "try-number": "1", - "task-id": "echo", - "workflow": "airflow_monitoring", - "map-index": "-1", - "process": "subprocess.py:86", - "execution-date": "2023-09-13T12:20:00+00:00", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "test", - "insertId": "1sds7spfirc388", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:04.672112243Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "process": "subprocess.py:93", - "execution-date": "2023-09-13T12:20:00+00:00", - "map-index": "-1", - "try-number": "1", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - 
"receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Command exited with return code 0", - "insertId": "1sds7spfirc389", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:04.672955929Z", - "severity": "INFO", - "labels": { - "process": "subprocess.py:97", - "execution-date": "2023-09-13T12:20:00+00:00", - "task-id": "echo", - "map-index": "-1", - "try-number": "1", - "worker_id": "airflow-worker-r72xf", - "workflow": "airflow_monitoring" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Marking task as SUCCESS. dag_id=airflow_monitoring, task_id=echo, execution_date=20230913T122000, start_date=20230913T123002, end_date=20230913T123004", - "insertId": "1sds7spfirc38a", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "environment_name": "openlineage", - "location": "us-west1", - "project_id": "acceldata-acm" - } - }, - "timestamp": "2023-09-13T12:30:04.726978392Z", - "severity": "INFO", - "labels": { - "process": "taskinstance.py:1328", - "workflow": "airflow_monitoring", - "execution-date": "2023-09-13T12:20:00+00:00", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "try-number": "1", - "task-id": "echo" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Task exited with return code 0", - "insertId": "1sds7spfirc38b", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "environment_name": "openlineage", - "location": "us-west1" - } - }, - "timestamp": "2023-09-13T12:30:05.460575421Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "execution-date": "2023-09-13T12:20:00+00:00", - "try-number": "1", - "workflow": "airflow_monitoring", - "map-index": "-1", - "worker_id": "airflow-worker-r72xf", - "process": "local_task_job.py:212" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "0 downstream tasks scheduled from follow-on schedule check", - "insertId": "1sds7spfirc38c", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "project_id": "acceldata-acm", - "location": "us-west1", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:05.538392388Z", - "severity": "INFO", - "labels": { - "task-id": "echo", - "workflow": "airflow_monitoring", - "worker_id": "airflow-worker-r72xf", - "map-index": "-1", - "process": "taskinstance.py:2599", - "try-number": "1", - "execution-date": "2023-09-13T12:20:00+00:00" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", - "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - }, - { - "textPayload": "Task airflow.executors.celery_executor.execute_command[afefbd18-45cd-42b2-ae60-568077361565] succeeded in 5.373693040019134s: None", - "insertId": "1sds7spfirc38d", - "resource": { - "type": "cloud_composer_environment", - "labels": { - "location": "us-west1", - "project_id": "acceldata-acm", - "environment_name": "openlineage" - } - }, - "timestamp": "2023-09-13T12:30:05.736903251Z", - "severity": "INFO", - "labels": { - "worker_id": "airflow-worker-r72xf", - "process": "trace.py:131" - }, - "logName": "projects/acceldata-acm/logs/airflow-worker", 
- "receiveTimestamp": "2023-09-13T12:30:07.342728641Z" - } -] \ No newline at end of file diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S5NP9N7M.png b/slack-archive/html/files/C01CK9T7HKR/F05S5NP9N7M.png deleted file mode 100644 index 4de2e4c..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05S5NP9N7M.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05S67LRMNE.png b/slack-archive/html/files/C01CK9T7HKR/F05S67LRMNE.png deleted file mode 100644 index c66d455..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05S67LRMNE.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05SJ3DJ5CH.png b/slack-archive/html/files/C01CK9T7HKR/F05SJ3DJ5CH.png deleted file mode 100644 index 84178a3..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05SJ3DJ5CH.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05SUDUQEDN.png b/slack-archive/html/files/C01CK9T7HKR/F05SUDUQEDN.png deleted file mode 100644 index b2f1f51..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05SUDUQEDN.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05TJ6PA3NG.txt b/slack-archive/html/files/C01CK9T7HKR/F05TJ6PA3NG.txt deleted file mode 100644 index b9cea2c..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F05TJ6PA3NG.txt +++ /dev/null @@ -1,1326 +0,0 @@ -23/09/22 03:12:03 INFO DriverDaemon$: Started Log4j2 -23/09/22 03:12:06 INFO DatabricksMain$$anon$1: Configured feature flag data source LaunchDarkly -23/09/22 03:12:06 INFO DatabricksMain$$anon$1: Load feature flag from LaunchDarkly -23/09/22 03:12:06 WARN DatabricksMain$$anon$1: REGION environment variable is not defined. getConfForCurrentRegion will always return default value -23/09/22 03:12:06 INFO DriverDaemon$: Current JVM Version 1.8.0_362 -23/09/22 03:12:06 INFO DriverDaemon$: ========== driver starting up ========== -23/09/22 03:12:06 INFO DriverDaemon$: Java: Azul Systems, Inc. 1.8.0_362 -23/09/22 03:12:06 INFO DriverDaemon$: OS: Linux/amd64 5.15.0-1042-azure -23/09/22 03:12:06 INFO DriverDaemon$: CWD: /databricks/driver -23/09/22 03:12:06 INFO DriverDaemon$: Mem: Max: 6.3G loaded GCs: PS Scavenge, PS MarkSweep -23/09/22 03:12:06 INFO DriverDaemon$: Logging multibyte characters: ✓ -23/09/22 03:12:06 INFO DriverDaemon$: 'publicFile.rolling.rewrite' appender in root logger: class org.apache.logging.log4j.core.appender.rewrite.RewriteAppender -23/09/22 03:12:06 INFO DriverDaemon$: == Modules: -23/09/22 03:12:08 INFO DriverDaemon$: Starting prometheus metrics log export timer -23/09/22 03:12:08 INFO DriverConf: Configured feature flag data source LaunchDarkly -23/09/22 03:12:08 INFO DriverConf: Load feature flag from LaunchDarkly -23/09/22 03:12:08 WARN DriverConf: REGION environment variable is not defined. 
getConfForCurrentRegion will always return default value -23/09/22 03:12:08 INFO DriverDaemon$: Loaded JDBC drivers in 190 ms -23/09/22 03:12:08 INFO DriverDaemon$: Universe Git Hash: 1e4a70bdafc31fab94e8e4a9c01a52855f6e151d -23/09/22 03:12:08 INFO DriverDaemon$: Spark Git Hash: 6deb9aa8cd233e381216b0ac25d7cfb153f8af95 -23/09/22 03:12:08 INFO SparkConfUtils$: Customize spark config according to file /tmp/custom-spark.conf -23/09/22 03:12:08 WARN RunHelpers$: Missing tag isolation client: java.util.NoSuchElementException: key not found: TagDefinition(clientType,The client type for a request, used for isolating resources for the request.,false,false,List(),DATA_LABEL_UNSPECIFIED) -23/09/22 03:12:08 INFO DatabricksILoop$: Creating throwaway interpreter -23/09/22 03:12:08 INFO MetastoreMonitor$: Internal metastore configured -23/09/22 03:12:08 INFO DataSourceFactory$: DataSource Jdbc URL: jdbc:mariadb://consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com:3306/organization4679476628690204?useSSL=true&sslMode=VERIFY_CA&disableSslHostnameVerification=true&trustServerCertificate=false&serverSslCert=/databricks/common/mysql-ssl-ca-cert.crt -23/09/22 03:12:08 INFO ConcurrentRateLimiterConfParser$: No additional configuration supplied to the concurrent rate-limiter. Defaults would be used. -23/09/22 03:12:08 INFO ConcurrentRateLimiterConfParser$: Service com.databricks.backend.daemon.driver.DriverCorral concurrent rate-limiter ConcurrentRateLimitConfig - Dry-Run: false | Dimension: WORKSPACE | API: DEFAULT | High: 100 | Low: 50 -23/09/22 03:12:08 INFO ConcurrentRateLimiterConfParser$: Service com.databricks.backend.daemon.driver.DriverCorral concurrent rate-limiter ConcurrentRateLimitConfig - Dry-Run: false | Dimension: ACCOUNT_ID | API: DEFAULT | High: 100 | Low: 50 -23/09/22 03:12:08 INFO DriverCorral: Creating the driver context -23/09/22 03:12:08 INFO DatabricksILoop$: Class Server Dir: /local_disk0/tmp/repl/spark-4347861282214610666-415cbfc1-bc72-4ecc-8182-d24eda276af6 -23/09/22 03:12:09 INFO HikariDataSource: metastore-monitor - Starting... -23/09/22 03:12:09 INFO HikariDataSource: metastore-monitor - Start completed. -23/09/22 03:12:09 INFO SparkConfUtils$: Customize spark config according to file /tmp/custom-spark.conf -23/09/22 03:12:09 WARN SparkConf: The configuration key 'spark.akka.frameSize' has been deprecated as of Spark 1.6 and may be removed in the future. Please use the new key 'spark.rpc.message.maxSize' instead. -23/09/22 03:12:09 INFO SparkContext: Running Spark version 3.3.0 -23/09/22 03:12:10 INFO ResourceUtils: ============================================================== -23/09/22 03:12:10 INFO ResourceUtils: No custom resources configured for spark.driver. -23/09/22 03:12:10 INFO ResourceUtils: ============================================================== -23/09/22 03:12:10 INFO SparkContext: Submitted application: Databricks Shell -23/09/22 03:12:10 INFO HikariDataSource: metastore-monitor - Shutdown initiated... -23/09/22 03:12:10 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 7284, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) -23/09/22 03:12:10 INFO HikariDataSource: metastore-monitor - Shutdown completed. 
-23/09/22 03:12:10 INFO ResourceProfile: Limiting resource is cpu -23/09/22 03:12:10 INFO ResourceProfileManager: Added ResourceProfile id: 0 -23/09/22 03:12:10 INFO MetastoreMonitor: Metastore healthcheck successful (connection duration = 1514 milliseconds) -23/09/22 03:12:10 INFO SecurityManager: Changing view acls to: root -23/09/22 03:12:10 INFO SecurityManager: Changing modify acls to: root -23/09/22 03:12:10 INFO SecurityManager: Changing view acls groups to: -23/09/22 03:12:10 INFO SecurityManager: Changing modify acls groups to: -23/09/22 03:12:10 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() -23/09/22 03:12:10 INFO Utils: Successfully started service 'sparkDriver' on port 40381. -23/09/22 03:12:10 INFO SparkEnv: Registering MapOutputTracker -23/09/22 03:12:11 INFO SparkEnv: Registering BlockManagerMaster -23/09/22 03:12:11 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information -23/09/22 03:12:11 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up -23/09/22 03:12:11 INFO SparkEnv: Registering BlockManagerMasterHeartbeat -23/09/22 03:12:11 INFO DiskBlockManager: Created local directory at /local_disk0/blockmgr-d9671de5-06a4-41c4-a396-6164c52e9d6e -23/09/22 03:12:11 INFO MemoryStore: MemoryStore started with capacity 3.3 GiB -23/09/22 03:12:11 INFO SparkEnv: Registering OutputCommitCoordinator -23/09/22 03:12:11 INFO SparkContext: Spark configuration: -eventLog.rolloverIntervalSeconds=900 -libraryDownload.sleepIntervalSeconds=5 -libraryDownload.timeoutSeconds=180 -spark.akka.frameSize=256 -spark.app.name=Databricks Shell -spark.app.startTime=1695352329292 -spark.cleaner.referenceTracking.blocking=false -spark.databricks.acl.client=com.databricks.spark.sql.acl.client.SparkSqlAclClient -spark.databricks.acl.dfAclsEnabled=false -spark.databricks.acl.enabled=false -spark.databricks.acl.provider=com.databricks.sql.acl.ReflectionBackedAclProvider -spark.databricks.acl.scim.client=com.databricks.spark.sql.acl.client.DriverToWebappScimClient -spark.databricks.automl.serviceEnabled=true -spark.databricks.cloudProvider=Azure -spark.databricks.cloudfetch.hasRegionSupport=true -spark.databricks.cloudfetch.requesterClassName=*********(redacted) -spark.databricks.clusterSource=UI -spark.databricks.clusterUsageTags.attribute_tag_budget= -spark.databricks.clusterUsageTags.attribute_tag_dust_execution_env= -spark.databricks.clusterUsageTags.attribute_tag_dust_maintainer= -spark.databricks.clusterUsageTags.attribute_tag_dust_suite= -spark.databricks.clusterUsageTags.attribute_tag_service= -spark.databricks.clusterUsageTags.autoTerminationMinutes=15 -spark.databricks.clusterUsageTags.azureSubscriptionId=a4f54399-8db8-4849-adcc-a42aed1fb97f -spark.databricks.clusterUsageTags.cloudProvider=Azure -spark.databricks.clusterUsageTags.clusterAllTags=[{"key":"Vendor","value":"Databricks"},{"key":"Creator","value":"jason.yip@tredence.com"},{"key":"ClusterName","value":"jason.yip@tredence.com's Cluster"},{"key":"ClusterId","value":"0808-055325-43kdx9a4"},{"key":"Environment","value":"POC"},{"key":"Project","value":"SI"},{"key":"DatabricksEnvironment","value":"workerenv-4679476628690204"}] -spark.databricks.clusterUsageTags.clusterAvailability=SPOT_WITH_FALLBACK_AZURE -spark.databricks.clusterUsageTags.clusterCreator=Webapp 
-spark.databricks.clusterUsageTags.clusterFirstOnDemand=1 -spark.databricks.clusterUsageTags.clusterGeneration=58 -spark.databricks.clusterUsageTags.clusterId=0808-055325-43kdx9a4 -spark.databricks.clusterUsageTags.clusterLastActivityTime=1695351309363 -spark.databricks.clusterUsageTags.clusterLogDeliveryEnabled=true -spark.databricks.clusterUsageTags.clusterLogDestination=dbfs:/cluster-logs -spark.databricks.clusterUsageTags.clusterMaxWorkers=2 -spark.databricks.clusterUsageTags.clusterMetastoreAccessType=RDS_DIRECT -spark.databricks.clusterUsageTags.clusterMinWorkers=1 -spark.databricks.clusterUsageTags.clusterName=jason.yip@tredence.com's Cluster -spark.databricks.clusterUsageTags.clusterNoDriverDaemon=false -spark.databricks.clusterUsageTags.clusterNodeType=Standard_DS3_v2 -spark.databricks.clusterUsageTags.clusterNumCustomTags=0 -spark.databricks.clusterUsageTags.clusterNumSshKeys=0 -spark.databricks.clusterUsageTags.clusterOwnerOrgId=4679476628690204 -spark.databricks.clusterUsageTags.clusterOwnerUserId=*********(redacted) -spark.databricks.clusterUsageTags.clusterPinned=false -spark.databricks.clusterUsageTags.clusterPythonVersion=3 -spark.databricks.clusterUsageTags.clusterResourceClass=default -spark.databricks.clusterUsageTags.clusterScalingType=autoscaling -spark.databricks.clusterUsageTags.clusterSizeType=VM_CONTAINER -spark.databricks.clusterUsageTags.clusterSku=STANDARD_SKU -spark.databricks.clusterUsageTags.clusterSpotBidMaxPrice=-1.0 -spark.databricks.clusterUsageTags.clusterState=Pending -spark.databricks.clusterUsageTags.clusterStateMessage=Starting Spark -spark.databricks.clusterUsageTags.clusterTargetWorkers=1 -spark.databricks.clusterUsageTags.clusterUnityCatalogMode=*********(redacted) -spark.databricks.clusterUsageTags.clusterWorkers=1 -spark.databricks.clusterUsageTags.containerType=LXC -spark.databricks.clusterUsageTags.dataPlaneRegion=westus2 -spark.databricks.clusterUsageTags.driverContainerId=5603f7b1e1d64b3fb68a6cbede3b5d75 -spark.databricks.clusterUsageTags.driverContainerPrivateIp=10.11.115.134 -spark.databricks.clusterUsageTags.driverInstanceId=48c9146699474759853905f5e39b09cf -spark.databricks.clusterUsageTags.driverInstancePrivateIp=10.11.115.198 -spark.databricks.clusterUsageTags.driverNodeType=Standard_DS3_v2 -spark.databricks.clusterUsageTags.effectiveSparkVersion=11.3.x-cpu-ml-scala2.12 -spark.databricks.clusterUsageTags.enableCredentialPassthrough=*********(redacted) -spark.databricks.clusterUsageTags.enableDfAcls=false -spark.databricks.clusterUsageTags.enableElasticDisk=true -spark.databricks.clusterUsageTags.enableGlueCatalogCredentialPassthrough=*********(redacted) -spark.databricks.clusterUsageTags.enableJdbcAutoStart=true -spark.databricks.clusterUsageTags.enableJobsAutostart=true -spark.databricks.clusterUsageTags.enableLocalDiskEncryption=false -spark.databricks.clusterUsageTags.enableSqlAclsOnly=false -spark.databricks.clusterUsageTags.hailEnabled=false -spark.databricks.clusterUsageTags.ignoreTerminationEventInAlerting=false -spark.databricks.clusterUsageTags.instanceWorkerEnvId=workerenv-4679476628690204 -spark.databricks.clusterUsageTags.instanceWorkerEnvNetworkType=vnet-injection -spark.databricks.clusterUsageTags.isDpCpPrivateLinkEnabled=false -spark.databricks.clusterUsageTags.isIMv2Enabled=false -spark.databricks.clusterUsageTags.isServicePrincipalCluster=false -spark.databricks.clusterUsageTags.isSingleUserCluster=*********(redacted) -spark.databricks.clusterUsageTags.managedResourceGroup=databricks-rg-SI-ADB-runlzayanl524 
-spark.databricks.clusterUsageTags.ngrokNpipEnabled=true -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2=1 -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2Abfss=0 -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2Dbfs=1 -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2File=0 -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2Gcs=0 -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2S3=0 -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2Volumes=0 -spark.databricks.clusterUsageTags.numPerClusterInitScriptsV2Workspace=0 -spark.databricks.clusterUsageTags.numPerGlobalInitScriptsV2=0 -spark.databricks.clusterUsageTags.orgId=4679476628690204 -spark.databricks.clusterUsageTags.privateLinkEnabled=false -spark.databricks.clusterUsageTags.region=westus2 -spark.databricks.clusterUsageTags.runtimeEngine=STANDARD -spark.databricks.clusterUsageTags.sparkEnvVarContainsBacktick=false -spark.databricks.clusterUsageTags.sparkEnvVarContainsDollarSign=false -spark.databricks.clusterUsageTags.sparkEnvVarContainsDoubleQuotes=false -spark.databricks.clusterUsageTags.sparkEnvVarContainsEscape=false -spark.databricks.clusterUsageTags.sparkEnvVarContainsNewline=false -spark.databricks.clusterUsageTags.sparkEnvVarContainsSingleQuotes=false -spark.databricks.clusterUsageTags.sparkImageLabel=release__11.3.x-snapshot-cpu-ml-scala2.12__databricks-universe__11.3.20__1e4a70b__6deb9aa__jenkins__2a43af3__format-3 -spark.databricks.clusterUsageTags.sparkMasterUrlType=*********(redacted) -spark.databricks.clusterUsageTags.sparkVersion=11.3.x-cpu-ml-scala2.12 -spark.databricks.clusterUsageTags.userId=*********(redacted) -spark.databricks.clusterUsageTags.userProvidedRemoteVolumeCount=*********(redacted) -spark.databricks.clusterUsageTags.userProvidedRemoteVolumeSizeGb=*********(redacted) -spark.databricks.clusterUsageTags.userProvidedRemoteVolumeType=*********(redacted) -spark.databricks.clusterUsageTags.userProvidedSparkVersion=*********(redacted) -spark.databricks.clusterUsageTags.workerEnvironmentId=workerenv-4679476628690204 -spark.databricks.credential.aws.secretKey.redactor=*********(redacted) -spark.databricks.credential.redactor=*********(redacted) -spark.databricks.credential.scope.fs.adls.gen2.tokenProviderClassName=*********(redacted) -spark.databricks.credential.scope.fs.gs.auth.access.tokenProviderClassName=*********(redacted) -spark.databricks.credential.scope.fs.impl=*********(redacted) -spark.databricks.credential.scope.fs.s3a.tokenProviderClassName=*********(redacted) -spark.databricks.delta.logStore.crossCloud.fatal=true -spark.databricks.delta.multiClusterWrites.enabled=true -spark.databricks.delta.preview.enabled=true -spark.databricks.driverNfs.clusterWidePythonLibsEnabled=true -spark.databricks.driverNfs.enabled=true -spark.databricks.driverNfs.pathSuffix=.ephemeral_nfs -spark.databricks.driverNodeTypeId=Standard_DS3_v2 -spark.databricks.enablePublicDbfsFuse=false -spark.databricks.eventLog.dir=eventlogs -spark.databricks.eventLog.enabled=true -spark.databricks.eventLog.listenerClassName=com.databricks.backend.daemon.driver.DBCEventLoggingListener -spark.databricks.io.directoryCommit.enableLogicalDelete=false -spark.databricks.managedCatalog.clientClassName=com.databricks.managedcatalog.ManagedCatalogClientImpl -spark.databricks.metrics.filesystem_io_metrics=true -spark.databricks.mlflow.autologging.enabled=true 
-spark.databricks.overrideDefaultCommitProtocol=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol -spark.databricks.passthrough.adls.gen2.tokenProviderClassName=*********(redacted) -spark.databricks.passthrough.adls.tokenProviderClassName=*********(redacted) -spark.databricks.passthrough.enabled=false -spark.databricks.passthrough.glue.credentialsProviderFactoryClassName=*********(redacted) -spark.databricks.passthrough.glue.executorServiceFactoryClassName=*********(redacted) -spark.databricks.passthrough.oauth.refresher.impl=*********(redacted) -spark.databricks.passthrough.s3a.threadPoolExecutor.factory.class=com.databricks.backend.daemon.driver.aws.S3APassthroughThreadPoolExecutorFactory -spark.databricks.passthrough.s3a.tokenProviderClassName=*********(redacted) -spark.databricks.preemption.enabled=true -spark.databricks.privateLinkEnabled=false -spark.databricks.python.defaultPythonRepl=ipykernel -spark.databricks.redactor=com.databricks.spark.util.DatabricksSparkLogRedactorProxy -spark.databricks.repl.enableClassFileCleanup=true -spark.databricks.secret.envVar.keys.toRedact=*********(redacted) -spark.databricks.secret.sparkConf.keys.toRedact=*********(redacted) -spark.databricks.service.dbutils.repl.backend=com.databricks.dbconnect.ReplDBUtils -spark.databricks.service.dbutils.server.backend=com.databricks.dbconnect.SparkServerDBUtils -spark.databricks.session.share=false -spark.databricks.sparkContextId=4347861282214610666 -spark.databricks.sql.configMapperClass=com.databricks.dbsql.config.SqlConfigMapperBridge -spark.databricks.tahoe.logStore.aws.class=com.databricks.tahoe.store.MultiClusterLogStore -spark.databricks.tahoe.logStore.azure.class=com.databricks.tahoe.store.AzureLogStore -spark.databricks.tahoe.logStore.class=com.databricks.tahoe.store.DelegatingLogStore -spark.databricks.tahoe.logStore.gcp.class=com.databricks.tahoe.store.GCPLogStore -spark.databricks.unityCatalog.credentialManager.apiTokenProviderClassName=*********(redacted) -spark.databricks.unityCatalog.credentialManager.tokenRefreshEnabled=*********(redacted) -spark.databricks.unityCatalog.enabled=false -spark.databricks.workerNodeTypeId=Standard_DS3_v2 -spark.databricks.workspaceUrl=*********(redacted) -spark.databricks.wsfs.workspacePrivatePreview=true -spark.databricks.wsfsPublicPreview=true -spark.delta.sharing.profile.provider.class=*********(redacted) -spark.driver.allowMultipleContexts=false -spark.driver.extraJavaOptions=-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -spark.driver.host=10.11.115.134 -spark.driver.maxResultSize=4g -spark.driver.port=40381 -spark.driver.tempDirectory=/local_disk0/tmp -spark.eventLog.enabled=false -spark.executor.extraClassPath=/databricks/spark/dbconf/log4j/executor:/databricks/spark/dbconf/jets3t/:/databricks/spark/dbconf/hadoop:/databricks/hive/conf:/databricks/jars/* 
-spark.executor.extraJavaOptions=-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djava.io.tmpdir=/local_disk0/tmp -XX:ReservedCodeCacheSize=512m -XX:+UseCodeCacheFlushing -XX:PerMethodRecompilationCutoff=-1 -XX:PerBytecodeRecompilationCutoff=-1 -Djava.security.properties=/databricks/spark/dbconf/java/extra.security -XX:-UseContainerSupport -XX:+PrintFlagsFinal -XX:+PrintGCDateStamps -XX:+PrintGCDetails -verbose:gc -Xss4m -Djava.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni -Djavax.xml.datatype.DatatypeFactory=com.sun.org.apache.xerces.internal.jaxp.datatype.DatatypeFactoryImpl -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl -Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl -Djavax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema=com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory -Dorg.xml.sax.driver=com.sun.org.apache.xerces.internal.parsers.SAXParser -Dorg.w3c.dom.DOMImplementationSourceList=com.sun.org.apache.xerces.internal.dom.DOMXSImplementationSourceImpl -Djavax.net.ssl.sessionCacheSize=10000 -Dscala.reflect.runtime.disable.typetag.cache=true -Dcom.google.cloud.spark.bigquery.repackaged.io.netty.tryReflectionSetAccessible=true -Dlog4j2.formatMsgNoLookups=true -Ddatabricks.serviceName=spark-executor-1 -spark.executor.id=driver -spark.executor.memory=7284m -spark.executor.tempDirectory=/local_disk0/tmp -spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener -spark.files.fetchFailure.unRegisterOutputOnHost=true -spark.files.overwrite=true -spark.files.useFetchCache=false -spark.hadoop.databricks.dbfs.client.version=v2 -spark.hadoop.databricks.fs.perfMetrics.enable=true -spark.hadoop.databricks.s3.amazonS3Client.cache.enabled=true -spark.hadoop.databricks.s3.create.deleteUnnecessaryFakeDirectories=false -spark.hadoop.databricks.s3.verifyBucketExists.enabled=false -spark.hadoop.databricks.s3commit.client.sslTrustAll=false -spark.hadoop.fs.AbstractFileSystem.gs.impl=shaded.databricks.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS -spark.hadoop.fs.abfs.impl=shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemHadoop3 -spark.hadoop.fs.abfs.impl.disable.cache=true -spark.hadoop.fs.abfss.impl=shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystemHadoop3 -spark.hadoop.fs.abfss.impl.disable.cache=true -spark.hadoop.fs.adl.impl=com.databricks.adl.AdlFileSystem -spark.hadoop.fs.adl.impl.disable.cache=true -spark.hadoop.fs.azure.authorization.caching.enable=false -spark.hadoop.fs.azure.cache.invalidator.type=com.databricks.encryption.utils.CacheInvalidatorImpl -spark.hadoop.fs.azure.skip.metrics=true 
-spark.hadoop.fs.azure.user.agent.prefix=*********(redacted) -spark.hadoop.fs.cpfs-abfss.impl=*********(redacted) -spark.hadoop.fs.cpfs-abfss.impl.disable.cache=true -spark.hadoop.fs.cpfs-adl.impl=*********(redacted) -spark.hadoop.fs.cpfs-adl.impl.disable.cache=true -spark.hadoop.fs.cpfs-s3.impl=*********(redacted) -spark.hadoop.fs.cpfs-s3a.impl=*********(redacted) -spark.hadoop.fs.cpfs-s3n.impl=*********(redacted) -spark.hadoop.fs.dbfs.impl=com.databricks.backend.daemon.data.client.DbfsHadoop3 -spark.hadoop.fs.dbfsartifacts.impl=com.databricks.backend.daemon.data.client.DBFSV1 -spark.hadoop.fs.fcfs-abfs.impl=*********(redacted) -spark.hadoop.fs.fcfs-abfs.impl.disable.cache=true -spark.hadoop.fs.fcfs-abfss.impl=*********(redacted) -spark.hadoop.fs.fcfs-abfss.impl.disable.cache=true -spark.hadoop.fs.fcfs-s3.impl=*********(redacted) -spark.hadoop.fs.fcfs-s3.impl.disable.cache=true -spark.hadoop.fs.fcfs-s3a.impl=*********(redacted) -spark.hadoop.fs.fcfs-s3a.impl.disable.cache=true -spark.hadoop.fs.fcfs-s3n.impl=*********(redacted) -spark.hadoop.fs.fcfs-s3n.impl.disable.cache=true -spark.hadoop.fs.fcfs-wasb.impl=*********(redacted) -spark.hadoop.fs.fcfs-wasb.impl.disable.cache=true -spark.hadoop.fs.fcfs-wasbs.impl=*********(redacted) -spark.hadoop.fs.fcfs-wasbs.impl.disable.cache=true -spark.hadoop.fs.file.impl=com.databricks.backend.daemon.driver.WorkspaceLocalFileSystem -spark.hadoop.fs.gs.impl=shaded.databricks.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemHadoop3 -spark.hadoop.fs.gs.impl.disable.cache=true -spark.hadoop.fs.gs.outputstream.upload.chunk.size=16777216 -spark.hadoop.fs.idbfs.impl=com.databricks.io.idbfs.IdbfsFileSystem -spark.hadoop.fs.mcfs-s3a.impl=com.databricks.sql.acl.fs.ManagedCatalogFileSystem -spark.hadoop.fs.mlflowdbfs.impl=com.databricks.mlflowdbfs.MlflowdbfsFileSystem -spark.hadoop.fs.s3.impl=shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystemHadoop3 -spark.hadoop.fs.s3.impl.disable.cache=true -spark.hadoop.fs.s3a.assumed.role.credentials.provider=*********(redacted) -spark.hadoop.fs.s3a.attempts.maximum=10 -spark.hadoop.fs.s3a.block.size=67108864 -spark.hadoop.fs.s3a.connection.maximum=200 -spark.hadoop.fs.s3a.connection.timeout=50000 -spark.hadoop.fs.s3a.fast.upload=true -spark.hadoop.fs.s3a.fast.upload.active.blocks=32 -spark.hadoop.fs.s3a.fast.upload.default=true -spark.hadoop.fs.s3a.impl=shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystemHadoop3 -spark.hadoop.fs.s3a.impl.disable.cache=true -spark.hadoop.fs.s3a.max.total.tasks=1000 -spark.hadoop.fs.s3a.multipart.size=10485760 -spark.hadoop.fs.s3a.multipart.threshold=104857600 -spark.hadoop.fs.s3a.retry.limit=20 -spark.hadoop.fs.s3a.retry.throttle.interval=500ms -spark.hadoop.fs.s3a.threads.max=136 -spark.hadoop.fs.s3n.impl=shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystemHadoop3 -spark.hadoop.fs.s3n.impl.disable.cache=true -spark.hadoop.fs.stage.impl=com.databricks.backend.daemon.driver.managedcatalog.PersonalStagingFileSystem -spark.hadoop.fs.stage.impl.disable.cache=true -spark.hadoop.fs.wasb.impl=shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem -spark.hadoop.fs.wasb.impl.disable.cache=true -spark.hadoop.fs.wasbs.impl=shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem -spark.hadoop.fs.wasbs.impl.disable.cache=true -spark.hadoop.hive.hmshandler.retry.attempts=10 -spark.hadoop.hive.hmshandler.retry.interval=2000 -spark.hadoop.hive.server2.enable.doAs=false -spark.hadoop.hive.server2.idle.operation.timeout=7200000 
-spark.hadoop.hive.server2.idle.session.timeout=900000 -spark.hadoop.hive.server2.keystore.password=*********(redacted) -spark.hadoop.hive.server2.keystore.path=/databricks/keys/jetty-ssl-driver-keystore.jks -spark.hadoop.hive.server2.session.check.interval=60000 -spark.hadoop.hive.server2.thrift.http.cookie.auth.enabled=false -spark.hadoop.hive.server2.thrift.http.port=10000 -spark.hadoop.hive.server2.transport.mode=http -spark.hadoop.hive.server2.use.SSL=true -spark.hadoop.hive.warehouse.subdir.inherit.perms=false -spark.hadoop.mapred.output.committer.class=com.databricks.backend.daemon.data.client.DirectOutputCommitter -spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 -spark.hadoop.parquet.abfs.readahead.optimization.enabled=true -spark.hadoop.parquet.block.size.row.check.max=10 -spark.hadoop.parquet.block.size.row.check.min=10 -spark.hadoop.parquet.filter.columnindex.enabled=false -spark.hadoop.parquet.memory.pool.ratio=0.5 -spark.hadoop.parquet.page.metadata.validation.enabled=true -spark.hadoop.parquet.page.size.check.estimate=false -spark.hadoop.parquet.page.verify-checksum.enabled=true -spark.hadoop.parquet.page.write-checksum.enabled=true -spark.hadoop.spark.databricks.io.parquet.verifyChecksumOnWrite.enabled=false -spark.hadoop.spark.databricks.io.parquet.verifyChecksumOnWrite.throwsException=false -spark.hadoop.spark.driverproxy.customHeadersToProperties=*********(redacted) -spark.hadoop.spark.hadoop.aws.glue.cache.db.size=1000 -spark.hadoop.spark.hadoop.aws.glue.cache.db.ttl-mins=30 -spark.hadoop.spark.hadoop.aws.glue.cache.table.size=1000 -spark.hadoop.spark.hadoop.aws.glue.cache.table.ttl-mins=30 -spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter -spark.hadoop.spark.sql.sources.outputCommitterClass=com.databricks.backend.daemon.data.client.MapReduceDirectOutputCommitter -spark.home=/databricks/spark -spark.logConf=true -spark.master=spark://10.11.115.134:7077 -spark.metrics.conf=/databricks/spark/conf/metrics.properties -spark.openlineage.endpoint=api/v1/lineage -spark.openlineage.namespace=adb-5445974573286168.8#default -spark.openlineage.url=*********(redacted) -spark.openlineage.url.param.code=*********(redacted) -spark.r.backendConnectionTimeout=604800 -spark.r.numRBackendThreads=1 -spark.rdd.compress=true -spark.repl.class.outputDir=/local_disk0/tmp/repl/spark-4347861282214610666-415cbfc1-bc72-4ecc-8182-d24eda276af6 -spark.rpc.message.maxSize=256 -spark.scheduler.listenerbus.eventqueue.capacity=20000 -spark.scheduler.mode=FAIR -spark.serializer.objectStreamReset=100 -spark.shuffle.manager=SORT -spark.shuffle.memoryFraction=0.2 -spark.shuffle.reduceLocality.enabled=false -spark.shuffle.service.enabled=true -spark.shuffle.service.port=4048 -spark.sparklyr-backend.threads=1 -spark.sparkr.use.daemon=false -spark.speculation=false -spark.speculation.multiplier=3 -spark.speculation.quantile=0.9 -spark.sql.allowMultipleContexts=false -spark.sql.hive.convertCTAS=true -spark.sql.hive.convertMetastoreParquet=true -spark.sql.hive.metastore.jars=/databricks/databricks-hive/* -spark.sql.hive.metastore.sharedPrefixes=org.mariadb.jdbc,com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,microsoft.sql.DateTimeOffset,microsoft.sql.Types,com.databricks,com.codahale,com.fasterxml.jackson,shaded.databricks -spark.sql.hive.metastore.version=0.13.0 -spark.sql.legacy.createHiveTableByDefault=false -spark.sql.parquet.cacheMetadata=true -spark.sql.parquet.compression.codec=snappy 
-spark.sql.sources.commitProtocolClass=com.databricks.sql.transaction.directory.DirectoryAtomicCommitProtocol -spark.sql.sources.default=delta -spark.sql.streaming.checkpointFileManagerClass=com.databricks.spark.sql.streaming.DatabricksCheckpointFileManager -spark.sql.streaming.stopTimeout=15s -spark.sql.warehouse.dir=*********(redacted) -spark.storage.blockManagerTimeoutIntervalMs=300000 -spark.storage.memoryFraction=0.5 -spark.streaming.driver.writeAheadLog.allowBatching=true -spark.streaming.driver.writeAheadLog.closeFileAfterWrite=true -spark.task.reaper.enabled=true -spark.task.reaper.killTimeout=60s -spark.ui.port=40001 -spark.ui.prometheus.enabled=true -spark.worker.aioaLazyConfig.dbfsReadinessCheckClientClass=com.databricks.backend.daemon.driver.NephosDbfsReadinessCheckClient -spark.worker.aioaLazyConfig.iamReadinessCheckClientClass=com.databricks.backend.daemon.driver.NephosIamRoleCheckClient -spark.worker.cleanup.enabled=false -23/09/22 03:12:11 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set. -23/09/22 03:12:11 INFO log: Logging initialized @14416ms to org.eclipse.jetty.util.log.Slf4jLog -23/09/22 03:12:11 INFO Server: jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 1.8.0_362-b09 -23/09/22 03:12:11 INFO Server: Started @14686ms -23/09/22 03:12:12 INFO AbstractConnector: Started ServerConnector@544300a6{HTTP/1.1, (http/1.1)}{10.11.115.134:40001} -23/09/22 03:12:12 INFO Utils: Successfully started service 'SparkUI' on port 40001. -23/09/22 03:12:12 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@18fa5af6{/,null,AVAILABLE,@Spark} -23/09/22 03:12:12 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration. -23/09/22 03:12:12 INFO FairSchedulableBuilder: Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1 -23/09/22 03:12:12 INFO DatabricksEdgeConfigs: serverlessEnabled : false -23/09/22 03:12:12 INFO DatabricksEdgeConfigs: perfPackEnabled : false -23/09/22 03:12:12 INFO DatabricksEdgeConfigs: classicSqlEnabled : false -23/09/22 03:12:12 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://10.11.115.134:7077... -23/09/22 03:12:12 INFO TransportClientFactory: Successfully created connection to /10.11.115.134:7077 after 111 ms (0 ms spent in bootstraps) -23/09/22 03:12:13 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20230922031213-0000 -23/09/22 03:12:13 INFO TaskSchedulerImpl: Task preemption enabled. -23/09/22 03:12:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44293. 
-23/09/22 03:12:13 INFO NettyBlockTransferService: Server created on 10.11.115.134:44293 -23/09/22 03:12:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy -23/09/22 03:12:13 INFO BlockManager: external shuffle service port = 4048 -23/09/22 03:12:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.11.115.134, 44293, None) -23/09/22 03:12:13 INFO BlockManagerMasterEndpoint: Registering block manager 10.11.115.134:44293 with 3.3 GiB RAM, BlockManagerId(driver, 10.11.115.134, 44293, None) -23/09/22 03:12:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.11.115.134, 44293, None) -23/09/22 03:12:13 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230922031213-0000/0 on worker-20230922031209-10.11.115.133-34159 (10.11.115.133:34159) with 4 core(s) -23/09/22 03:12:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.11.115.134, 44293, None) -23/09/22 03:12:13 INFO StandaloneSchedulerBackend: Granted executor ID app-20230922031213-0000/0 on hostPort 10.11.115.133:34159 with 4 core(s), 7.1 GiB RAM -23/09/22 03:12:13 INFO DatabricksUtils: Disabling Databricks event logging listener because spark.extraListeners does not contain the Databricks event logger class -23/09/22 03:12:13 INFO SparkContext: Registered listener io.openlineage.spark.agent.OpenLineageSparkListener -23/09/22 03:12:13 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230922031213-0000/0 is now RUNNING -23/09/22 03:12:14 INFO ContextHandler: Stopped o.e.j.s.ServletContextHandler@18fa5af6{/,null,STOPPED,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@b307030{/jobs,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@263f6e96{/jobs/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@39acf187{/jobs/job,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@dd3e1e3{/jobs/job/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7878459f{/stages,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5d24703e{/stages/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@17554316{/stages/stage,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5af1b221{/stages/stage/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7d49fe37{/stages/pool,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@231c521e{/stages/pool/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1be3a294{/storage,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@729d1428{/storage/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1728d307{/storage/rdd,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@3f0b5619{/storage/rdd/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@36ce9eaf{/environment,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@1f27f354{/environment/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4425b6ed{/executors,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5039c2cf{/executors/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5ca006ac{/executors/threadDump,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1372696b{/executors/threadDump/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@206e5183{/executors/heapHistogram,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@32eb38e5{/executors/heapHistogram/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@21539796{/static,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@68ea1eb5{/,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@d7c00de{/api,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@6046fba0{/metrics,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@755033c5{/jobs/job/kill,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5b49b1df{/stages/stage/kill,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@118c1faa{/metrics/json,null,AVAILABLE,@Spark} -23/09/22 03:12:14 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 -23/09/22 03:12:14 INFO SparkContext: Loading Spark Service RPC Server. Classloader stack:List(com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader@512dc0e0, com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@556e4588, sun.misc.Launcher$AppClassLoader@1c53fd30, sun.misc.Launcher$ExtClassLoader@35a9782c) -23/09/22 03:12:15 INFO SparkServiceRPCServer: Initializing Spark Service RPC Server. Classloader stack: List(com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader@512dc0e0, com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@556e4588, sun.misc.Launcher$AppClassLoader@1c53fd30, sun.misc.Launcher$ExtClassLoader@35a9782c) -23/09/22 03:12:15 INFO SparkServiceRPCServer: Starting Spark Service RPC Server -23/09/22 03:12:15 INFO SparkServiceRPCServer: Starting Spark Service RPC Server. 
Classloader stack: List(com.databricks.backend.daemon.driver.ClassLoaders$MultiReplClassLoader@512dc0e0, com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader@556e4588, sun.misc.Launcher$AppClassLoader@1c53fd30, sun.misc.Launcher$ExtClassLoader@35a9782c) -23/09/22 03:12:15 INFO Server: jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 1.8.0_362-b09 -23/09/22 03:12:15 INFO AbstractConnector: Started ServerConnector@457d3f54{HTTP/1.1, (http/1.1)}{0.0.0.0:15001} -23/09/22 03:12:15 INFO Server: Started @18100ms -23/09/22 03:12:15 INFO DatabricksILoop$: Finished creating throwaway interpreter -23/09/22 03:12:15 INFO DatabricksILoop$: Successfully registered spark metrics in Prometheus registry -23/09/22 03:12:15 INFO DatabricksILoop$: Successfully initialized SparkContext -23/09/22 03:12:16 INFO SharedState: Scheduler stats enabled. -23/09/22 03:12:16 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. -23/09/22 03:12:16 INFO SharedState: Warehouse path is 'dbfs:/user/hive/warehouse'. -23/09/22 03:12:16 INFO AsyncEventQueue: Process of event SparkListenerApplicationStart(Databricks Shell,Some(app-20230922031213-0000),1695352329292,root,None,None,None) by listener OpenLineageSparkListener took 1.933345606s. -23/09/22 03:12:16 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@134ec0f3{/storage/iocache,null,AVAILABLE,@Spark} -23/09/22 03:12:16 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@491f3fb0{/storage/iocache/json,null,AVAILABLE,@Spark} -23/09/22 03:12:16 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@2a738d47{/SQL,null,AVAILABLE,@Spark} -23/09/22 03:12:16 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@8bd9d08{/SQL/json,null,AVAILABLE,@Spark} -23/09/22 03:12:16 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@d8a2b1b{/SQL/execution,null,AVAILABLE,@Spark} -23/09/22 03:12:16 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@3328db4f{/SQL/execution/json,null,AVAILABLE,@Spark} -23/09/22 03:12:16 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@23169374{/static/sql,null,AVAILABLE,@Spark} -23/09/22 03:12:16 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:12:16 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:12:19 INFO DriverConf: Configured feature flag data source LaunchDarkly -23/09/22 03:12:19 INFO DriverConf: Load feature flag from LaunchDarkly -23/09/22 03:12:19 WARN DriverConf: REGION environment variable is not defined. 
getConfForCurrentRegion will always return default value -23/09/22 03:12:21 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.11.115.133:57974) with ID 0, ResourceProfileId 0 -23/09/22 03:12:21 INFO DatabricksMountsStore: Mount store initialization: Attempting to get the list of mounts from metadata manager of DBFS -23/09/22 03:12:21 INFO log: Logging initialized @24438ms to shaded.v9_4.org.eclipse.jetty.util.log.Slf4jLog -23/09/22 03:12:21 INFO DynamicRpcConf: Configured feature flag data source LaunchDarkly -23/09/22 03:12:21 INFO DynamicRpcConf: Load feature flag from LaunchDarkly -23/09/22 03:12:21 WARN DynamicRpcConf: REGION environment variable is not defined. getConfForCurrentRegion will always return default value -23/09/22 03:12:22 INFO TypeUtil: JVM Runtime does not support Modules -23/09/22 03:12:22 INFO DatabricksMountsStore: Mount store initialization: Received a list of 9 mounts accessible from metadata manager of DBFS -23/09/22 03:12:22 INFO DatabricksMountsStore: Updated mounts cache. Changes: List((+,DbfsMountPoint(s3a://databricks-datasets-california/, /databricks-datasets)), (+,DbfsMountPoint(uc-volumes:/Volumes, /Volumes)), (+,DbfsMountPoint(unsupported-access-mechanism-for-path--use-mlflow-client:/, /databricks/mlflow-tracking)), (+,DbfsMountPoint(wasbs://dbstorage32gi53vs6kgpo.blob.core.windows.net/4679476628690204, /databricks-results)), (+,DbfsMountPoint(unsupported-access-mechanism-for-path--use-mlflow-client:/, /databricks/mlflow-registry)), (+,DbfsMountPoint(dbfs-reserved-path:/uc-volumes-reserved, /Volume)), (+,DbfsMountPoint(dbfs-reserved-path:/uc-volumes-reserved, /volumes)), (+,DbfsMountPoint(wasbs://dbstorage32gi53vs6kgpo.blob.core.windows.net/4679476628690204, /)), (+,DbfsMountPoint(dbfs-reserved-path:/uc-volumes-reserved, /volume))) -23/09/22 03:12:22 INFO BlockManagerMasterEndpoint: Registering block manager 10.11.115.133:45037 with 3.6 GiB RAM, BlockManagerId(0, 10.11.115.133, 45037, None) -23/09/22 03:12:22 INFO DatabricksFileSystemV2Factory: Creating wasbs file system for wasbs://root@dbstorage32gi53vs6kgpo.blob.core.windows.net -23/09/22 03:12:23 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:12:23 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:12:23 INFO DbfsHadoop3: Initialized DBFS with DBFSV2 as the delegate. -23/09/22 03:12:23 INFO HiveConf: Found configuration file file:/databricks/hive/conf/hive-site.xml -23/09/22 03:12:23 INFO SessionManager: HiveServer2: Background operation thread pool size: 100 -23/09/22 03:12:23 INFO SessionManager: HiveServer2: Background operation thread wait queue size: 100 -23/09/22 03:12:23 INFO SessionManager: HiveServer2: Background operation thread keepalive time: 10 seconds -23/09/22 03:12:23 INFO AbstractService: Service:OperationManager is inited. -23/09/22 03:12:23 INFO AbstractService: Service:SessionManager is inited. -23/09/22 03:12:23 INFO SparkSQLCLIService: Service: CLIService is inited. -23/09/22 03:12:23 INFO AbstractService: Service:ThriftHttpCLIService is inited. -23/09/22 03:12:23 INFO HiveThriftServer2: Service: HiveServer2 is inited. -23/09/22 03:12:23 INFO AbstractService: Service:OperationManager is started. -23/09/22 03:12:23 INFO AbstractService: Service:SessionManager is started. -23/09/22 03:12:23 INFO SparkSQLCLIService: Service: CLIService is started. 
-23/09/22 03:12:23 INFO AbstractService: Service:ThriftHttpCLIService is started. -23/09/22 03:12:23 INFO ThriftCLIService: HTTP Server SSL: adding excluded protocols: [SSLv2, SSLv3] -23/09/22 03:12:23 INFO ThriftCLIService: HTTP Server SSL: SslContextFactory.getExcludeProtocols = [SSL, SSLv2, SSLv2Hello, SSLv3] -23/09/22 03:12:23 INFO Server: jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 1.8.0_362-b09 -23/09/22 03:12:23 INFO session: DefaultSessionIdManager workerName=node0 -23/09/22 03:12:23 INFO session: No SessionScavenger set, using defaults -23/09/22 03:12:23 INFO session: node0 Scavenging every 660000ms -23/09/22 03:12:23 WARN SecurityHandler: ServletContext@o.e.j.s.ServletContextHandler@3c6c87fa{/,null,STARTING} has uncovered http methods for path: /* -23/09/22 03:12:23 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@3c6c87fa{/,null,AVAILABLE} -23/09/22 03:12:23 INFO SslContextFactory: x509=X509@12f85dc8(1,h=[az-westus.workers.prod.ns.databricks.com],a=[],w=[]) for Server@54a04eae[provider=null,keyStore=file:///databricks/keys/jetty-ssl-driver-keystore.jks,trustStore=null] -23/09/22 03:12:23 INFO AbstractConnector: Started ServerConnector@40f49d72{SSL, (ssl, http/1.1)}{0.0.0.0:10000} -23/09/22 03:12:23 INFO Server: Started @26472ms -23/09/22 03:12:23 INFO ThriftCLIService: Started ThriftHttpCLIService in https mode on port 10000 path=/cliservice/* with 5...500 worker threads -23/09/22 03:12:23 INFO AbstractService: Service:HiveServer2 is started. -23/09/22 03:12:23 INFO HiveThriftServer2: HiveThriftServer2 started -23/09/22 03:12:23 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@150eab74{/sqlserver,null,AVAILABLE,@Spark} -23/09/22 03:12:23 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@43f670f3{/sqlserver/json,null,AVAILABLE,@Spark} -23/09/22 03:12:23 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@95f61c2{/sqlserver/session,null,AVAILABLE,@Spark} -23/09/22 03:12:23 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@41f05f1{/sqlserver/session/json,null,AVAILABLE,@Spark} -23/09/22 03:12:23 INFO LibraryResolutionManager: Preferred maven central mirror is configured to https://maven-central.storage-download.googleapis.com/maven2/ -23/09/22 03:12:23 INFO DriverCorral: Creating the driver context -23/09/22 03:12:23 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint -23/09/22 03:12:24 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@fcd300e{/StreamingQuery,null,AVAILABLE,@Spark} -23/09/22 03:12:24 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5e9c6d8a{/StreamingQuery/json,null,AVAILABLE,@Spark} -23/09/22 03:12:24 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@24cdc97f{/StreamingQuery/statistics,null,AVAILABLE,@Spark} -23/09/22 03:12:24 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@466d87a1{/StreamingQuery/statistics/json,null,AVAILABLE,@Spark} -23/09/22 03:12:24 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1215982f{/static/sql,null,AVAILABLE,@Spark} -23/09/22 03:12:24 INFO JettyServer$: Creating thread pool with name ... -23/09/22 03:12:24 INFO JettyServer$: Thread pool created -23/09/22 03:12:24 INFO JettyServer$: Creating thread pool with name ... -23/09/22 03:12:24 INFO JettyServer$: Thread pool created -23/09/22 03:12:24 INFO DriverDaemon: Starting driver daemon... 
-23/09/22 03:12:24 INFO SparkConfUtils$: Customize spark config according to file /tmp/custom-spark.conf -23/09/22 03:12:24 WARN SparkConf: The configuration key 'spark.akka.frameSize' has been deprecated as of Spark 1.6 and may be removed in the future. Please use the new key 'spark.rpc.message.maxSize' instead. -23/09/22 03:12:24 INFO DriverDaemon$: Attempting to run: 'set up ttyd daemon' -23/09/22 03:12:24 INFO DriverDaemon$: Attempting to run: 'Configuring RStudio daemon' -23/09/22 03:12:24 INFO DriverDaemon$: Resetting the default python executable -23/09/22 03:12:24 INFO Utils: resolved command to be run: List(virtualenv, /local_disk0/.ephemeral_nfs/cluster_libraries/python, -p, /databricks/python/bin/python, --no-download, --no-setuptools, --no-wheel) -23/09/22 03:12:26 INFO DatabricksUtils: created python virtualenv: /local_disk0/.ephemeral_nfs/cluster_libraries/python -23/09/22 03:12:26 INFO Utils: resolved command to be run: List(/databricks/python/bin/python, -c, import sys; dirs=[p for p in sys.path if 'package' in p]; print(' '.join(dirs))) -23/09/22 03:12:26 INFO Utils: resolved command to be run: List(/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python, -c, from distutils.sysconfig import get_python_lib; print(get_python_lib())) -23/09/22 03:12:26 INFO DatabricksUtils: created sites.pth at /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/sites.pth -23/09/22 03:12:26 INFO ClusterWidePythonEnvManager: Registered /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages with the WatchService sun.nio.fs.LinuxWatchService$LinuxWatchKey@1ab6a093 -23/09/22 03:12:26 INFO DriverDaemon$: Attempting to run: 'Update root virtualenv' -23/09/22 03:12:26 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE) -23/09/22 03:12:26 INFO DriverDaemon$: Finished updating /etc/environment -23/09/22 03:12:26 INFO DriverDaemon$$anon$1: Message out thread ready -23/09/22 03:12:26 INFO Server: jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 1.8.0_362-b09 -23/09/22 03:12:26 INFO AbstractConnector: Started ServerConnector@59960ae9{HTTP/1.1, (http/1.1)}{0.0.0.0:6061} -23/09/22 03:12:26 INFO Server: Started @28920ms -23/09/22 03:12:26 INFO Server: jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 1.8.0_362-b09 -23/09/22 03:12:26 INFO SslContextFactory: x509=X509@7cd50c3d(1,h=[az-westus.workers.prod.ns.databricks.com],a=[],w=[]) for Server@7d97a1a0[provider=null,keyStore=null,trustStore=null] -23/09/22 03:12:26 WARN config: Weak cipher suite TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA enabled for Server@7d97a1a0[provider=null,keyStore=null,trustStore=null] -23/09/22 03:12:26 WARN config: Weak cipher suite TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA enabled for Server@7d97a1a0[provider=null,keyStore=null,trustStore=null] -23/09/22 03:12:26 INFO AbstractConnector: Started ServerConnector@2d4eba14{SSL, (ssl, http/1.1)}{0.0.0.0:6062} -23/09/22 03:12:26 INFO Server: Started @28983ms -23/09/22 03:12:26 INFO DriverDaemon: Started comm channel server -23/09/22 03:12:26 INFO DriverDaemon: Driver daemon started. -23/09/22 03:12:26 INFO DynamicInfoServiceConf: Configured feature flag data source LaunchDarkly -23/09/22 03:12:26 INFO DynamicInfoServiceConf: Load feature flag from LaunchDarkly -23/09/22 03:12:26 WARN DynamicInfoServiceConf: REGION environment variable is not defined. 
getConfForCurrentRegion will always return default value -23/09/22 03:12:26 INFO FeatureFlagRegister$$anon$1: Configured feature flag data source LaunchDarkly -23/09/22 03:12:26 INFO FeatureFlagRegister$$anon$1: Load feature flag from LaunchDarkly -23/09/22 03:12:26 WARN FeatureFlagRegister$$anon$1: REGION environment variable is not defined. getConfForCurrentRegion will always return default value -23/09/22 03:12:26 WARN FeatureFlagRegister$$anon$2: REGION environment variable is not defined. getConfForCurrentRegion will always return default value -23/09/22 03:12:27 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:12:27 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:12:27 INFO DriverCorral: Loading the root classloader -23/09/22 03:12:27 INFO DriverCorral: Starting sql repl ReplId-3e938-b2918-a40bd-0 -23/09/22 03:12:27 INFO DriverCorral: Starting sql repl ReplId-4866c-e0496-c4e7c-0 -23/09/22 03:12:27 INFO DriverCorral: Starting sql repl ReplId-5be81-765d5-a450d-b -23/09/22 03:12:27 INFO SQLDriverWrapper: setupRepl:ReplId-5be81-765d5-a450d-b: finished to load -23/09/22 03:12:27 INFO SQLDriverWrapper: setupRepl:ReplId-4866c-e0496-c4e7c-0: finished to load -23/09/22 03:12:28 INFO DriverCorral: Starting sql repl ReplId-7fb2e-41a1e-7bb98-6 -23/09/22 03:12:28 INFO SQLDriverWrapper: setupRepl:ReplId-3e938-b2918-a40bd-0: finished to load -23/09/22 03:12:28 INFO SQLDriverWrapper: setupRepl:ReplId-7fb2e-41a1e-7bb98-6: finished to load -23/09/22 03:12:28 INFO DriverCorral: Starting sql repl ReplId-366bb-6c8fc-7e848-1 -23/09/22 03:12:28 INFO SQLDriverWrapper: setupRepl:ReplId-366bb-6c8fc-7e848-1: finished to load -23/09/22 03:12:28 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:12:28 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:12:28 INFO DriverCorral: Starting r repl ReplId-37c67-11e71-085f2-b -23/09/22 03:12:28 INFO ROutputStreamHandler: Connection succeeded on port 33759 -23/09/22 03:12:28 INFO ROutputStreamHandler: Connection succeeded on port 38659 -23/09/22 03:12:28 INFO RDriverLocal: 1. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: object created with for ReplId-37c67-11e71-085f2-b. -23/09/22 03:12:28 INFO RDriverLocal: 2. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: initializing ... -23/09/22 03:12:28 INFO RDriverLocal: 3. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: started RBackend thread on port 44567 -23/09/22 03:12:28 INFO RDriverLocal: 4. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: waiting for SparkR to be installed ... -23/09/22 03:12:31 WARN DriverDaemon: ShouldUseAutoscalingInfo exception thrown, not logging stack trace. This is used for control flow and is ok to ignore -23/09/22 03:12:45 INFO RDriverLocal$: SparkR installation completed. -23/09/22 03:12:45 INFO RDriverLocal: 5. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: launching R process ... -23/09/22 03:12:45 INFO RDriverLocal: 6. 
RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: cgroup isolation disabled, not placing R process in REPL cgroup. -23/09/22 03:12:45 INFO RDriverLocal: 7. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: starting R process on port 1100 (attempt 1) ... -23/09/22 03:12:45 INFO RDriverLocal$: Debugging command for R process builder: SIMBASPARKINI=/etc/simba.sparkodbc.ini R_LIBS=/local_disk0/.ephemeral_nfs/envs/rEnv-f2101999-5405-42d1-9a13-54e56b10c595:/databricks/spark/R/lib:/local_disk0/.ephemeral_nfs/cluster_libraries/r LD_LIBRARY_PATH=/opt/simba/sparkodbc/lib/64/ SPARKR_BACKEND_CONNECTION_TIMEOUT=604800 DB_STREAM_BEACON_STRING_START=DATABRICKS_STREAM_START-ReplId-37c67-11e71-085f2-b DB_STDOUT_STREAM_PORT=33759 SPARKR_BACKEND_AUTH_SECRET=66ff8ddb37f72244f65671addc6c280315e049a38bc9f2d69956c1351b9dff0a DB_STREAM_BEACON_STRING_END=DATABRICKS_STREAM_END-ReplId-37c67-11e71-085f2-b EXISTING_SPARKR_BACKEND_PORT=44567 ODBCINI=/etc/odbc.ini DB_STDERR_STREAM_PORT=38659 /bin/bash /local_disk0/tmp/_startR.sh2234149851276446982resource.r /local_disk0/tmp/_rServeScript.r3565608712795707271resource.r 1100 None -23/09/22 03:12:45 INFO RDriverLocal: 8. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: setting up BufferedStreamThread with bufferSize: 1000. -23/09/22 03:12:47 INFO RDriverLocal: 9. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: R process started with RServe listening on port 1100. -23/09/22 03:12:47 INFO RDriverLocal: 10. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: starting interpreter to talk to R process ... -23/09/22 03:12:47 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect. -23/09/22 03:12:48 INFO ROutputStreamHandler: Successfully connected to stdout in the RShell. -23/09/22 03:12:48 INFO ROutputStreamHandler: Successfully connected to stderr in the RShell. -23/09/22 03:12:48 INFO RDriverLocal: 11. RDriverLocal.9f9878f0-af22-4610-8015-9ba9cba97f56: R interpreter is connected. -23/09/22 03:12:48 INFO RDriverWrapper: setupRepl:ReplId-37c67-11e71-085f2-b: finished to load -23/09/22 03:13:34 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:13:34 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:13:34 INFO DriverCorral: Starting python repl ReplId-285c6-06788-c5eb5-e -23/09/22 03:13:34 INFO JupyterDriverLocal: Starting gateway server for repl ReplId-285c6-06788-c5eb5-e -23/09/22 03:13:34 INFO PythonPy4JUtil: Using pinned thread mode in Py4J -23/09/22 03:13:35 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:13:35 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:13:35 INFO DynamicTracingConf: Configured feature flag data source LaunchDarkly -23/09/22 03:13:35 INFO DynamicTracingConf: Load feature flag from LaunchDarkly -23/09/22 03:13:35 WARN DynamicTracingConf: REGION environment variable is not defined. 
getConfForCurrentRegion will always return default value -23/09/22 03:13:35 INFO DriverCorral: Starting sql repl ReplId-7a2d7-f0b9e-5e69d-c -23/09/22 03:13:35 INFO SQLDriverWrapper: setupRepl:ReplId-7a2d7-f0b9e-5e69d-c: finished to load -23/09/22 03:13:35 INFO CommChannelWebSocket: onWebSocketConnect: websocket connected with session: WebSocketSession[websocket=JettyAnnotatedEventDriver[com.databricks.backend.daemon.driver.CommChannelWebSocket@30cb2f61],behavior=SERVER,connection=WebSocketServerConnection@1fc76509::DecryptedEndPoint@1a0baa83{l=/10.11.115.134:6062,r=/10.11.115.198:55930,OPEN,fill=-,flush=-,to=160/7200000},remote=WebSocketRemoteEndpoint@3e406be6[batching=true],incoming=JettyAnnotatedEventDriver[com.databricks.backend.daemon.driver.CommChannelWebSocket@30cb2f61],outgoing=ExtensionStack[queueSize=0,extensions=[],incoming=org.eclipse.jetty.websocket.common.WebSocketSession,outgoing=org.eclipse.jetty.websocket.server.WebSocketServerConnection]] -23/09/22 03:13:35 INFO OutgoingDirectNotebookMessageBuffer: Start MessageSendTask with session: 162088433 -23/09/22 03:13:37 INFO VirtualenvCloneHelper: Creating notebook-scoped virtualenv for b418d423-c52b-4877-8abc-07050e47b11d -23/09/22 03:13:37 INFO VirtualenvCloneHelper: Creating notebook-scoped virtualenv for f74596f4-5304-42b9-9f73-fa7bc858b89c -23/09/22 03:13:37 INFO Utils: resolved command to be run: List(virtualenv, /local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c, -p, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python, --no-download, --no-setuptools, --no-wheel) -23/09/22 03:13:37 INFO Utils: resolved command to be run: List(virtualenv, /local_disk0/.ephemeral_nfs/envs/pythonEnv-b418d423-c52b-4877-8abc-07050e47b11d, -p, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python, --no-download, --no-setuptools, --no-wheel) -23/09/22 03:13:37 INFO DatabricksUtils: created python virtualenv: /local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c -23/09/22 03:13:37 INFO Utils: resolved command to be run: List(/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python, -c, import sys; dirs=[p for p in sys.path if 'package' in p]; print(' '.join(dirs))) -23/09/22 03:13:37 INFO DatabricksUtils: created python virtualenv: /local_disk0/.ephemeral_nfs/envs/pythonEnv-b418d423-c52b-4877-8abc-07050e47b11d -23/09/22 03:13:37 INFO Utils: resolved command to be run: List(/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python, -c, import sys; dirs=[p for p in sys.path if 'package' in p]; print(' '.join(dirs))) -23/09/22 03:13:37 INFO Utils: resolved command to be run: List(/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c/bin/python, -c, from distutils.sysconfig import get_python_lib; print(get_python_lib())) -23/09/22 03:13:37 INFO Utils: resolved command to be run: List(/local_disk0/.ephemeral_nfs/envs/pythonEnv-b418d423-c52b-4877-8abc-07050e47b11d/bin/python, -c, from distutils.sysconfig import get_python_lib; print(get_python_lib())) -23/09/22 03:13:37 INFO DatabricksUtils: created sites.pth at /local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c/lib/python3.9/site-packages/sites.pth -23/09/22 03:13:37 INFO NotebookScopedPythonEnvManager: Time spent to start virtualenv /local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c is 462(ms) -23/09/22 03:13:37 INFO NotebookScopedPythonEnvManager: Registered 
/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c/lib/python3.9/site-packages with the WatchService sun.nio.fs.LinuxWatchService$LinuxWatchKey@75a9c171 -23/09/22 03:13:37 INFO DatabricksUtils: created sites.pth at /local_disk0/.ephemeral_nfs/envs/pythonEnv-b418d423-c52b-4877-8abc-07050e47b11d/lib/python3.9/site-packages/sites.pth -23/09/22 03:13:37 INFO NotebookScopedPythonEnvManager: Time spent to start virtualenv /local_disk0/.ephemeral_nfs/envs/pythonEnv-b418d423-c52b-4877-8abc-07050e47b11d is 495(ms) -23/09/22 03:13:37 INFO NotebookScopedPythonEnvManager: Registered /local_disk0/.ephemeral_nfs/envs/pythonEnv-b418d423-c52b-4877-8abc-07050e47b11d/lib/python3.9/site-packages with the WatchService sun.nio.fs.LinuxWatchService$LinuxWatchKey@1df8f4ae -23/09/22 03:13:37 INFO IpykernelUtils$: Python process builder: [bash, /local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c/python_start_f74596f4-5304-42b9-9f73-fa7bc858b89c.sh, /databricks/spark/python/pyspark/wrapped_python.py, root, /local_disk0/.ephemeral_nfs/envs/pythonEnv-f74596f4-5304-42b9-9f73-fa7bc858b89c/bin/python, /databricks/python_shell/scripts/db_ipykernel_launcher.py, -f, /databricks/kernel-connections/7979f0a86bbf7b0beee0790480e55b2f93f4ffb25c936be5716b2ec62f608d01.json] -23/09/22 03:13:37 INFO IpykernelUtils$: Cgroup isolation disabled, not placing python process in repl cgroup -23/09/22 03:13:37 INFO ProgressReporter$: Added result fetcher for 8803832534457543132_7062199902851827812_65e2f9e7-9eb1-4d20-b3b7-bcf8c99891cf -23/09/22 03:13:38 INFO ClusterLoadMonitor: Added query with execution ID:0. Current active queries:1 -23/09/22 03:13:38 INFO LogicalPlanStats: Setting LogicalPlanStats visitor to com.databricks.sql.optimizer.statsEstimation.DatabricksLogicalPlanStatsVisitor$ -23/09/22 03:13:38 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 0.0, New Ema: 1.0 -23/09/22 03:13:40 INFO SecuredHiveExternalCatalog: creating hiveClient from java.lang.Throwable - at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:79) - at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:77) - at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:113) - at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:153) - at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:377) - at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:363) - at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34) - at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:152) - at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:313) - at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:263) - at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:253) - at org.apache.spark.sql.internal.SharedState.$anonfun$globalTempViewManager$1(SharedState.scala:336) - at org.apache.spark.sql.internal.SharedState.$anonfun$globalTempViewExternalCatalogNameCheck$1(SharedState.scala:308) - at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) - at scala.util.Try$.apply(Try.scala:213) - at org.apache.spark.sql.internal.SharedState.globalTempViewExternalCatalogNameCheck(SharedState.scala:308) - at 
org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:336) - at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:332) - at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$hiveCatalog$2(HiveSessionStateBuilder.scala:78) - at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.globalTempViewManager$lzycompute(SessionCatalog.scala:554) - at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.globalTempViewManager(SessionCatalog.scala:554) - at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.setCurrentDatabaseWithoutCheck(SessionCatalog.scala:831) - at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentDatabaseWithoutCheck(ManagedCatalogSessionCatalog.scala:503) - at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentCatalog(ManagedCatalogSessionCatalog.scala:366) - at com.databricks.sql.DatabricksCatalogManager.setCurrentCatalog(DatabricksCatalogManager.scala:135) - at org.apache.spark.sql.execution.command.SetCatalogCommand.run(SetCatalogCommand.scala:30) - at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80) - at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78) - at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89) - at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:229) - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:249) - at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:399) - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:194) - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985) - at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:148) - at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:349) - at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:229) - at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:214) - at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:227) - at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:220) - at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) - at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99) - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31) - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298) - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294) - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) - at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488) - at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:220) - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354) - at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:220) - at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:174) - at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:165) - at org.apache.spark.sql.Dataset.<init>(Dataset.scala:238) - at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:107) - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985) - at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:104) - at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:820) - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985) - at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:815) - at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:695) - at com.databricks.backend.daemon.driver.SQLDriverLocal.$anonfun$executeSql$1(SQLDriverLocal.scala:91) - at scala.collection.immutable.List.map(List.scala:293) - at com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:37) - at com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:145) - at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$24(DriverLocal.scala:740) - at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124) - at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:723) - at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:403) - at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) - at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147) - at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:401) - at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:398) - at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:62) - at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:446) - at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:431) - at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:62) - at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:700) - at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:622) - at scala.util.Try$.apply(Try.scala:213) - at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:614) - at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:533) - at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:568) - at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:438) - at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:381) - at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:232) - at java.lang.Thread.run(Thread.java:750) - -23/09/22 03:13:40 WARN SQLConf: 
The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/09/22 03:13:40 INFO HiveUtils: Initializing HiveMetastoreConnection version 0.13.0 using file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive.shims--hive-shims-common--org.apache.hive.shims__hive-shims-common__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.xerial.snappy--snappy-java--org.xerial.snappy__snappy-java__1.0.5.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.mortbay.jetty--jetty--org.mortbay.jetty__jetty__6.1.26.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.codehaus.jackson--jackson-mapper-asl--org.codehaus.jackson__jackson-mapper-asl__1.9.13.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--javax.servlet--servlet-api--javax.servlet__servlet-api__2.5.jar:file:/databricks/databricks-hive/----ws_3_3--mvn--hadoop3--org.slf4j--slf4j-api--org.slf4j__slf4j-api__1.7.36.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-common--org.apache.hive__hive-common__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.avro--avro--org.apache.avro__avro__1.7.5.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.httpcomponents--httpclient--org.apache.httpcomponents__httpclient__4.4.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.commons--commons-lang3--org.apache.commons__commons-lang3__3.4.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--stax--stax-api--stax__stax-api__1.0.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.thrift--libfb303--org.apache.thrift__libfb303__0.9.0.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive.shims--hive-shims-common-secure--org.apache.hive.shims__hive-shims-common-secure__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.antlr--stringtemplate--org.antlr__stringtemplate__3.2.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-serde--org.apache.hive__hive-serde__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--net.sf.jpam--jpam--net.sf.jpam__jpam__1.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.google.code.findbugs--jsr305--com.google.code.findbugs__jsr305__1.3.9.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-exec--org.apache.hive__hive-exec__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.datanucleus--datanucleus-rdbms--org.datanucleus__datanucleus-rdbms__4.1.19.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.geronimo.specs--geronimo-jaspic_1.0_
spec--org.apache.geronimo.specs__geronimo-jaspic_1.0_spec__1.0.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.ow2.asm--asm--org.ow2.asm__asm__4.0.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-cli--org.apache.hive__hive-cli__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--asm--asm--asm__asm__3.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--antlr--antlr--antlr__antlr__2.7.7.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.mortbay.jetty--servlet-api--org.mortbay.jetty__servlet-api__2.5-20081211.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive.shims--hive-shims-0.20S--org.apache.hive.shims__hive-shims-0.20S__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.ant--ant-launcher--org.apache.ant__ant-launcher__1.9.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.velocity--velocity--org.apache.velocity__velocity__1.5.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--javax.mail--mail--javax.mail__mail__1.4.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.thrift--libthrift--org.apache.thrift__libthrift__0.9.2.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.google.guava--guava--com.google.guava__guava__11.0.2.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.mortbay.jetty--jetty-util--org.mortbay.jetty__jetty-util__6.1.26.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-service--org.apache.hive__hive-service__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--mvn--hadoop3--org.apache.logging.log4j--log4j-slf4j-impl--org.apache.logging.log4j__log4j-slf4j-impl__2.18.0.jar:file:/databricks/databricks-hive/----ws_3_3--mvn--hadoop3--org.apache.logging.log4j--log4j-api--org.apache.logging.log4j__log4j-api__2.18.0.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--junit--junit--junit__junit__3.8.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.commons--commons-compress--org.apache.commons__commons-compress__1.9.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--commons-logging--commons-logging--commons-logging__commons-logging__1.1.3.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.zookeeper--zookeeper--org.apache.zookeeper__zookeeper__3.4.5.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--javax.jdo--jdo-api--javax.jdo__jdo-api__3.0.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive.shims--hive-shims-0.20--org.apache.hive.shims__hive-shims-0.20__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.ant--ant--org.apache.ant__ant__1.9.1.jar:file:/databri
cks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.objenesis--objenesis--org.objenesis__objenesis__1.2.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--asm--asm-commons--asm__asm-commons__3.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--commons-io--commons-io--commons-io__commons-io__2.5.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.thoughtworks.paranamer--paranamer--com.thoughtworks.paranamer__paranamer__2.8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.esotericsoftware.reflectasm--reflectasm-shaded--com.esotericsoftware.reflectasm__reflectasm-shaded__1.07.jar:file:/databricks/databricks-hive/----ws_3_3--mvn--hadoop3--org.apache.logging.log4j--log4j-core--org.apache.logging.log4j__log4j-core__2.18.0.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--javax.transaction--jta--javax.transaction__jta__1.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--jline--jline--jline__jline__0.9.94.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.eclipse.jetty.aggregate--jetty-all--org.eclipse.jetty.aggregate__jetty-all__7.6.0.v20120127.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.datanucleus--datanucleus-core--org.datanucleus__datanucleus-core__4.1.17.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--commons-httpclient--commons-httpclient--commons-httpclient__commons-httpclient__3.0.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.antlr--antlr-runtime--org.antlr__antlr-runtime__3.4.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-ant--org.apache.hive__hive-ant__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.antlr--ST4--org.antlr__ST4__4.0.4.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--oro--oro--oro__oro__2.0.8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-jdbc--org.apache.hive__hive-jdbc__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-beeline--org.apache.hive__hive-beeline__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--javax.transaction--transaction-api--javax.transaction__transaction-api__1.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--commons-lang--commons-lang--commons-lang__commons-lang__2.4.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--commons-cli--commons-cli--commons-cli__commons-cli__1.2.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.esotericsoftware.kryo--kryo--com.esotericsoftware.kryo__kryo__2.21.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive.shims--hive-shims-0.23--org.apache.hive.shims__hive-shims-0.23__0.13.1-databricks-8.jar:file:/
databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.httpcomponents--httpcore--org.apache.httpcomponents__httpcore__4.2.5.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.codehaus.jackson--jackson-core-asl--org.codehaus.jackson__jackson-core-asl__1.9.13.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--asm--asm-tree--asm__asm-tree__3.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.esotericsoftware.minlog--minlog--com.esotericsoftware.minlog__minlog__1.2.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.geronimo.specs--geronimo-annotation_1.0_spec--org.apache.geronimo.specs__geronimo-annotation_1.0_spec__1.1.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--commons-codec--commons-codec--commons-codec__commons-codec__1.8.jar:file:/databricks/databricks-hive/----ws_3_3--mvn--hadoop3--org.apache.logging.log4j--log4j-1.2-api--org.apache.logging.log4j__log4j-1.2-api__2.18.0.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.codehaus.groovy--groovy-all--org.codehaus.groovy__groovy-all__2.1.6.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.datanucleus--javax.jdo--org.datanucleus__javax.jdo__3.2.0-m3.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-shims--org.apache.hive__hive-shims__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--commons-collections--commons-collections--commons-collections__commons-collections__3.2.2.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--javax.activation--activation--javax.activation__activation__1.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.geronimo.specs--geronimo-jta_1.1_spec--org.apache.geronimo.specs__geronimo-jta_1.1_spec__1.1.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.zaxxer--HikariCP--com.zaxxer__HikariCP__2.5.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.derby--derby--org.apache.derby__derby__10.10.1.1.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.apache.hive--hive-metastore--org.apache.hive__hive-metastore__0.13.1-databricks-8.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--com.jolbox--bonecp--com.jolbox__bonecp__0.8.0.RELEASE.jar:file:/databricks/databricks-hive/----ws_3_3--maven-trees--hive-metastore-databricks-log4j2--org.datanucleus--datanucleus-api-jdo--org.datanucleus__datanucleus-api-jdo__4.2.4.jar:file:/databricks/databricks-hive/manifest.jar:file:/databricks/databricks-hive/bonecp-configs.jar -23/09/22 03:13:40 INFO PoolingHiveClient: Hive metastore connection pool implementation is HikariCP -23/09/22 03:13:40 INFO LocalHiveClientsPool: Create Hive Metastore client pool of size 20 -23/09/22 03:13:40 INFO HiveClientImpl: Warehouse location for Hive client (version 0.13.1) is dbfs:/user/hive/warehouse -23/09/22 03:13:41 INFO HiveMetaStore: 0: Opening raw store with implemenation 
class:org.apache.hadoop.hive.metastore.ObjectStore -23/09/22 03:13:41 INFO ConsoleTransport: {"eventTime":"2023-09-22T03:13:40.231Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"START","run":{"runId":"83806b2b-6e39-49a4-a6f2-8efdc67da215","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.command.SetCatalogCommand","num-children":0,"catalogName":"hive_metastore"}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.execute_set_catalog_command","facets":{}},"inputs":[],"outputs":[]} -23/09/22 03:13:41 INFO AsyncEventQueue: Process of event SparkListenerSQLExecutionStart(executionId=0, ...) by listener OpenLineageSparkListener took 1.055753891s. -23/09/22 03:13:41 INFO ObjectStore: ObjectStore, initialize called -23/09/22 03:13:41 INFO Persistence: Property datanucleus.fixedDatastore unknown - will be ignored -23/09/22 03:13:41 INFO Persistence: Property datanucleus.connectionPool.idleTimeout unknown - will be ignored -23/09/22 03:13:41 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored -23/09/22 03:13:41 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored -23/09/22 03:13:41 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0 -23/09/22 03:13:41 INFO HikariDataSource: HikariPool-1 - Started. -23/09/22 03:13:42 INFO HikariDataSource: HikariPool-2 - Started. 
-23/09/22 03:13:42 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" -23/09/22 03:13:43 INFO ObjectStore: Initialized ObjectStore -23/09/22 03:13:43 INFO HiveMetaStore: Added admin role in metastore -23/09/22 03:13:43 INFO HiveMetaStore: Added public role in metastore -23/09/22 03:13:43 INFO HiveMetaStore: No user is added in admin role, since config is empty -23/09/22 03:13:44 INFO HiveMetaStore: 0: get_database: default -23/09/22 03:13:44 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default -23/09/22 03:13:44 INFO HiveMetaStore: 0: get_database: global_temp -23/09/22 03:13:44 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: global_temp -23/09/22 03:13:44 ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named global_temp) - at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:508) - at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:519) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) - at java.lang.reflect.Method.invoke(Method.java:498) - at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108) - at com.sun.proxy.$Proxy86.getDatabase(Unknown Source) - at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:796) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) - at java.lang.reflect.Method.invoke(Method.java:498) - at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) - at com.sun.proxy.$Proxy88.get_database(Unknown Source) - at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:949) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) - at java.lang.reflect.Method.invoke(Method.java:498) - at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) - at com.sun.proxy.$Proxy89.getDatabase(Unknown Source) - at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1165) - at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1154) - at org.apache.spark.sql.hive.client.Shim_v0_12.databaseExists(HiveShim.scala:619) - at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$databaseExists$1(HiveClientImpl.scala:440) - at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) - at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:337) - at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:236) - at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:274) - at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:228) - at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:317) - at 
org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:440) - at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1(PoolingHiveClient.scala:321) - at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1$adapted(PoolingHiveClient.scala:320) - at org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:149) - at org.apache.spark.sql.hive.client.PoolingHiveClient.databaseExists(PoolingHiveClient.scala:320) - at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:313) - at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) - at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80) - at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:154) - at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:115) - at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:153) - at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:377) - at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:363) - at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34) - at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:152) - at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:313) - at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:77) - at org.apache.spark.sql.internal.SharedState.$anonfun$globalTempViewExternalCatalogNameCheck$1(SharedState.scala:308) - at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) - at scala.util.Try$.apply(Try.scala:213) - at org.apache.spark.sql.internal.SharedState.globalTempViewExternalCatalogNameCheck(SharedState.scala:308) - at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:336) - at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:332) - at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$hiveCatalog$2(HiveSessionStateBuilder.scala:78) - at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.globalTempViewManager$lzycompute(SessionCatalog.scala:554) - at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.globalTempViewManager(SessionCatalog.scala:554) - at org.apache.spark.sql.catalyst.catalog.SessionCatalogImpl.setCurrentDatabaseWithoutCheck(SessionCatalog.scala:831) - at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentDatabaseWithoutCheck(ManagedCatalogSessionCatalog.scala:503) - at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentCatalog(ManagedCatalogSessionCatalog.scala:366) - at com.databricks.sql.DatabricksCatalogManager.setCurrentCatalog(DatabricksCatalogManager.scala:135) - at org.apache.spark.sql.execution.command.SetCatalogCommand.run(SetCatalogCommand.scala:30) - at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80) - at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78) - at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89) - at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:229) - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:249) - at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:399) - at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:194) - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985) - at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:148) - at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:349) - at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:229) - at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:214) - at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:227) - at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:220) - at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) - at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99) - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31) - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298) - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294) - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) - at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31) - at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488) - at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:220) - at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354) - at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:220) - at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:174) - at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:165) - at org.apache.spark.sql.Dataset.<init>(Dataset.scala:238) - at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:107) - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985) - at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:104) - at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:820) - at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985) - at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:815) - at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:695) - at com.databricks.backend.daemon.driver.SQLDriverLocal.$anonfun$executeSql$1(SQLDriverLocal.scala:91) - at scala.collection.immutable.List.map(List.scala:293) - at 
com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:37) - at com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:145) - at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$24(DriverLocal.scala:740) - at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124) - at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:723) - at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:403) - at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) - at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147) - at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:401) - at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:398) - at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:62) - at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:446) - at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:431) - at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:62) - at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:700) - at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:622) - at scala.util.Try$.apply(Try.scala:213) - at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:614) - at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:533) - at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:568) - at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:438) - at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:381) - at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:232) - at java.lang.Thread.run(Thread.java:750) - -23/09/22 03:13:44 INFO HiveMetaStore: 0: get_database: default -23/09/22 03:13:44 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default -23/09/22 03:13:44 INFO HiveMetaStore: 0: get_database: default -23/09/22 03:13:44 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default -23/09/22 03:13:44 INFO ClusterLoadMonitor: Removed query with execution ID:0. 
Current active queries:0 -23/09/22 03:13:44 INFO ConsoleTransport: {"eventTime":"2023-09-22T03:13:44.19Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"COMPLETE","run":{"runId":"83806b2b-6e39-49a4-a6f2-8efdc67da215","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.command.SetCatalogCommand","num-children":0,"catalogName":"hive_metastore"}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.execute_set_catalog_command","facets":{}},"inputs":[],"outputs":[]} -23/09/22 03:13:44 WARN SimpleFunctionRegistry: The function getargument replaced a previously registered function. -23/09/22 03:13:44 INFO ClusterLoadMonitor: Added query with execution ID:1. Current active queries:1 -23/09/22 03:13:44 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0 -23/09/22 03:13:44 INFO HiveMetaStore: 0: get_databases: * -23/09/22 03:13:44 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_databases: * -23/09/22 03:13:44 INFO ConsoleTransport: 
{"eventTime":"2023-09-22T03:13:44.623Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"START","run":{"runId":"eacbd5dc-0514-4ec2-b963-f7dae875fdf3","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.ShowNamespaces","num-children":1,"namespace":0,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"databaseName","dataType":"string","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":6,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}]]},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedNamespace","num-children":0,"catalog":null,"namespace":[]}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.show_namespaces","facets":{}},"inputs":[],"outputs":[]} -23/09/22 03:13:46 INFO PythonDriverWrapper: setupRepl:ReplId-285c6-06788-c5eb5-e: finished to load -23/09/22 03:13:46 INFO ProgressReporter$: Added result fetcher for 2908305457167067998_7192583573582421287_a94f2305c01146bdabe8f83549508a51 -23/09/22 03:13:46 INFO AsyncEventQueue: Process of event SparkListenerQueryProfileParamsReady(executionId=0, ...) by listener QueryProfileListener took 1.437714923s. -23/09/22 03:13:46 INFO CodeGenerator: Code generated in 1131.434114 ms -23/09/22 03:13:46 INFO ClusterLoadMonitor: Removed query with execution ID:1. 
-23/09/22 03:13:46 INFO ConsoleTransport: {"eventTime":"2023-09-22T03:13:46.546Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"COMPLETE","run":{"runId":"eacbd5dc-0514-4ec2-b963-f7dae875fdf3","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.catalyst.plans.logical.ShowNamespaces","num-children":1,"namespace":0,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"databaseName","dataType":"string","nullable":false,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":6,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}]]},{"class":"org.apache.spark.sql.catalyst.analysis.ResolvedNamespace","num-children":0,"catalog":null,"namespace":[]}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.show_namespaces","facets":{}},"inputs":[],"outputs":[]}
-23/09/22 03:13:46 INFO ClusterLoadMonitor: Added query with execution ID:2. Current active queries:1
-23/09/22 03:13:46 INFO CodeGenerator: Code generated in 62.950143 ms
-23/09/22 03:13:47 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
-23/09/22 03:13:47 INFO ClusterLoadMonitor: Removed query with execution ID:2. Current active queries:0
-23/09/22 03:13:47 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd
-23/09/22 03:13:47 INFO ProgressReporter$: Removed result fetcher for 2908305457167067998_7192583573582421287_a94f2305c01146bdabe8f83549508a51
-23/09/22 03:13:47 INFO CodeGenerator: Code generated in 34.004039 ms
-23/09/22 03:13:47 INFO ProgressReporter$: Removed result fetcher for 8803832534457543132_7062199902851827812_65e2f9e7-9eb1-4d20-b3b7-bcf8c99891cf
-23/09/22 03:13:47 INFO ProgressReporter$: Added result fetcher for 2908305457167067998_5655166856849056603_5a8498e54dc6435896f9d354ad4dc411
-23/09/22 03:13:47 INFO ProgressReporter$: Removed result fetcher for 2908305457167067998_5655166856849056603_5a8498e54dc6435896f9d354ad4dc411
-23/09/22 03:13:47 INFO ProgressReporter$: Added result fetcher for 2908305457167067998_5289524564745408939_d346c9547ff042428a53259a1692d220
-23/09/22 03:13:47 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 1.0, New Ema: 0.85
-23/09/22 03:13:47 INFO ClusterLoadMonitor: Added query with execution ID:3. Current active queries:1
-23/09/22 03:13:47 INFO LogicalPlanStats: Setting LogicalPlanStats visitor to com.databricks.sql.optimizer.statsEstimation.DatabricksLogicalPlanStatsVisitor$
-23/09/22 03:13:47 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
-23/09/22 03:13:47 INFO HiveMetaStore: 1: get_database: journey
-23/09/22 03:13:47 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: journey
-23/09/22 03:13:47 INFO HiveMetaStore: 1: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
-23/09/22 03:13:47 INFO ObjectStore: ObjectStore, initialize called
-23/09/22 03:13:47 INFO ObjectStore: Initialized ObjectStore
-23/09/22 03:13:47 INFO HiveMetaStore: 1: get_database: journey
-23/09/22 03:13:47 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: journey
-23/09/22 03:13:47 INFO ClusterLoadMonitor: Removed query with execution ID:3. Current active queries:0
-23/09/22 03:13:47 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd
-23/09/22 03:13:47 INFO ClusterLoadMonitor: Added query with execution ID:4. Current active queries:1
-23/09/22 03:13:47 INFO ClusterLoadMonitor: Removed query with execution ID:4. Current active queries:0
-23/09/22 03:13:47 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart
-23/09/22 03:13:47 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd
-23/09/22 03:13:48 INFO ProgressReporter$: Removed result fetcher for 2908305457167067998_5289524564745408939_d346c9547ff042428a53259a1692d220
-23/09/22 03:13:48 INFO ProgressReporter$: Added result fetcher for 2908305457167067998_8129229420424498214_3fae42fffd6144fca582f98dbc9b4746
-23/09/22 03:13:48 INFO ProgressReporter$: Removed result fetcher for 2908305457167067998_8129229420424498214_3fae42fffd6144fca582f98dbc9b4746
-23/09/22 03:13:48 INFO ProgressReporter$: Added result fetcher for 2908305457167067998_6796555818560213290_0c0092c6b28541e7b10544b4b1cad76d
-23/09/22 03:13:48 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions
-23/09/22 03:13:48 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions
-23/09/22 03:13:48 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:48 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:48 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:48 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:48 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:48 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:49 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:49 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:49 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:49 INFO NativeAzureFileSystem: Delete with limit configurations: 
deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:50 INFO DeltaLog: Loading version 16 starting from checkpoint version 10. -23/09/22 03:13:50 INFO ClusterLoadAvgHelper: Current cluster load: 0, Old Ema: 0.85, New Ema: 0.0 -23/09/22 03:13:51 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:51 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:51 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:51 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:51 INFO SnapshotEdge: [tableId=88997f34-e6ae-4a52-8e90-beab2ca48dfb] Created snapshot SnapshotEdge(path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log, version=16, metadata=Metadata(e409515a-4f0e-4b35-908c-3a8c6591a14f,null,null,Format(parquet,Map()),{"type":"struct","fields":[{"name":"household_id","type":"integer","nullable":true,"metadata":{}},{"name":"basket_id","type":"long","nullable":true,"metadata":{}},{"name":"day","type":"integer","nullable":true,"metadata":{}},{"name":"product_id","type":"integer","nullable":true,"metadata":{}},{"name":"quantity","type":"integer","nullable":true,"metadata":{}},{"name":"sales_amount","type":"float","nullable":true,"metadata":{}},{"name":"store_id","type":"integer","nullable":true,"metadata":{}},{"name":"discount_amount","type":"float","nullable":true,"metadata":{}},{"name":"transaction_time","type":"integer","nullable":true,"metadata":{}},{"name":"week_no","type":"integer","nullable":true,"metadata":{}},{"name":"coupon_discount","type":"float","nullable":true,"metadata":{}},{"name":"coupon_discount_match","type":"float","nullable":true,"metadata":{}}]},List(),Map(),Some(1694676659851)), logSegment=LogSegment(wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log,16,WrappedArray(FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000011.json; isDirectory=false; length=6616; replication=1; blocksize=536870912; modification_time=1695274264000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000012.json; isDirectory=false; length=11239; replication=1; blocksize=536870912; modification_time=1695274677000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000013.json; isDirectory=false; length=8080; replication=1; blocksize=536870912; modification_time=1695276655000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000014.json; isDirectory=false; length=6616; replication=1; 
blocksize=536870912; modification_time=1695346578000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000015.json; isDirectory=false; length=6616; replication=1; blocksize=536870912; modification_time=1695347164000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000016.json; isDirectory=false; length=6616; replication=1; blocksize=536870912; modification_time=1695351300000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}),WrappedArray(FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000010.checkpoint.parquet; isDirectory=false; length=34444; replication=1; blocksize=536870912; modification_time=1695273438000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}),Some(10),1695351300000), checksumOpt=Some(VersionChecksum(20222849,4,1,1,Protocol(1,2),Metadata(e409515a-4f0e-4b35-908c-3a8c6591a14f,null,null,Format(parquet,Map()),{"type":"struct","fields":[{"name":"household_id","type":"integer","nullable":true,"metadata":{}},{"name":"basket_id","type":"long","nullable":true,"metadata":{}},{"name":"day","type":"integer","nullable":true,"metadata":{}},{"name":"product_id","type":"integer","nullable":true,"metadata":{}},{"name":"quantity","type":"integer","nullable":true,"metadata":{}},{"name":"sales_amount","type":"float","nullable":true,"metadata":{}},{"name":"store_id","type":"integer","nullable":true,"metadata":{}},{"name":"discount_amount","type":"float","nullable":true,"metadata":{}},{"name":"transaction_time","type":"integer","nullable":true,"metadata":{}},{"name":"week_no","type":"integer","nullable":true,"metadata":{}},{"name":"coupon_discount","type":"float","nullable":true,"metadata":{}},{"name":"coupon_discount_match","type":"float","nullable":true,"metadata":{}}]},List(),Map(),Some(1694676659851)),Some(FileSizeHistogram(Vector(0, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608, 12582912, 16777216, 20971520, 25165824, 29360128, 33554432, 37748736, 41943040, 50331648, 58720256, 67108864, 75497472, 83886080, 92274688, 100663296, 109051904, 117440512, 125829120, 130023424, 134217728, 138412032, 142606336, 146800640, 150994944, 167772160, 184549376, 201326592, 218103808, 234881024, 251658240, 268435456, 285212672, 301989888, 318767104, 335544320, 352321536, 369098752, 385875968, 402653184, 419430400, 436207616, 452984832, 469762048, 486539264, 503316480, 520093696, 536870912, 553648128, 570425344, 587202560, 603979776, 671088640, 738197504, 805306368, 872415232, 939524096, 1006632960, 1073741824, 1140850688, 1207959552, 1275068416, 1342177280, 1409286144, 1476395008, 1610612736, 1744830464, 1879048192, 2013265920, 2147483648, 2415919104, 2684354560, 2952790016, 3221225472, 3489660928, 3758096384, 4026531840, 4294967296, 8589934592, 17179869184, 34359738368, 68719476736, 137438953472, 
274877906944),[J@15334883,[J@76dae8bb)),Some(b0819991-eddc-4afd-bd64-1591bc13547f),Some(List(AddFile(part-00000-dac72f33-722d-4e3f-9497-6046eeadaf78-c000.snappy.parquet,Map(),5283951,1695351297000,false,{"numRecords":672132,"minValues":{"household_id":1,"basket_id":26984851472,"day":1,"product_id":25671,"quantity":0,"sales_amount":0.0,"store_id":1,"discount_amount":-79.36,"transaction_time":0,"week_no":1,"coupon_discount":-29.99,"coupon_discount_match":-2.7},"maxValues":{"household_id":2500,"basket_id":30532627350,"day":240,"product_id":12949845,"quantity":85055,"sales_amount":505.0,"store_id":32124,"discount_amount":0.0,"transaction_time":2359,"week_no":35,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000000, MIN_INSERTION_TIME -> 1695351296000000, MAX_INSERTION_TIME -> 1695351296000000, OPTIMIZE_TARGET_SIZE -> 268435456),null), AddFile(part-00003-8ff9238c-f34e-4d10-b70e-fcccd74a1e6d-c000.snappy.parquet,Map(),4537572,1695351296000,false,{"numRecords":587632,"minValues":{"household_id":1,"basket_id":40314850434,"day":568,"product_id":27160,"quantity":0,"sales_amount":0.0,"store_id":2,"discount_amount":-180.0,"transaction_time":0,"week_no":82,"coupon_discount":-31.46,"coupon_discount_match":-2.7},"maxValues":{"household_id":2500,"basket_id":42305362535,"day":711,"product_id":18316298,"quantity":45475,"sales_amount":631.8,"store_id":34280,"discount_amount":0.77,"transaction_time":2359,"week_no":102,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000003, MIN_INSERTION_TIME -> 1695351296000003, MAX_INSERTION_TIME -> 1695351296000003, OPTIMIZE_TARGET_SIZE -> 268435456),null), AddFile(part-00002-618b4fff-77ad-4663-aaec-dbd5769515b1-c000.snappy.parquet,Map(),5238927,1695351296000,false,{"numRecords":667618,"minValues":{"household_id":1,"basket_id":32956680859,"day":401,"product_id":25671,"quantity":0,"sales_amount":0.0,"store_id":26,"discount_amount":-90.05,"transaction_time":0,"week_no":58,"coupon_discount":-37.93,"coupon_discount_match":-5.8},"maxValues":{"household_id":2500,"basket_id":40314850434,"day":568,"product_id":16809685,"quantity":89638,"sales_amount":329.99,"store_id":34016,"discount_amount":2.09,"transaction_time":2359,"week_no":82,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000002, MIN_INSERTION_TIME -> 1695351296000002, MAX_INSERTION_TIME -> 1695351296000002, OPTIMIZE_TARGET_SIZE -> 268435456),null), 
AddFile(part-00001-894cd31a-620c-4f5e-9ea8-cb25d4193b6e-c000.snappy.parquet,Map(),5162399,1695351296000,false,{"numRecords":668350,"minValues":{"household_id":1,"basket_id":30532627350,"day":230,"product_id":25671,"quantity":0,"sales_amount":0.0,"store_id":2,"discount_amount":-129.98,"transaction_time":0,"week_no":34,"coupon_discount":-55.93,"coupon_discount_match":-7.7},"maxValues":{"household_id":2500,"basket_id":32956680859,"day":403,"product_id":14077546,"quantity":38348,"sales_amount":840.0,"store_id":33923,"discount_amount":3.99,"transaction_time":2359,"week_no":58,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000001, MIN_INSERTION_TIME -> 1695351296000001, MAX_INSERTION_TIME -> 1695351296000001, OPTIMIZE_TARGET_SIZE -> 268435456),null)))))) -23/09/22 03:13:51 INFO ClusterLoadMonitor: Added query with execution ID:5. Current active queries:1 -23/09/22 03:13:51 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions -23/09/22 03:13:51 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions -23/09/22 03:13:51 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions -23/09/22 03:13:51 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions -23/09/22 03:13:51 INFO HiveClientImpl: Warehouse location for Hive client (version 0.13.1) is dbfs:/user/hive/warehouse -23/09/22 03:13:51 INFO HiveMetaStore: No user is added in admin role, since config is empty -23/09/22 03:13:51 INFO HiveMetaStore: 2: get_table : db=journey tbl=transactions -23/09/22 03:13:51 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions -23/09/22 03:13:51 INFO HiveMetaStore: 2: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore -23/09/22 03:13:51 INFO ObjectStore: ObjectStore, initialize called -23/09/22 03:13:51 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:51 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:52 INFO ObjectStore: Initialized ObjectStore -23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, 
deleteFileCountLimit=-1
-23/09/22 03:13:52 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions
-23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions
-23/09/22 03:13:52 INFO HiveMetaStore: 1: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
-23/09/22 03:13:52 INFO ObjectStore: ObjectStore, initialize called
-23/09/22 03:13:52 INFO ObjectStore: Initialized ObjectStore
-23/09/22 03:13:52 INFO HiveMetaStore: 3: get_database: journey
-23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: journey
-23/09/22 03:13:52 INFO HiveMetaStore: 3: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
-23/09/22 03:13:52 INFO ObjectStore: ObjectStore, initialize called
-23/09/22 03:13:52 INFO ObjectStore: Initialized ObjectStore
-23/09/22 03:13:52 INFO HiveMetaStore: 3: get_multi_table : db=journey tbls=transactions
-23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_multi_table : db=journey tbls=transactions
-23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:52 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:52 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:52 INFO ConsoleTransport: 
{"eventTime":"2023-09-22T03:13:51.932Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"START","run":{"runId":"4d1903f6-f932-4e4c-a79c-ba66a376f72c","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.command.DropTableCommand","num-children":0,"tableName":{"product-class":"org.apache.spark.sql.catalyst.TableIdentifier","table":"transactions","database":"journey","catalog":"spark_catalog"},"ifExists":true,"isView":false,"purge":false,"materialized":false}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.execute_drop_table_command.silver_transactions","facets":{}},"inputs":[],"outputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/silver/transactions","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"symlinks":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet","identifiers":[{"namespace":"/examples/data/csv/completejourney/silver","name":"journey.transactions","type":"TABLE"}]},"lifecycleStateChange":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet","lifecycleStateChange":"DROP"}},"outputFacets":{}}]} -23/09/22 03:13:52 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions -23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions -23/09/22 03:13:52 INFO HiveMetaStore: 1: get_database: journey -23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: journey -23/09/22 03:13:52 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions -23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions -23/09/22 03:13:52 INFO HiveMetaStore: 1: get_database: journey -23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: journey -23/09/22 03:13:52 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions -23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions -23/09/22 03:13:52 INFO HiveMetaStore: 1: drop_table : db=journey 
-23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=drop_table : db=journey tbl=transactions
-23/09/22 03:13:52 INFO HiveMetaStore: 1: get_table : db=journey tbl=transactions
-23/09/22 03:13:52 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions
-23/09/22 03:13:53 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:53 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:53 INFO ClusterLoadMonitor: Removed query with execution ID:5. Current active queries:0
-23/09/22 03:13:53 INFO HiveMetaStore: 2: get_table : db=journey tbl=transactions
-23/09/22 03:13:53 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=journey tbl=transactions
-23/09/22 03:13:53 INFO HiveMetaStore: 2: get_database: journey
-23/09/22 03:13:53 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: journey
-23/09/22 03:13:53 WARN DropTableCommandVisitor: Unable to find table by identifier `spark_catalog`.`journey`.`transactions` - Table or view 'transactions' not found in database 'journey'
-23/09/22 03:13:53 INFO ConsoleTransport: {"eventTime":"2023-09-22T03:13:53.535Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"COMPLETE","run":{"runId":"4d1903f6-f932-4e4c-a79c-ba66a376f72c","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.command.DropTableCommand","num-children":0,"tableName":{"product-class":"org.apache.spark.sql.catalyst.TableIdentifier","table":"transactions","database":"journey","catalog":"spark_catalog"},"ifExists":true,"isView":false,"purge":false,"materialized":false}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.execute_drop_table_command.silver_transactions","facets":{}},"inputs":[],"outputs":[]}
-23/09/22 03:13:53 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:53 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:54 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:54 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:54 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:54 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:54 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:54 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:54 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 1; threshold: 32
-23/09/22 03:13:54 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:54 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:54 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 0; threshold: 32
-23/09/22 03:13:54 INFO InMemoryFileIndex: It took 126 ms to list leaf files for 1 paths.
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO ClusterLoadMonitor: Added query with execution ID:6. Current active queries:1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:55 INFO DeltaLog: Loading version 16 starting from checkpoint version 10.
-23/09/22 03:13:55 INFO ConsoleTransport: {"eventTime":"2023-09-22T03:13:55.374Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"START","run":{"runId":"09b465e3-ef2c-452a-be68-6bcb8d01fe80","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand","num-children":0,"query":[{"class":"org.apache.spark.sql.execution.datasources.LogicalRelation","num-children":0,"relation":null,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"household_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":75,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"basket_id","dataType":"long","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":76,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"day","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":77,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"product_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":78,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"quantity","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":79,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"sales_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":80,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"store_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":81,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"discount_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":82,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"transaction_time","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":83,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"o
rg.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"week_no","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":84,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":85,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount_match","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":86,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}]],"isStreaming":false}],"dataSource":null,"options":null,"mode":null}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.execute_save_into_data_source_command.silver_transactions","facets":{}},"inputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]}},"inputFacets":{}}],"outputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/silver/transactions","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage
.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]},"columnLineage":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet","fields":{"household_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"household_id"}]},"basket_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"basket_id"}]},"day":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"day"}]},"product_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"product_id"}]},"quantity":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"quantity"}]},"sales_amount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"sales_amount"}]},"store_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"store_id"}]},"discount_amount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"discount_amount"}]},"transaction_time":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"transaction_time"}]},"week_no":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"week_no"}]},"coupon_discount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"coupon_discount"}]},"coupon_discount_match":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"coupon_discount_match"}]}}},"lifecycleStateChange":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet","lifecycleStateChange":"OVERWRITE"}},"outputFacets":{}}]} -23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with 
limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:55 INFO SnapshotEdge: [tableId=92982fea-9dbe-4e68-848c-022fb5257783] Created snapshot SnapshotEdge(path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log, version=16, metadata=Metadata(e409515a-4f0e-4b35-908c-3a8c6591a14f,null,null,Format(parquet,Map()),{"type":"struct","fields":[{"name":"household_id","type":"integer","nullable":true,"metadata":{}},{"name":"basket_id","type":"long","nullable":true,"metadata":{}},{"name":"day","type":"integer","nullable":true,"metadata":{}},{"name":"product_id","type":"integer","nullable":true,"metadata":{}},{"name":"quantity","type":"integer","nullable":true,"metadata":{}},{"name":"sales_amount","type":"float","nullable":true,"metadata":{}},{"name":"store_id","type":"integer","nullable":true,"metadata":{}},{"name":"discount_amount","type":"float","nullable":true,"metadata":{}},{"name":"transaction_time","type":"integer","nullable":true,"metadata":{}},{"name":"week_no","type":"integer","nullable":true,"metadata":{}},{"name":"coupon_discount","type":"float","nullable":true,"metadata":{}},{"name":"coupon_discount_match","type":"float","nullable":true,"metadata":{}}]},List(),Map(),Some(1694676659851)), logSegment=LogSegment(wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log,16,WrappedArray(FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000011.json; isDirectory=false; length=6616; replication=1; blocksize=536870912; modification_time=1695274264000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000012.json; isDirectory=false; length=11239; replication=1; blocksize=536870912; modification_time=1695274677000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000013.json; isDirectory=false; length=8080; replication=1; blocksize=536870912; modification_time=1695276655000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000014.json; isDirectory=false; length=6616; replication=1; blocksize=536870912; modification_time=1695346578000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000015.json; isDirectory=false; length=6616; replication=1; 
blocksize=536870912; modification_time=1695347164000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}, FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000016.json; isDirectory=false; length=6616; replication=1; blocksize=536870912; modification_time=1695351300000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}),WrappedArray(FileStatus{path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions/_delta_log/00000000000000000010.checkpoint.parquet; isDirectory=false; length=34444; replication=1; blocksize=536870912; modification_time=1695273438000; access_time=0; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}),Some(10),1695351300000), checksumOpt=Some(VersionChecksum(20222849,4,1,1,Protocol(1,2),Metadata(e409515a-4f0e-4b35-908c-3a8c6591a14f,null,null,Format(parquet,Map()),{"type":"struct","fields":[{"name":"household_id","type":"integer","nullable":true,"metadata":{}},{"name":"basket_id","type":"long","nullable":true,"metadata":{}},{"name":"day","type":"integer","nullable":true,"metadata":{}},{"name":"product_id","type":"integer","nullable":true,"metadata":{}},{"name":"quantity","type":"integer","nullable":true,"metadata":{}},{"name":"sales_amount","type":"float","nullable":true,"metadata":{}},{"name":"store_id","type":"integer","nullable":true,"metadata":{}},{"name":"discount_amount","type":"float","nullable":true,"metadata":{}},{"name":"transaction_time","type":"integer","nullable":true,"metadata":{}},{"name":"week_no","type":"integer","nullable":true,"metadata":{}},{"name":"coupon_discount","type":"float","nullable":true,"metadata":{}},{"name":"coupon_discount_match","type":"float","nullable":true,"metadata":{}}]},List(),Map(),Some(1694676659851)),Some(FileSizeHistogram(Vector(0, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608, 12582912, 16777216, 20971520, 25165824, 29360128, 33554432, 37748736, 41943040, 50331648, 58720256, 67108864, 75497472, 83886080, 92274688, 100663296, 109051904, 117440512, 125829120, 130023424, 134217728, 138412032, 142606336, 146800640, 150994944, 167772160, 184549376, 201326592, 218103808, 234881024, 251658240, 268435456, 285212672, 301989888, 318767104, 335544320, 352321536, 369098752, 385875968, 402653184, 419430400, 436207616, 452984832, 469762048, 486539264, 503316480, 520093696, 536870912, 553648128, 570425344, 587202560, 603979776, 671088640, 738197504, 805306368, 872415232, 939524096, 1006632960, 1073741824, 1140850688, 1207959552, 1275068416, 1342177280, 1409286144, 1476395008, 1610612736, 1744830464, 1879048192, 2013265920, 2147483648, 2415919104, 2684354560, 2952790016, 3221225472, 3489660928, 3758096384, 4026531840, 4294967296, 8589934592, 17179869184, 34359738368, 68719476736, 137438953472, 
274877906944),[J@4a157934,[J@aa60254)),Some(b0819991-eddc-4afd-bd64-1591bc13547f),Some(List(AddFile(part-00000-dac72f33-722d-4e3f-9497-6046eeadaf78-c000.snappy.parquet,Map(),5283951,1695351297000,false,{"numRecords":672132,"minValues":{"household_id":1,"basket_id":26984851472,"day":1,"product_id":25671,"quantity":0,"sales_amount":0.0,"store_id":1,"discount_amount":-79.36,"transaction_time":0,"week_no":1,"coupon_discount":-29.99,"coupon_discount_match":-2.7},"maxValues":{"household_id":2500,"basket_id":30532627350,"day":240,"product_id":12949845,"quantity":85055,"sales_amount":505.0,"store_id":32124,"discount_amount":0.0,"transaction_time":2359,"week_no":35,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000000, MIN_INSERTION_TIME -> 1695351296000000, MAX_INSERTION_TIME -> 1695351296000000, OPTIMIZE_TARGET_SIZE -> 268435456),null), AddFile(part-00003-8ff9238c-f34e-4d10-b70e-fcccd74a1e6d-c000.snappy.parquet,Map(),4537572,1695351296000,false,{"numRecords":587632,"minValues":{"household_id":1,"basket_id":40314850434,"day":568,"product_id":27160,"quantity":0,"sales_amount":0.0,"store_id":2,"discount_amount":-180.0,"transaction_time":0,"week_no":82,"coupon_discount":-31.46,"coupon_discount_match":-2.7},"maxValues":{"household_id":2500,"basket_id":42305362535,"day":711,"product_id":18316298,"quantity":45475,"sales_amount":631.8,"store_id":34280,"discount_amount":0.77,"transaction_time":2359,"week_no":102,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000003, MIN_INSERTION_TIME -> 1695351296000003, MAX_INSERTION_TIME -> 1695351296000003, OPTIMIZE_TARGET_SIZE -> 268435456),null), AddFile(part-00002-618b4fff-77ad-4663-aaec-dbd5769515b1-c000.snappy.parquet,Map(),5238927,1695351296000,false,{"numRecords":667618,"minValues":{"household_id":1,"basket_id":32956680859,"day":401,"product_id":25671,"quantity":0,"sales_amount":0.0,"store_id":26,"discount_amount":-90.05,"transaction_time":0,"week_no":58,"coupon_discount":-37.93,"coupon_discount_match":-5.8},"maxValues":{"household_id":2500,"basket_id":40314850434,"day":568,"product_id":16809685,"quantity":89638,"sales_amount":329.99,"store_id":34016,"discount_amount":2.09,"transaction_time":2359,"week_no":82,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000002, MIN_INSERTION_TIME -> 1695351296000002, MAX_INSERTION_TIME -> 1695351296000002, OPTIMIZE_TARGET_SIZE -> 268435456),null), 
AddFile(part-00001-894cd31a-620c-4f5e-9ea8-cb25d4193b6e-c000.snappy.parquet,Map(),5162399,1695351296000,false,{"numRecords":668350,"minValues":{"household_id":1,"basket_id":30532627350,"day":230,"product_id":25671,"quantity":0,"sales_amount":0.0,"store_id":2,"discount_amount":-129.98,"transaction_time":0,"week_no":34,"coupon_discount":-55.93,"coupon_discount_match":-7.7},"maxValues":{"household_id":2500,"basket_id":32956680859,"day":403,"product_id":14077546,"quantity":38348,"sales_amount":840.0,"store_id":33923,"discount_amount":3.99,"transaction_time":2359,"week_no":58,"coupon_discount":0.0,"coupon_discount_match":0.0},"nullCount":{"household_id":0,"basket_id":0,"day":0,"product_id":0,"quantity":0,"sales_amount":0,"store_id":0,"discount_amount":0,"transaction_time":0,"week_no":0,"coupon_discount":0,"coupon_discount_match":0}},Map(INSERTION_TIME -> 1695351296000001, MIN_INSERTION_TIME -> 1695351296000001, MAX_INSERTION_TIME -> 1695351296000001, OPTIMIZE_TARGET_SIZE -> 268435456),null)))))) -23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:55 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:55 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 398.7 KiB, free 3.3 GiB) -23/09/22 03:13:56 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 151.9 KiB, free 3.3 GiB) -23/09/22 03:13:56 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 18.1 KiB, free 3.3 GiB) -23/09/22 03:13:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.11.115.134:44293 (size: 18.1 KiB, free: 3.3 GiB) -23/09/22 03:13:56 INFO SparkContext: Created broadcast 1 from writeExternal at ObjectOutputStream.java:1459 -23/09/22 03:13:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.8 KiB, free 3.3 GiB) -23/09/22 03:13:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.11.115.134:44293 (size: 13.8 KiB, free: 3.3 GiB) -23/09/22 03:13:56 INFO SparkContext: Created broadcast 0 from broadcast at Snapshot.scala:119 -23/09/22 03:13:56 INFO DeltaLogFileIndex: Created DeltaLogFileIndex(Parquet, numFilesInSegment: 1, totalFileSize: 34444) -23/09/22 03:13:56 INFO DeltaLogFileIndex: Created DeltaLogFileIndex(JSON, numFilesInSegment: 6, totalFileSize: 45783) -23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, 
deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:13:56 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:13:56 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 0.0, New Ema: 1.0
-23/09/22 03:13:58 INFO CodeGenerator: Code generated in 12.782887 ms
-23/09/22 03:13:58 INFO CodeGenerator: Code generated in 28.714197 ms
-23/09/22 03:13:58 INFO CodeGenerator: Code generated in 284.497949 ms
-23/09/22 03:13:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 31.6 KiB, free 3.3 GiB)
-23/09/22 03:13:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 14.1 KiB, free 3.3 GiB)
-23/09/22 03:13:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.11.115.134:44293 (size: 14.1 KiB, free: 3.3 GiB)
-23/09/22 03:13:58 INFO SparkContext: Created broadcast 2 from toRdd at StateCache.scala:61
-23/09/22 03:13:58 INFO CodeGenerator: Code generated in 140.11686 ms
-23/09/22 03:13:58 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 38.5 KiB, free 3.3 GiB)
-23/09/22 03:13:58 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 12.2 KiB, free 3.3 GiB)
-23/09/22 03:13:58 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.11.115.134:44293 (size: 12.2 KiB, free: 3.3 GiB)
-23/09/22 03:13:58 INFO SparkContext: Created broadcast 3 from toRdd at StateCache.scala:61
-23/09/22 03:13:58 INFO FileSourceStrategy: Pushed Filters: 
-23/09/22 03:13:58 INFO FileSourceStrategy: Post-Scan Filters: 
size: bigint, modificationTime: bigint, dataChange: boolean ... 6 more fields>, remove: struct ... 6 more fields>, metaData: struct>, schemaString: string ... 6 more fields>, protocol: struct, writerFeatures: array ... 2 more fields> ... 5 more fields>
-23/09/22 03:13:59 INFO NativeAzureFileSystem: WASB Filesystem wasbs://studio@clororetaildevadls.blob.core.windows.net is closed with isClosed = false
-[… the preceding WASB close message repeats ~46× at 03:13:59; duplicate lines omitted …]
-23/09/22 03:13:59 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0
-23/09/22 03:13:59 INFO CodeGenerator: Code generated in 112.834173 ms
-23/09/22 03:13:59 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 491.3 KiB, free 3.3 GiB)
-23/09/22 03:13:59 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 17.7 KiB, free 3.3 GiB)
-23/09/22 03:13:59 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.11.115.134:44293 (size: 17.7 KiB, free: 3.3 GiB)
-23/09/22 03:13:59 INFO SparkContext: Created broadcast 4 from $anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604
-23/09/22 03:13:59 INFO FileSourceScanExec: Planning scan with bin packing, max split size: 134217728 bytes, max partition size: 4194304, open cost is considered as scanning 4194304 bytes.
-23/09/22 03:13:59 INFO CodeGenerator: Code generated in 50.220344 ms
-23/09/22 03:14:00 INFO DAGScheduler: Registering RDD 6 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) as input to shuffle 0
-23/09/22 03:14:00 INFO DAGScheduler: Got map stage job 0 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) with 5 output partitions
-23/09/22 03:14:00 INFO DAGScheduler: Final stage: ShuffleMapStage 0 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604)
-23/09/22 03:14:00 INFO DAGScheduler: Parents of final stage: List()
-23/09/22 03:14:00 INFO DAGScheduler: Missing parents: List()
-23/09/22 03:14:00 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:14:00 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:14:00 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[6] at $anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604), which has no missing parents
-23/09/22 03:14:00 INFO DAGScheduler: Jars for session None: Map()
-23/09/22 03:14:00 INFO DAGScheduler: Files for session None: Map()
-23/09/22 03:14:00 INFO DAGScheduler: Archives for session None: Map()
-23/09/22 03:14:00 INFO DAGScheduler: Submitting 5 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[6] at $anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4))
-23/09/22 03:14:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 5 tasks resource profile 0
-23/09/22 03:14:00 INFO ConsoleTransport:
{"eventTime":"2023-09-22T03:14:00.237Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"START","run":{"runId":"09b465e3-ef2c-452a-be68-6bcb8d01fe80","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand","num-children":0,"query":[{"class":"org.apache.spark.sql.execution.datasources.LogicalRelation","num-children":0,"relation":null,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"household_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":75,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"basket_id","dataType":"long","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":76,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"day","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":77,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"product_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":78,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"quantity","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":79,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"sales_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":80,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"store_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":81,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"discount_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":82,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"transaction_time","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":83,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.A
ttributeReference","num-children":0,"name":"week_no","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":84,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":85,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount_match","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":86,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}]],"isStreaming":false}],"dataSource":null,"options":null,"mode":null}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"spark_properties":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","properties":{"spark.master":"spark://10.11.115.134:7077","spark.app.name":"Databricks Shell"}},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"},"environment-properties":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","environment-properties":{"spark.databricks.clusterUsageTags.clusterName":"jason.yip@tredence.com's Cluster","spark.databricks.clusterUsageTags.azureSubscriptionId":"a4f54399-8db8-4849-adcc-a42aed1fb97f","spark.databricks.notebook.path":"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep","mountPoints":[{"mountPoint":"/databricks-datasets","source":"databricks-datasets"},{"mountPoint":"/Volumes","source":"UnityCatalogVolumes"},{"mountPoint":"/databricks/mlflow-tracking","source":"databricks/mlflow-tracking"},{"mountPoint":"/databricks-results","source":"databricks-results"},{"mountPoint":"/databricks/mlflow-registry","source":"databricks/mlflow-registry"},{"mountPoint":"/Volume","source":"DbfsReserved"},{"mountPoint":"/volumes","source":"DbfsReserved"},{"mountPoint":"/","source":"DatabricksRoot"},{"mountPoint":"/volume","source":"DbfsReserved"}],"spark.databricks.clusterUsageTags.clusterAllTags":"[{\"key\":\"Vendor\",\"value\":\"Databricks\"},{\"key\":\"Creator\",\"value\":\"jason.yip@tredence.com\"},{\"key\":\"ClusterName\",\"value\":\"jason.yip@tredence.com's 
Cluster\"},{\"key\":\"ClusterId\",\"value\":\"0808-055325-43kdx9a4\"},{\"key\":\"Environment\",\"value\":\"POC\"},{\"key\":\"Project\",\"value\":\"SI\"},{\"key\":\"DatabricksEnvironment\",\"value\":\"workerenv-4679476628690204\"}]","spark.databricks.clusterUsageTags.clusterOwnerOrgId":"4679476628690204","user":"jason.yip@tredence.com","userId":"4768657035718622","orgId":"4679476628690204"}}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.execute_save_into_data_source_command.silver_transactions","facets":{}},"inputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]}},"inputFacets":{}}],"outputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/silver/transactions","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]},"columnLineage":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet","fields":{"household_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"household_id"}]},"basket_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv",
"field":"basket_id"}]},"day":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"day"}]},"product_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"product_id"}]},"quantity":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"quantity"}]},"sales_amount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"sales_amount"}]},"store_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"store_id"}]},"discount_amount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"discount_amount"}]},"transaction_time":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"transaction_time"}]},"week_no":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"week_no"}]},"coupon_discount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"coupon_discount"}]},"coupon_discount_match":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"coupon_discount_match"}]}}},"lifecycleStateChange":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet","lifecycleStateChange":"OVERWRITE"}},"outputFacets":{}}]} -23/09/22 03:14:00 WARN FairSchedulableBuilder: A job was submitted with scheduler pool 2908305457167067998, which has not been configured. This can happen when the file that pools are read from isn't set, or when that file doesn't contain 2908305457167067998. 
Created 2908305457167067998 with default configuration (schedulingMode: FIFO, minShare: 0, weight: 1)
-23/09/22 03:14:00 INFO FairSchedulableBuilder: Added task set TaskSet_0.0 tasks to pool 2908305457167067998
-23/09/22 03:14:00 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (10.11.115.133, executor 0, partition 0, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:00 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (10.11.115.133, executor 0, partition 1, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:00 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) (10.11.115.133, executor 0, partition 2, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:00 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3) (10.11.115.133, executor 0, partition 3, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:00 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 275.9 KiB, free 3.3 GiB)
-23/09/22 03:14:00 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 75.6 KiB, free 3.3 GiB)
-23/09/22 03:14:00 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 10.11.115.134:44293 (size: 75.6 KiB, free: 3.3 GiB)
-23/09/22 03:14:00 INFO SparkContext: Created broadcast 5 from broadcast at TaskSetManager.scala:622
-23/09/22 03:14:01 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 10.11.115.133:45037 (size: 75.6 KiB, free: 3.6 GiB)
-23/09/22 03:14:01 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.11.115.133:45037 (size: 18.1 KiB, free: 3.6 GiB)
-23/09/22 03:14:02 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0
-23/09/22 03:14:02 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4) (10.11.115.133, executor 0, partition 4, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:02 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 2074 ms on 10.11.115.133 (executor 0) (1/5)
-23/09/22 03:14:02 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2080 ms on 10.11.115.133 (executor 0) (2/5)
-23/09/22 03:14:02 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2081 ms on 10.11.115.133 (executor 0) (3/5)
-23/09/22 03:14:02 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 37 ms on 10.11.115.133 (executor 0) (4/5)
-23/09/22 03:14:02 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.11.115.133:45037 (size: 17.7 KiB, free: 3.6 GiB)
-23/09/22 03:14:05 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0
-23/09/22 03:14:08 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0
-23/09/22 03:14:08 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.11.115.133:45037 (size: 13.8 KiB, free: 3.6 GiB)
-23/09/22 03:14:09 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 8252 ms on 10.11.115.133 (executor 0) (5/5)
-23/09/22 03:14:09 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 2908305457167067998
-23/09/22 03:14:09 INFO DAGScheduler: ShuffleMapStage 0 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) finished in 8.708 s
-23/09/22 03:14:09 INFO DAGScheduler: looking for newly runnable stages
-23/09/22 03:14:09 INFO DAGScheduler: running: Set()
-23/09/22 03:14:09 INFO DAGScheduler: waiting: Set()
-23/09/22 03:14:09 INFO DAGScheduler: failed: Set()
-23/09/22 03:14:09 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using
https for connections -23/09/22 03:14:09 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1 -23/09/22 03:14:09 INFO CodeGenerator: Code generated in 104.675309 ms -23/09/22 03:14:09 INFO ConsoleTransport: {"eventTime":"2023-09-22T03:14:09.058Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"COMPLETE","run":{"runId":"09b465e3-ef2c-452a-be68-6bcb8d01fe80","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand","num-children":0,"query":[{"class":"org.apache.spark.sql.execution.datasources.LogicalRelation","num-children":0,"relation":null,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"household_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":75,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"basket_id","dataType":"long","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":76,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"day","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":77,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"product_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":78,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"quantity","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":79,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"sales_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":80,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"store_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":81,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"discount_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":82,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"na
me":"transaction_time","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":83,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"week_no","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":84,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":85,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount_match","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":86,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}]],"isStreaming":false}],"dataSource":null,"options":null,"mode":null}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.execute_save_into_data_source_command.silver_transactions","facets":{}},"inputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]}},"inputFacets":{}}],"outputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/silver/transactions","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@
clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]},"columnLineage":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet","fields":{"household_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"household_id"}]},"basket_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"basket_id"}]},"day":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"day"}]},"product_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"product_id"}]},"quantity":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"quantity"}]},"sales_amount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"sales_amount"}]},"store_id":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"store_id"}]},"discount_amount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"discount_amount"}]},"transaction_time":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"transaction_time"}]},"week_no":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"week_no"}]},"coupon_discount":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"coupon_discount"}]},"coupon_discount_match":{"inputFields":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","field":"coupon_discount_match"}]}}},"lifecycleStateChange":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateCh
angeDatasetFacet","lifecycleStateChange":"OVERWRITE"}},"outputFacets":{"outputStatistics":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/OutputStatisticsOutputDatasetFacet.json#/$defs/OutputStatisticsOutputDatasetFacet","rowCount":0,"size":0}}}]} -23/09/22 03:14:09 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart -23/09/22 03:14:09 INFO CodeGenerator: Code generated in 59.190001 ms -23/09/22 03:14:10 INFO DAGScheduler: Registering RDD 16 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) as input to shuffle 1 -23/09/22 03:14:10 INFO DAGScheduler: Got map stage job 1 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) with 1 output partitions -23/09/22 03:14:10 INFO DAGScheduler: Final stage: ShuffleMapStage 2 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) -23/09/22 03:14:10 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1) -23/09/22 03:14:10 INFO DAGScheduler: Missing parents: List() -23/09/22 03:14:10 INFO DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[16] at $anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604), which has no missing parents -23/09/22 03:14:10 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobStart -23/09/22 03:14:10 INFO DAGScheduler: Jars for session None: Map() -23/09/22 03:14:10 INFO DAGScheduler: Files for session None: Map() -23/09/22 03:14:10 INFO DAGScheduler: Archives for session None: Map() -23/09/22 03:14:10 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[16] at $anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) (first 15 tasks are for partitions Vector(0)) -23/09/22 03:14:10 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0 -23/09/22 03:14:10 INFO FairSchedulableBuilder: Added task set TaskSet_2.0 tasks to pool 2908305457167067998 -23/09/22 03:14:10 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 5) (10.11.115.133, executor 0, partition 0, PROCESS_LOCAL, taskResourceAssignments Map()) -23/09/22 03:14:10 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 417.6 KiB, free 3.3 GiB) -23/09/22 03:14:10 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 113.2 KiB, free 3.3 GiB) -23/09/22 03:14:10 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 10.11.115.134:44293 (size: 113.2 KiB, free: 3.3 GiB) -23/09/22 03:14:10 INFO SparkContext: Created broadcast 6 from broadcast at TaskSetManager.scala:622 -23/09/22 03:14:10 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on 10.11.115.133:45037 (size: 113.2 KiB, free: 3.6 GiB) -23/09/22 03:14:10 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.11.115.133:57974 -23/09/22 03:14:11 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.11.115.133:45037 (size: 12.2 KiB, free: 3.6 GiB) -23/09/22 03:14:11 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0 -23/09/22 03:14:11 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.11.115.133:45037 (size: 14.1 KiB, free: 3.6 GiB) -23/09/22 03:14:11 INFO BlockManagerInfo: Added rdd_13_0 in memory on 10.11.115.133:45037 (size: 5.0 KiB, free: 3.6 GiB) -23/09/22 03:14:12 INFO TaskSetManager: Finished 
task 0.0 in stage 2.0 (TID 5) in 2488 ms on 10.11.115.133 (executor 0) (1/1)
-23/09/22 03:14:12 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 2908305457167067998
-23/09/22 03:14:12 INFO DAGScheduler: ShuffleMapStage 2 ($anonfun$withThreadLocalCaptured$1 at CompletableFuture.java:1604) finished in 2.872 s
-23/09/22 03:14:12 INFO DAGScheduler: looking for newly runnable stages
-23/09/22 03:14:12 INFO DAGScheduler: running: Set()
-23/09/22 03:14:12 INFO DAGScheduler: waiting: Set()
-23/09/22 03:14:12 INFO DAGScheduler: failed: Set()
-23/09/22 03:14:12 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobEnd
-23/09/22 03:14:12 INFO SparkContext: Starting job: first at Snapshot.scala:238
-23/09/22 03:14:12 INFO DAGScheduler: Got job 2 (first at Snapshot.scala:238) with 1 output partitions
-23/09/22 03:14:12 INFO DAGScheduler: Final stage: ResultStage 5 (first at Snapshot.scala:238)
-23/09/22 03:14:12 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 4)
-23/09/22 03:14:12 INFO DAGScheduler: Missing parents: List()
-23/09/22 03:14:12 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobStart
-23/09/22 03:14:12 INFO DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[18] at first at Snapshot.scala:238), which has no missing parents
-23/09/22 03:14:13 INFO DAGScheduler: Jars for session None: Map()
-23/09/22 03:14:13 INFO DAGScheduler: Files for session None: Map()
-23/09/22 03:14:13 INFO DAGScheduler: Archives for session None: Map()
-23/09/22 03:14:13 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (MapPartitionsRDD[18] at first at Snapshot.scala:238) (first 15 tasks are for partitions Vector(0))
-23/09/22 03:14:13 INFO TaskSchedulerImpl: Adding task set 5.0 with 1 tasks resource profile 0
-23/09/22 03:14:13 INFO FairSchedulableBuilder: Added task set TaskSet_5.0 tasks to pool 2908305457167067998
-23/09/22 03:14:13 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 6) (10.11.115.133, executor 0, partition 0, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:13 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 363.2 KiB, free 3.3 GiB)
-23/09/22 03:14:13 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 104.0 KiB, free 3.3 GiB)
-23/09/22 03:14:13 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 10.11.115.134:44293 (size: 104.0 KiB, free: 3.3 GiB)
-23/09/22 03:14:13 INFO SparkContext: Created broadcast 7 from broadcast at TaskSetManager.scala:622
-23/09/22 03:14:13 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 10.11.115.133:45037 (size: 104.0 KiB, free: 3.6 GiB)
-23/09/22 03:14:13 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.11.115.133:57974
-23/09/22 03:14:14 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0
-23/09/22 03:14:15 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 6) in 2487 ms on 10.11.115.133 (executor 0) (1/1)
-23/09/22 03:14:15 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 2908305457167067998
-23/09/22 03:14:15 INFO DAGScheduler: ResultStage 5 (first at Snapshot.scala:238) finished in 2.554 s
-23/09/22 03:14:15 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
-23/09/22 03:14:15 INFO TaskSchedulerImpl: Killing all running tasks in stage 5: Stage finished
-23/09/22 03:14:15 INFO DAGScheduler: Job 2 finished: first at Snapshot.scala:238, took 2.576761 s
-23/09/22 03:14:15 INFO CodeGenerator: Code generated in 44.3349 ms
-23/09/22 03:14:15 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd
-23/09/22 03:14:15 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:14:15 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:14:15 INFO FileSourceStrategy: Pushed Filters:
-23/09/22 03:14:15 INFO FileSourceStrategy: Post-Scan Filters:
-23/09/22 03:14:15 INFO FileSourceStrategy: Output Data Schema: struct
-23/09/22 03:14:15 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:14:15 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:14:16 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections
-23/09/22 03:14:16 INFO NativeAzureFileSystem: Delete with limit configurations: deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:14:16 INFO DeltaParquetFileFormat: Using user defined output committer for Parquet: org.apache.spark.sql.parquet.DirectParquetOutputCommitter
-23/09/22 03:14:16 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 405.9 KiB, free 3.3 GiB)
-23/09/22 03:14:16 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 14.5 KiB, free 3.3 GiB)
-23/09/22 03:14:16 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 10.11.115.134:44293 (size: 14.5 KiB, free: 3.3 GiB)
-23/09/22 03:14:16 INFO SparkContext: Created broadcast 8 from execute at DeltaInvariantCheckerExec.scala:74
-23/09/22 03:14:16 INFO FileSourceScanExec: Planning scan with bin packing, max split size: 36484162 bytes, max partition size: 36484162, open cost is considered as scanning 4194304 bytes.
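
The ConsoleTransport entries in this log (including the scan_csv event that follows) are OpenLineage RunEvents that the openlineage-spark 1.2.2 listener serializes straight to the driver log instead of POSTing to a backend. A minimal sketch of the Spark session configuration that produces this kind of console output, assuming the openlineage-spark artifact is resolvable by the cluster; the app name and namespace values below are illustrative, not taken from this log:

    from pyspark.sql import SparkSession

    # Sketch: attach the OpenLineage Spark listener and route events to the console.
    # Each SQL execution then logs START/COMPLETE RunEvent JSON via ConsoleTransport,
    # as seen in the surrounding driver log.
    spark = (
        SparkSession.builder
        .appName("openlineage-console-demo")  # illustrative name
        .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.2.2")
        .config("spark.extraListeners",
                "io.openlineage.spark.agent.OpenLineageSparkListener")
        .config("spark.openlineage.transport.type", "console")
        .config("spark.openlineage.namespace", "my-namespace")  # illustrative value
        .getOrCreate()
    )

The emitted events carry the dataset schema, column-level lineage, and (on Databricks) an environment-properties facet; switching spark.openlineage.transport.type to http and setting a transport URL would send the same payloads to a lineage backend such as Marquez instead of the log.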
-23/09/22 03:14:16 INFO ConsoleTransport: {"eventTime":"2023-09-22T03:14:15.962Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"START","run":{"runId":"ee45de3c-b839-4347-92cc-89f766d073c3","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.datasources.LogicalRelation","num-children":0,"relation":null,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"household_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":75,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"basket_id","dataType":"long","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":76,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"day","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":77,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"product_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":78,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"quantity","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":79,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"sales_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":80,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"store_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":81,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"discount_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":82,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"transaction_time","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":83,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"week_no","dataType":
"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":84,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":85,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount_match","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":86,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}]],"isStreaming":false}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.scan_csv ","facets":{}},"inputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]}},"inputFacets":{}}],"outputs":[]} -23/09/22 03:14:16 INFO SparkContext: Starting job: write at WriteIntoDeltaCommand.scala:70 -23/09/22 03:14:16 INFO DAGScheduler: Got job 3 (write at WriteIntoDeltaCommand.scala:70) with 4 output partitions -23/09/22 03:14:16 INFO DAGScheduler: Final stage: ResultStage 6 (write at WriteIntoDeltaCommand.scala:70) -23/09/22 03:14:16 INFO DAGScheduler: Parents of final stage: List() -23/09/22 03:14:16 INFO DAGScheduler: Missing parents: List() -23/09/22 03:14:16 INFO DAGScheduler: Submitting ResultStage 6 (MapPartitionsRDD[20] at execute at DeltaInvariantCheckerExec.scala:74), which has no missing parents -23/09/22 03:14:16 INFO AzureNativeFileSystemStore: URI scheme: wasbs, using https for connections -23/09/22 03:14:16 INFO NativeAzureFileSystem: Delete with limit configurations: 
deleteFileCountLimitEnabled=false, deleteFileCountLimit=-1
-23/09/22 03:14:16 INFO DAGScheduler: Jars for session None: Map()
-23/09/22 03:14:16 INFO DAGScheduler: Files for session None: Map()
-23/09/22 03:14:16 INFO DAGScheduler: Archives for session None: Map()
-23/09/22 03:14:16 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 6 (MapPartitionsRDD[20] at execute at DeltaInvariantCheckerExec.scala:74) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
-23/09/22 03:14:16 INFO TaskSchedulerImpl: Adding task set 6.0 with 4 tasks resource profile 0
-23/09/22 03:14:16 INFO FairSchedulableBuilder: Added task set TaskSet_6.0 tasks to pool 2908305457167067998
-23/09/22 03:14:16 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 7) (10.11.115.133, executor 0, partition 0, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:16 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 8) (10.11.115.133, executor 0, partition 1, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:16 INFO TaskSetManager: Starting task 2.0 in stage 6.0 (TID 9) (10.11.115.133, executor 0, partition 2, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:16 INFO TaskSetManager: Starting task 3.0 in stage 6.0 (TID 10) (10.11.115.133, executor 0, partition 3, PROCESS_LOCAL, taskResourceAssignments Map())
-23/09/22 03:14:16 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 234.0 KiB, free 3.3 GiB)
-23/09/22 03:14:16 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 82.5 KiB, free 3.3 GiB)
-23/09/22 03:14:16 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on 10.11.115.134:44293 (size: 82.5 KiB, free: 3.3 GiB)
-23/09/22 03:14:16 INFO SparkContext: Created broadcast 9 from broadcast at TaskSetManager.scala:622
-23/09/22 03:14:16 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on 10.11.115.133:45037 (size: 82.5 KiB, free: 3.6 GiB)
-23/09/22 03:14:16 INFO ConsoleTransport:
{"eventTime":"2023-09-22T03:14:16.29Z","producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent","eventType":"START","run":{"runId":"ee45de3c-b839-4347-92cc-89f766d073c3","facets":{"spark.logicalPlan":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","plan":[{"class":"org.apache.spark.sql.execution.datasources.LogicalRelation","num-children":0,"relation":null,"output":[[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"household_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":75,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"basket_id","dataType":"long","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":76,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"day","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":77,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"product_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":78,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"quantity","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":79,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"sales_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":80,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"store_id","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":81,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"discount_amount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":82,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"transaction_time","dataType":"integer","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":83,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"week_no","dataType":"integer","nullable":true,"metadata":{},"ex
prId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":84,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":85,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}],[{"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference","num-children":0,"name":"coupon_discount_match","dataType":"float","nullable":true,"metadata":{},"exprId":{"product-class":"org.apache.spark.sql.catalyst.expressions.ExprId","id":86,"jvmId":"cf1b65cb-72d4-4826-9aa0-ea3aa307592f"},"qualifier":[]}]],"isStreaming":false}]},"spark_version":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","spark-version":"3.3.0","openlineage-spark-version":"1.2.2"},"spark_properties":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","properties":{"spark.master":"spark://10.11.115.134:7077","spark.app.name":"Databricks Shell"}},"processing_engine":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet","version":"3.3.0","name":"spark","openlineageAdapterVersion":"1.2.2"},"environment-properties":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet","environment-properties":{"spark.databricks.clusterUsageTags.clusterName":"jason.yip@tredence.com's Cluster","spark.databricks.clusterUsageTags.azureSubscriptionId":"a4f54399-8db8-4849-adcc-a42aed1fb97f","spark.databricks.notebook.path":"/Repos/jason.yip@tredence.com/segmentation/01_Data Prep","mountPoints":[{"mountPoint":"/databricks-datasets","source":"databricks-datasets"},{"mountPoint":"/Volumes","source":"UnityCatalogVolumes"},{"mountPoint":"/databricks/mlflow-tracking","source":"databricks/mlflow-tracking"},{"mountPoint":"/databricks-results","source":"databricks-results"},{"mountPoint":"/databricks/mlflow-registry","source":"databricks/mlflow-registry"},{"mountPoint":"/Volume","source":"DbfsReserved"},{"mountPoint":"/volumes","source":"DbfsReserved"},{"mountPoint":"/","source":"DatabricksRoot"},{"mountPoint":"/volume","source":"DbfsReserved"}],"spark.databricks.clusterUsageTags.clusterAllTags":"[{\"key\":\"Vendor\",\"value\":\"Databricks\"},{\"key\":\"Creator\",\"value\":\"jason.yip@tredence.com\"},{\"key\":\"ClusterName\",\"value\":\"jason.yip@tredence.com's Cluster\"},{\"key\":\"ClusterId\",\"value\":\"0808-055325-43kdx9a4\"},{\"key\":\"Environment\",\"value\":\"POC\"},{\"key\":\"Project\",\"value\":\"SI\"},{\"key\":\"DatabricksEnvironment\",\"value\":\"workerenv-4679476628690204\"}]","spark.databricks.clusterUsageTags.clusterOwnerOrgId":"4679476628690204","user":"jason.yip@tredence.com","userId":"4768657035718622","orgId":"4679476628690204"}}}},"job":{"namespace":"adb-5445974573286168.8#default","name":"adb-4679476628690204.4.azuredatabricks.net.scan_csv 
","facets":{}},"inputs":[{"namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net","name":"/examples/data/csv/completejourney/transaction_data.csv","facets":{"dataSource":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet","name":"wasbs://studio@clororetaildevadls.blob.core.windows.net","uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net"},"schema":{"_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.2.2/integration/spark","_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet","fields":[{"name":"household_id","type":"integer"},{"name":"basket_id","type":"long"},{"name":"day","type":"integer"},{"name":"product_id","type":"integer"},{"name":"quantity","type":"integer"},{"name":"sales_amount","type":"float"},{"name":"store_id","type":"integer"},{"name":"discount_amount","type":"float"},{"name":"transaction_time","type":"integer"},{"name":"week_no","type":"integer"},{"name":"coupon_discount","type":"float"},{"name":"coupon_discount_match","type":"float"}]}},"inputFacets":{}}],"outputs":[]} -23/09/22 03:14:16 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on 10.11.115.133:45037 (size: 14.5 KiB, free: 3.6 GiB) -23/09/22 03:14:17 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0 -23/09/22 03:14:20 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0 -23/09/22 03:14:23 INFO ClusterLoadAvgHelper: Current cluster load: 1, Old Ema: 1.0, New Ema: 1.0 -23/09/22 03:14:24 INFO TaskSetManager: Finished task 3.0 in stage 6.0 (TID 10) in 8029 ms on 10.11.115.133 (executor 0) (1/4) -23/09/22 03:14:24 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 8) in 8458 ms on 10.11.115.133 (executor 0) (2/4) diff --git a/slack-archive/html/files/C01CK9T7HKR/F05TR87P1JB.jpg b/slack-archive/html/files/C01CK9T7HKR/F05TR87P1JB.jpg deleted file mode 100644 index 06c3bd0..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05TR87P1JB.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05TR9U0V9V.jpg b/slack-archive/html/files/C01CK9T7HKR/F05TR9U0V9V.jpg deleted file mode 100644 index 2473898..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05TR9U0V9V.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05TZ52T18W.jpg b/slack-archive/html/files/C01CK9T7HKR/F05TZ52T18W.jpg deleted file mode 100644 index 3f61a69..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05TZ52T18W.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05TZ6P1FD4.jpg b/slack-archive/html/files/C01CK9T7HKR/F05TZ6P1FD4.jpg deleted file mode 100644 index 6ea3d8a..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05TZ6P1FD4.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05U39M0MPX.jpg b/slack-archive/html/files/C01CK9T7HKR/F05U39M0MPX.jpg deleted file mode 100644 index 58eb3f2..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05U39M0MPX.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05U5RQUTJ6.jpg b/slack-archive/html/files/C01CK9T7HKR/F05U5RQUTJ6.jpg deleted file mode 100644 index a65edf2..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05U5RQUTJ6.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05U6445VM1.jpg 
b/slack-archive/html/files/C01CK9T7HKR/F05U6445VM1.jpg deleted file mode 100644 index 8be6e13..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05U6445VM1.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05U673KN1G.jpg b/slack-archive/html/files/C01CK9T7HKR/F05U673KN1G.jpg deleted file mode 100644 index 0f0b67c..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F05U673KN1G.jpg and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F05V8FALBL7 b/slack-archive/html/files/C01CK9T7HKR/F05V8FALBL7 deleted file mode 100644 index bd6ed9c..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F05V8FALBL7 +++ /dev/null @@ -1,197 +0,0 @@ -23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 1; threshold: 32 -23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 0; threshold: 32 -23/10/09 03:53:29 INFO InMemoryFileIndex: It took 18 ms to list leaf files for 1 paths. -23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 1; threshold: 32 -23/10/09 03:53:29 INFO InMemoryFileIndex: Start listing leaf files and directories. Size of Paths: 0; threshold: 32 -23/10/09 03:53:29 INFO InMemoryFileIndex: It took 18 ms to list leaf files for 1 paths. -23/10/09 03:53:29 INFO ClusterLoadMonitor: Added query with execution ID:15. Current active queries:1 -23/10/09 03:53:29 INFO FileSourceStrategy: Pushed Filters: -23/10/09 03:53:29 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#135, None)) > 0) -23/10/09 03:53:29 INFO MemoryStore: Block broadcast_19 stored as values in memory (estimated size 411.3 KiB, free 3.3 GiB) -23/10/09 03:53:29 INFO MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 14.4 KiB, free 3.3 GiB) -23/10/09 03:53:29 INFO BlockManagerInfo: Added broadcast_19_piece0 in memory on 10.139.64.10:41051 (size: 14.4 KiB, free: 3.3 GiB) -23/10/09 03:53:29 INFO SparkContext: Created broadcast 19 from load at NativeMethodAccessorImpl.java:0 -23/10/09 03:53:29 INFO FileSourceScanExec: Planning scan with bin packing, max split size: 4194304 bytes, max partition size: 4194304, open cost is considered as scanning 4194304 bytes. 
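For reference, the START event above is a complete OpenLineage RunEvent: run facets (serialized logical plan, Spark and Databricks properties), a job, and input/output datasets with schema facets. A minimal sketch in Python for pulling the lineage-relevant fields out of such a payload ("event.json" is a hypothetical file holding that event; the key paths match the JSON above):

    import json

    # Load a single RunEvent payload (hypothetical file name).
    with open("event.json") as f:
        event = json.load(f)

    # Top-level run/job identity.
    print(event["eventType"], event["run"]["runId"])
    print("job:", event["job"]["namespace"], event["job"]["name"].strip())

    # Input datasets carry a schema facet with column names and types.
    for ds in event["inputs"]:
        fields = ds["facets"]["schema"]["fields"]
        print("input:", ds["namespace"], ds["name"], [col["name"] for col in fields])

    # This particular START event has no outputs yet.
    for ds in event["outputs"]:
        print("output:", ds["namespace"], ds["name"])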
-23/10/09 03:53:29 INFO SparkContext: Starting job: load at NativeMethodAccessorImpl.java:0 -23/10/09 03:53:29 INFO DAGScheduler: Got job 10 (load at NativeMethodAccessorImpl.java:0) with 1 output partitions -23/10/09 03:53:29 INFO DAGScheduler: Final stage: ResultStage 12 (load at NativeMethodAccessorImpl.java:0) -23/10/09 03:53:29 INFO DAGScheduler: Parents of final stage: List() -23/10/09 03:53:29 INFO DAGScheduler: Missing parents: List() -23/10/09 03:53:29 INFO DAGScheduler: Submitting ResultStage 12 (MapPartitionsRDD[46] at load at NativeMethodAccessorImpl.java:0), which has no missing parents -23/10/09 03:53:29 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart -23/10/09 03:53:29 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobStart -23/10/09 03:53:29 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 12 (MapPartitionsRDD[46] at load at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0)) -23/10/09 03:53:29 INFO TaskSchedulerImpl: Adding task set 12.0 with 1 tasks resource profile 0 -23/10/09 03:53:29 INFO TaskSetManager: TaskSet 12.0 using PreferredLocationsV1 -23/10/09 03:53:29 WARN FairSchedulableBuilder: A job was submitted with scheduler pool 1239554428518675957, which has not been configured. This can happen when the file that pools are read from isn't set, or when that file doesn't contain 1239554428518675957. Created 1239554428518675957 with default configuration (schedulingMode: FIFO, minShare: 0, weight: 1) -23/10/09 03:53:29 INFO FairSchedulableBuilder: Added task set TaskSet_12.0 tasks to pool 1239554428518675957 -23/10/09 03:53:29 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 10) (10.139.64.10, executor driver, partition 0, PROCESS_LOCAL, -23/10/09 03:53:29 INFO MemoryStore: Block broadcast_20 stored as values in memory (estimated size 131.7 KiB, free 3.3 GiB) -23/10/09 03:53:29 INFO MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 38.7 KiB, free 3.3 GiB) -23/10/09 03:53:29 INFO BlockManagerInfo: Added broadcast_20_piece0 in memory on 10.139.64.10:41051 (size: 38.7 KiB, free: 3.3 GiB) -23/10/09 03:53:29 INFO SparkContext: Created broadcast 20 from broadcast at TaskSetManager.scala:711 -23/10/09 03:53:29 INFO Executor: Running task 0.0 in stage 12.0 (TID 10) -23/10/09 03:53:29 INFO FileScanRDD: Reading File path: dbfs:/FileStore/babynames.csv, range: 0-278154, partition values: [empty row], modificationTime: 1696823414000. -23/10/09 03:53:29 INFO Executor: Finished task 0.0 in stage 12.0 (TID 10). 3413 bytes result sent to driver -23/10/09 03:53:29 INFO TaskSetManager: Finished task 0.0 in stage 12.0 (TID 10) in 55 ms on 10.139.64.10 (executor driver) (1/1) -23/10/09 03:53:29 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool 1239554428518675957 -23/10/09 03:53:29 INFO DAGScheduler: ResultStage 12 (load at NativeMethodAccessorImpl.java:0) finished in 0.060 s -23/10/09 03:53:29 INFO DAGScheduler: Job 10 is finished. 
Cancelling potential speculative or zombie tasks for this job -23/10/09 03:53:29 INFO TaskSchedulerImpl: Killing all running tasks in stage 12: Stage finished -23/10/09 03:53:29 INFO DAGScheduler: Job 10 finished: load at NativeMethodAccessorImpl.java:0, took 0.065550 s -23/10/09 03:53:29 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobEnd -23/10/09 03:53:30 INFO ClusterLoadMonitor: Removed query with execution ID:15. Current active queries:0 -23/10/09 03:53:30 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd -23/10/09 03:53:30 INFO QueryProfileListener: Query profile sent to logger, seq number: 15, app id: local-1696821525950 -23/10/09 03:53:30 INFO FileSourceStrategy: Pushed Filters: -23/10/09 03:53:30 INFO FileSourceStrategy: Post-Scan Filters: -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_21 stored as values in memory (estimated size 411.3 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 14.4 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO BlockManagerInfo: Added broadcast_21_piece0 in memory on 10.139.64.10:41051 (size: 14.4 KiB, free: 3.3 GiB) -23/10/09 03:53:30 INFO SparkContext: Created broadcast 21 from load at NativeMethodAccessorImpl.java:0 -23/10/09 03:53:30 INFO FileSourceScanExec: Planning scan with bin packing, max split size: 4194304 bytes, max partition size: 4194304, open cost is considered as scanning 4194304 bytes. -23/10/09 03:53:30 INFO SparkContext: Starting job: load at NativeMethodAccessorImpl.java:0 -23/10/09 03:53:30 INFO DAGScheduler: Got job 11 (load at NativeMethodAccessorImpl.java:0) with 1 output partitions -23/10/09 03:53:30 INFO DAGScheduler: Final stage: ResultStage 13 (load at NativeMethodAccessorImpl.java:0) -23/10/09 03:53:30 INFO DAGScheduler: Parents of final stage: List() -23/10/09 03:53:30 INFO DAGScheduler: Missing parents: List() -23/10/09 03:53:30 INFO DAGScheduler: Submitting ResultStage 13 (MapPartitionsRDD[52] at load at NativeMethodAccessorImpl.java:0), which has no missing parents -23/10/09 03:53:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 13 (MapPartitionsRDD[52] at load at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0)) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Adding task set 13.0 with 1 tasks resource profile 0 -23/10/09 03:53:30 INFO TaskSetManager: TaskSet 13.0 using PreferredLocationsV1 -23/10/09 03:53:30 INFO FairSchedulableBuilder: Added task set TaskSet_13.0 tasks to pool 1239554428518675957 -23/10/09 03:53:30 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 11) (10.139.64.10, executor driver, partition 0, PROCESS_LOCAL, -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated size 158.0 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 52.1 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO RddExecutionContext: Config field is not HadoopMapRedWriteConfigUtil or HadoopMapReduceWriteConfigUtil, it's org.apache.spark.rdd.RDD$$Lambda$7387/139829442 -23/10/09 03:53:30 INFO RddExecutionContext: Found job conf from RDD Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-rbf-default.xml, hdfs-site.xml, hdfs-rbf-site.xml -23/10/09 03:53:30 INFO BlockManagerInfo: Added 
broadcast_22_piece0 in memory on 10.139.64.10:41051 (size: 52.1 KiB, free: 3.3 GiB) -23/10/09 03:53:30 INFO RddExecutionContext: Found output path null from RDD MapPartitionsRDD[52] at load at NativeMethodAccessorImpl.java:0 -23/10/09 03:53:30 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event -23/10/09 03:53:30 INFO SparkContext: Created broadcast 22 from broadcast at TaskSetManager.scala:711 -23/10/09 03:53:30 INFO Executor: Running task 0.0 in stage 13.0 (TID 11) -23/10/09 03:53:30 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/10/09 03:53:30 INFO FileScanRDD: Reading File path: dbfs:/FileStore/babynames.csv, range: 0-278154, partition values: [empty row], modificationTime: 1696823414000. -23/10/09 03:53:30 INFO Executor: Finished task 0.0 in stage 13.0 (TID 11). 3237 bytes result sent to driver -23/10/09 03:53:30 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID 11) in 68 ms on 10.139.64.10 (executor driver) (1/1) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 1239554428518675957 -23/10/09 03:53:30 INFO DAGScheduler: ResultStage 13 (load at NativeMethodAccessorImpl.java:0) finished in 0.086 s -23/10/09 03:53:30 INFO DAGScheduler: Job 11 is finished. Cancelling potential speculative or zombie tasks for this job -23/10/09 03:53:30 INFO TaskSchedulerImpl: Killing all running tasks in stage 13: Stage finished -23/10/09 03:53:30 INFO DAGScheduler: Job 11 finished: load at NativeMethodAccessorImpl.java:0, took 0.091253 s -23/10/09 03:53:30 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event -23/10/09 03:53:30 INFO ClusterLoadMonitor: Added query with execution ID:16. Current active queries:1 -23/10/09 03:53:30 INFO ClusterLoadMonitor: Removed query with execution ID:16. Current active queries:0 -23/10/09 03:53:30 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart -23/10/09 03:53:30 INFO QueryProfileListener: Query profile sent to logger, seq number: 16, app id: local-1696821525950 -23/10/09 03:53:30 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd -23/10/09 03:53:30 INFO FileSourceStrategy: Pushed Filters: -23/10/09 03:53:30 INFO FileSourceStrategy: Post-Scan Filters: -23/10/09 03:53:30 INFO HashAggregateExec: spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate. -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_23 stored as values in memory (estimated size 410.7 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_23_piece0 stored as bytes in memory (estimated size 14.4 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO BlockManagerInfo: Added broadcast_23_piece0 in memory on 10.139.64.10:41051 (size: 14.4 KiB, free: 3.3 GiB) -23/10/09 03:53:30 INFO SparkContext: Created broadcast 23 from $anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63 -23/10/09 03:53:30 INFO FileSourceScanExec: Planning scan with bin packing, max split size: 4194304 bytes, max partition size: 4194304, open cost is considered as scanning 4194304 bytes. 
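The repeated "OpenLineage received Spark event that is configured to be skipped" lines show the listener filtering listener events rather than failing; what gets emitted is driven by the spark.openlineage.* configuration. A minimal PySpark sketch of how the listener is typically wired up (the transport URL and namespace are placeholders, and the exact skip/filter knobs vary by integration version, so treat this as an assumption-laden example, not the configuration used in this log):

    from pyspark.sql import SparkSession

    # Assumes the openlineage-spark jar is already on the driver classpath.
    spark = (
        SparkSession.builder
        .appName("openlineage-config-sketch")
        .config("spark.extraListeners",
                "io.openlineage.spark.agent.OpenLineageSparkListener")
        .config("spark.openlineage.transport.type", "http")
        .config("spark.openlineage.transport.url", "http://localhost:5000")  # placeholder
        .config("spark.openlineage.namespace", "demo")
        # Heavy facets such as the serialized logical plan can be disabled:
        .config("spark.openlineage.facets.disabled", "[spark.logicalPlan;spark_unknown]")
        .getOrCreate()
    )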
-23/10/09 03:53:30 INFO DAGScheduler: Registering RDD 56 ($anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63) as input to shuffle 2 -23/10/09 03:53:30 INFO DAGScheduler: Got map stage job 12 ($anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63) with 1 output partitions -23/10/09 03:53:30 INFO DAGScheduler: Final stage: ShuffleMapStage 14 ($anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63) -23/10/09 03:53:30 INFO DAGScheduler: Parents of final stage: List() -23/10/09 03:53:30 INFO DAGScheduler: Missing parents: List() -23/10/09 03:53:30 INFO DAGScheduler: Submitting ShuffleMapStage 14 (MapPartitionsRDD[56] at $anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63), which has no missing parents -23/10/09 03:53:30 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 14 (MapPartitionsRDD[56] at $anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63) (first 15 tasks are for partitions Vector(0)) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Adding task set 14.0 with 1 tasks resource profile 0 -23/10/09 03:53:30 INFO TaskSetManager: TaskSet 14.0 using PreferredLocationsV1 -23/10/09 03:53:30 INFO FairSchedulableBuilder: Added task set TaskSet_14.0 tasks to pool 1239554428518675957 -23/10/09 03:53:30 INFO TaskSetManager: Starting task 0.0 in stage 14.0 (TID 12) (10.139.64.10, executor driver, partition 0, PROCESS_LOCAL, -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 91.5 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_24_piece0 stored as bytes in memory (estimated size 37.1 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO BlockManagerInfo: Added broadcast_24_piece0 in memory on 10.139.64.10:41051 (size: 37.1 KiB, free: 3.3 GiB) -23/10/09 03:53:30 INFO SparkContext: Created broadcast 24 from broadcast at TaskSetManager.scala:711 -23/10/09 03:53:30 INFO Executor: Running task 0.0 in stage 14.0 (TID 12) -23/10/09 03:53:30 INFO RddExecutionContext: Found output path null from RDD MapPartitionsRDD[56] at $anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63 -23/10/09 03:53:30 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event -23/10/09 03:53:30 INFO FileScanRDD: Reading File path: dbfs:/FileStore/babynames.csv, range: 0-278154, partition values: [empty row], modificationTime: 1696823414000. -23/10/09 03:53:30 INFO Executor: Finished task 0.0 in stage 14.0 (TID 12). 
3926 bytes result sent to driver -23/10/09 03:53:30 INFO TaskSetManager: Finished task 0.0 in stage 14.0 (TID 12) in 82 ms on 10.139.64.10 (executor driver) (1/1) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool 1239554428518675957 -23/10/09 03:53:30 INFO DAGScheduler: ShuffleMapStage 14 ($anonfun$withThreadLocalCaptured$5 at LexicalThreadLocal.scala:63) finished in 0.086 s -23/10/09 03:53:30 INFO DAGScheduler: looking for newly runnable stages -23/10/09 03:53:30 INFO DAGScheduler: running: Set() -23/10/09 03:53:30 INFO DAGScheduler: waiting: Set() -23/10/09 03:53:30 INFO DAGScheduler: failed: Set() -23/10/09 03:53:30 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event -23/10/09 03:53:30 INFO ShufflePartitionsUtil: For shuffle(2), advisory target size: 67108864, actual target size 1048576, minimum partition size: 1048576 -23/10/09 03:53:30 INFO HashAggregateExec: spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but current version of codegened fast hashmap does not support this aggregate. -23/10/09 03:53:30 INFO SparkContext: Starting job: wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3 -23/10/09 03:53:30 INFO DAGScheduler: Got job 13 (wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3) with 1 output partitions -23/10/09 03:53:30 INFO DAGScheduler: Final stage: ResultStage 16 (wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3) -23/10/09 03:53:30 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 15) -23/10/09 03:53:30 INFO DAGScheduler: Missing parents: List() -23/10/09 03:53:30 INFO DAGScheduler: Submitting ResultStage 16 (PythonRDD[62] at wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3), which has no missing parents -23/10/09 03:53:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 16 (PythonRDD[62] at wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3) (first 15 tasks are for partitions Vector(0)) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Adding task set 16.0 with 1 tasks resource profile 0 -23/10/09 03:53:30 INFO TaskSetManager: TaskSet 16.0 using PreferredLocationsV1 -23/10/09 03:53:30 INFO FairSchedulableBuilder: Added task set TaskSet_16.0 tasks to pool 1239554428518675957 -23/10/09 03:53:30 INFO TaskSetManager: Starting task 0.0 in stage 16.0 (TID 13) (10.139.64.10, executor driver, partition 0, PROCESS_LOCAL, -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_25 stored as values in memory (estimated size 111.8 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_25_piece0 stored as bytes in memory (estimated size 49.0 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO BlockManagerInfo: Added broadcast_25_piece0 in memory on 10.139.64.10:41051 (size: 49.0 KiB, free: 3.3 GiB) -23/10/09 03:53:30 INFO SparkContext: Created broadcast 25 from broadcast at TaskSetManager.scala:711 -23/10/09 03:53:30 INFO Executor: Running task 0.0 in stage 16.0 (TID 13) -23/10/09 03:53:30 INFO RddExecutionContext: Config field is not HadoopMapRedWriteConfigUtil or HadoopMapReduceWriteConfigUtil, it's org.apache.spark.api.python.PythonRDD -23/10/09 03:53:30 INFO RddExecutionContext: Found job conf from RDD Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-rbf-default.xml, hdfs-site.xml, hdfs-rbf-site.xml -23/10/09 03:53:30 INFO RddExecutionContext: Found output path null 
from RDD PythonRDD[62] at wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3 -23/10/09 03:53:30 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event -23/10/09 03:53:30 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead. -23/10/09 03:53:30 INFO ShuffleBlockFetcherIterator: Getting 14 (840.0 B) non-empty blocks including 14 (840.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks -23/10/09 03:53:30 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms -23/10/09 03:53:30 INFO PythonRunner: Times: total = 99, boot = 51, init = 10, finish = 38 -23/10/09 03:53:30 INFO Executor: Finished task 0.0 in stage 16.0 (TID 13). 3412 bytes result sent to driver -23/10/09 03:53:30 INFO TaskSetManager: Finished task 0.0 in stage 16.0 (TID 13) in 137 ms on 10.139.64.10 (executor driver) (1/1) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Removed TaskSet 16.0, whose tasks have all completed, from pool 1239554428518675957 -23/10/09 03:53:30 INFO DAGScheduler: ResultStage 16 (wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3) finished in 0.142 s -23/10/09 03:53:30 INFO DAGScheduler: Job 13 is finished. Cancelling potential speculative or zombie tasks for this job -23/10/09 03:53:30 INFO TaskSchedulerImpl: Killing all running tasks in stage 16: Stage finished -23/10/09 03:53:30 INFO DAGScheduler: Job 13 finished: wrapper at /root/.ipykernel/2070/command-2627471680180925-2004223455:3, took 0.148639 s -23/10/09 03:53:30 INFO RddExecutionContext: RDDs are empty: skipping sending OpenLineage event -23/10/09 03:53:30 INFO ClusterLoadMonitor: Added query with execution ID:17. Current active queries:1 -23/10/09 03:53:30 INFO FileSourceStrategy: Pushed Filters: IsNotNull(Year),EqualTo(Year,2014) -23/10/09 03:53:30 INFO FileSourceStrategy: Post-Scan Filters: isnotnull(Year#152),(Year#152 = 2014) -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_26 stored as values in memory (estimated size 410.7 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_26_piece0 stored as bytes in memory (estimated size 14.4 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO BlockManagerInfo: Added broadcast_26_piece0 in memory on 10.139.64.10:41051 (size: 14.4 KiB, free: 3.3 GiB) -23/10/09 03:53:30 INFO SparkContext: Created broadcast 26 from collectResult at OutputAggregator.scala:267 -23/10/09 03:53:30 INFO FileSourceScanExec: Planning scan with bin packing, max split size: 4194304 bytes, max partition size: 4194304, open cost is considered as scanning 4194304 bytes. 
-23/10/09 03:53:30 INFO SparkContext: Starting job: collectResult at OutputAggregator.scala:267 -23/10/09 03:53:30 INFO DAGScheduler: Got job 14 (collectResult at OutputAggregator.scala:267) with 1 output partitions -23/10/09 03:53:30 INFO DAGScheduler: Final stage: ResultStage 17 (collectResult at OutputAggregator.scala:267) -23/10/09 03:53:30 INFO DAGScheduler: Parents of final stage: List() -23/10/09 03:53:30 INFO DAGScheduler: Missing parents: List() -23/10/09 03:53:30 INFO DAGScheduler: Submitting ResultStage 17 (MapPartitionsRDD[65] at collectResult at OutputAggregator.scala:267), which has no missing parents -23/10/09 03:53:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 17 (MapPartitionsRDD[65] at collectResult at OutputAggregator.scala:267) (first 15 tasks are for partitions Vector(0)) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Adding task set 17.0 with 1 tasks resource profile 0 -23/10/09 03:53:30 INFO TaskSetManager: TaskSet 17.0 using PreferredLocationsV1 -23/10/09 03:53:30 INFO FairSchedulableBuilder: Added task set TaskSet_17.0 tasks to pool 1239554428518675957 -23/10/09 03:53:30 INFO TaskSetManager: Starting task 0.0 in stage 17.0 (TID 14) (10.139.64.10, executor driver, partition 0, PROCESS_LOCAL, -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_27 stored as values in memory (estimated size 135.4 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionStart -23/10/09 03:53:30 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobStart -23/10/09 03:53:30 INFO MemoryStore: Block broadcast_27_piece0 stored as bytes in memory (estimated size 40.3 KiB, free 3.3 GiB) -23/10/09 03:53:30 INFO BlockManagerInfo: Added broadcast_27_piece0 in memory on 10.139.64.10:41051 (size: 40.3 KiB, free: 3.3 GiB) -23/10/09 03:53:30 INFO SparkContext: Created broadcast 27 from broadcast at TaskSetManager.scala:711 -23/10/09 03:53:30 INFO Executor: Running task 0.0 in stage 17.0 (TID 14) -23/10/09 03:53:30 INFO FileScanRDD: Reading File path: dbfs:/FileStore/babynames.csv, range: 0-278154, partition values: [empty row], modificationTime: 1696823414000. -23/10/09 03:53:30 INFO Executor: Finished task 0.0 in stage 17.0 (TID 14). 36219 bytes result sent to driver -23/10/09 03:53:30 INFO TaskSetManager: Finished task 0.0 in stage 17.0 (TID 14) in 58 ms on 10.139.64.10 (executor driver) (1/1) -23/10/09 03:53:30 INFO TaskSchedulerImpl: Removed TaskSet 17.0, whose tasks have all completed, from pool 1239554428518675957 -23/10/09 03:53:30 INFO DAGScheduler: ResultStage 17 (collectResult at OutputAggregator.scala:267) finished in 0.063 s -23/10/09 03:53:30 INFO DAGScheduler: Job 14 is finished. Cancelling potential speculative or zombie tasks for this job -23/10/09 03:53:30 INFO TaskSchedulerImpl: Killing all running tasks in stage 17: Stage finished -23/10/09 03:53:30 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerJobEnd -23/10/09 03:53:30 INFO DAGScheduler: Job 14 finished: collectResult at OutputAggregator.scala:267, took 0.071653 s -23/10/09 03:53:30 INFO ClusterLoadMonitor: Removed query with execution ID:17. 
Current active queries:0 -23/10/09 03:53:30 INFO QueryProfileListener: Query profile sent to logger, seq number: 17, app id: local-1696821525950 -23/10/09 03:53:30 INFO SparkSQLExecutionContext: OpenLineage received Spark event that is configured to be skipped: SparkListenerSQLExecutionEnd -23/10/09 03:53:30 INFO ProgressReporter$: Removed result fetcher for 1239554428518675957_5649797394450581660_69e5ef0f22794594ab0d05a411e35e19 -23/10/09 03:53:30 INFO PresignedUrlClientUtils$: Successfully upload file to ADLGen2 using create, append and flush to url: https://dbstoragekvew6l5xkyj2c.dfs.core.windows.net/jobs/3942203504488904/command-results/2627471680180925/8b067875-71c2-4aa6-bdad-8ad2226eb487 \ No newline at end of file diff --git a/slack-archive/html/files/C01CK9T7HKR/F0608U3FJ3D.png b/slack-archive/html/files/C01CK9T7HKR/F0608U3FJ3D.png deleted file mode 100644 index 068e698..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F0608U3FJ3D.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F06183ZEM39.png b/slack-archive/html/files/C01CK9T7HKR/F06183ZEM39.png deleted file mode 100644 index 4abd6a6..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F06183ZEM39.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F061FR39UE5.html b/slack-archive/html/files/C01CK9T7HKR/F061FR39UE5.html deleted file mode 100644 index ce5199b..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F061FR39UE5.html +++ /dev/null @@ -1,243 +0,0 @@ - - - - - -Test results - AlterTableAddPartitionCommandVisitorTest - - - - - -
-AlterTableAddPartitionCommandVisitorTest: 1 tests, 1 failures, 0 ignored, 2.120s duration, 0% successful
-
-Failed tests
-
-testAlterTableAddPartition()
-
java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x5824a83d) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x5824a83d
-	at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
-	at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
-	at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:121)
-	at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:358)
-	at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:295)
-	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:344)
-	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:196)
-	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:284)
-	at org.apache.spark.SparkContext.<init>(SparkContext.scala:483)
-	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
-	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099)
-	at scala.Option.getOrElse(Option.scala:189)
-	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1093)
-	at io.openlineage.spark.agent.lifecycle.plan.AlterTableAddPartitionCommandVisitorTest.setup(AlterTableAddPartitionCommandVisitorTest.java:71)
-	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
-	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
-	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
-	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
-	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
-	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
-	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:128)
-	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeEachMethod(TimeoutExtension.java:78)
-	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
-	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
-	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
-	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
-	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
-	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
-	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
-	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
-	at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeMethodInExtensionContext(ClassBasedTestDescriptor.java:521)
-	at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$synthesizeBeforeEachMethodAdapter$23(ClassBasedTestDescriptor.java:506)
-	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeEachMethods$3(TestMethodTestDescriptor.java:175)
-	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeMethodsOrCallbacksUntilExceptionOccurs$6(TestMethodTestDescriptor.java:203)
-	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeMethodsOrCallbacksUntilExceptionOccurs(TestMethodTestDescriptor.java:203)
-	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeEachMethods(TestMethodTestDescriptor.java:172)
-	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
-	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
-	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
-	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
-	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
-	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
-	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
-	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
-	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
-	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
-	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
-	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
-	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
-	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
-	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
-	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
-	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
-	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
-	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
-	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
-	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
-	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
-	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
-	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
-	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:96)
-	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:75)
-	at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.processAllTestClasses(JUnitPlatformTestClassProcessor.java:99)
-	at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.access$000(JUnitPlatformTestClassProcessor.java:79)
-	at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor.stop(JUnitPlatformTestClassProcessor.java:75)
-	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:61)
-	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
-	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
-	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
-	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
-	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
-	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
-	at jdk.proxy1/jdk.proxy1.$Proxy2.stop(Unknown Source)
-	at org.gradle.api.internal.tasks.testing.worker.TestWorker$3.run(TestWorker.java:193)
-	at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
-	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
-	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
-	at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
-	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:133)
-	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
-	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
-	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
-	Suppressed: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.SparkSession.sessionState()" because "this.session" is null
-		at io.openlineage.spark.agent.lifecycle.plan.AlterTableAddPartitionCommandVisitorTest.dropTables(AlterTableAddPartitionCommandVisitorTest.java:46)
-		at io.openlineage.spark.agent.lifecycle.plan.AlterTableAddPartitionCommandVisitorTest.afterEach(AlterTableAddPartitionCommandVisitorTest.java:41)
-		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
-		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-		at java.base/java.lang.reflect.Method.invoke(Method.java:568)
-		at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
-		at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
-		at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
-		at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
-		at org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:128)
-		at org.junit.jupiter.engine.extension.TimeoutExtension.interceptAfterEachMethod(TimeoutExtension.java:110)
-		at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
-		at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
-		at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
-		at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
-		at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
-		at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
-		at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
-		at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
-		at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeMethodInExtensionContext(ClassBasedTestDescriptor.java:521)
-		at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$synthesizeAfterEachMethodAdapter$24(ClassBasedTestDescriptor.java:511)
-		at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAfterEachMethods$10(TestMethodTestDescriptor.java:244)
-		at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAllAfterMethodsOrCallbacks$13(TestMethodTestDescriptor.java:277)
-		at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
-		at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAllAfterMethodsOrCallbacks$14(TestMethodTestDescriptor.java:277)
-		at org.junit.platform.commons.util.CollectionUtils.forEachInReverseOrder(CollectionUtils.java:217)
-		at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeAllAfterMethodsOrCallbacks(TestMethodTestDescriptor.java:276)
-		at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeAfterEachMethods(TestMethodTestDescriptor.java:242)
-		at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:143)
-		... 61 more
-Tests
-
-Test | Duration | Result
-testAlterTableAddPartition() | 2.120s | failed
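The IllegalAccessError above ("module java.base does not export sun.nio.ch to unnamed module") is the usual symptom of running Spark under JDK 17 without the module-access flags Spark's own launcher adds: StorageUtils touches sun.nio.ch.DirectBuffer, which JDK 17 encapsulates. When a SparkSession is created inside a JUnit worker, as in this test, equivalent flags have to be passed to the test JVM (for Gradle, via the test task's jvmArgs). A hedged sketch of the options that typically unblock this; the exact set needed varies with the Spark version:

    --add-exports=java.base/sun.nio.ch=ALL-UNNAMED
    --add-opens=java.base/java.nio=ALL-UNNAMED
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.base/java.util=ALL-UNNAMED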
- - diff --git a/slack-archive/html/files/C01CK9T7HKR/F061S5ZMF08.png b/slack-archive/html/files/C01CK9T7HKR/F061S5ZMF08.png deleted file mode 100644 index 9ee26be..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F061S5ZMF08.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F062L6J1PU1.png b/slack-archive/html/files/C01CK9T7HKR/F062L6J1PU1.png deleted file mode 100644 index e6e7367..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F062L6J1PU1.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F062ZFJN2UB.png b/slack-archive/html/files/C01CK9T7HKR/F062ZFJN2UB.png deleted file mode 100644 index 638a6a3..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F062ZFJN2UB.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F06300USGUS.png b/slack-archive/html/files/C01CK9T7HKR/F06300USGUS.png deleted file mode 100644 index be12785..0000000 Binary files a/slack-archive/html/files/C01CK9T7HKR/F06300USGUS.png and /dev/null differ diff --git a/slack-archive/html/files/C01CK9T7HKR/F06375RT0LS.txt b/slack-archive/html/files/C01CK9T7HKR/F06375RT0LS.txt deleted file mode 100644 index 0db579d..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F06375RT0LS.txt +++ /dev/null @@ -1,197 +0,0 @@ -root@ip-172-30-4-153:~/marquez# ./docker/up.sh --tag 0.37.0 -a 5000 -m 5001 -w 3000 --build -...creating volumes: marquez_data, marquez_db-conf, marquez_db-init, marquez_db-backup -Successfully copied 7.17kB to volumes-provisioner:/data/wait-for-it.sh -Added files to volume marquez_data: wait-for-it.sh -Successfully copied 2.05kB to volumes-provisioner:/db-conf/postgresql.conf -Added files to volume marquez_db-conf: postgresql.conf -Successfully copied 2.05kB to volumes-provisioner:/db-init/init-db.sh -Added files to volume marquez_db-init: init-db.sh -DONE! 
-[+] Building 0.7s (40/53) docker:default - => [seed_marquez internal] load build definition from Dockerfile 0.0s - => => transferring dockerfile: 730B 0.0s - => [seed_marquez internal] load .dockerignore 0.0s - => => transferring context: 91B 0.0s - => [api internal] load .dockerignore 0.0s - => => transferring context: 91B 0.0s - => [api internal] load build definition from Dockerfile 0.0s - => => transferring dockerfile: 730B 0.0s - => [api internal] load metadata for docker.io/library/eclipse-temurin:17 0.1s - => [api internal] load build context 0.1s - => => transferring context: 42.84kB 0.0s - => [seed_marquez base 1/7] FROM docker.io/library/eclipse-temurin:17@sha256:b11bfab9cf5699455664b66873a9857ba22ce8da5e2d2e4e4698c9f6a4930c36 0.0s - => [seed_marquez internal] load build context 0.1s - => => transferring context: 42.84kB 0.0s - => CACHED [api stage-2 2/6] RUN apt-get update && apt-get install -y postgresql-client bash coreutils 0.0s - => CACHED [api stage-2 3/6] WORKDIR /usr/src/app 0.0s - => CACHED [api base 2/7] WORKDIR /usr/src/app 0.0s - => CACHED [api base 3/7] COPY gradle gradle 0.0s - => CACHED [api base 4/7] COPY gradle.properties gradle.properties 0.0s - => CACHED [api base 5/7] COPY gradlew gradlew 0.0s - => CACHED [api base 6/7] COPY settings.gradle settings.gradle 0.0s - => CACHED [api base 7/7] RUN ./gradlew --version 0.0s - => CACHED [api build 1/5] WORKDIR /usr/src/app 0.0s - => CACHED [api build 2/5] COPY build.gradle build.gradle 0.0s - => CACHED [api build 3/5] COPY api ./api 0.0s - => CACHED [api build 4/5] COPY clients/java ./clients/java 0.0s - => CACHED [api build 5/5] RUN ./gradlew --no-daemon clean :api:shadowJar 0.0s - => CACHED [api stage-2 4/6] COPY --from=build /usr/src/app/api/build/libs/marquez-*.jar /usr/src/app 0.0s - => CACHED [api stage-2 5/6] COPY marquez.dev.yml marquez.dev.yml 0.0s - => CACHED [api stage-2 6/6] COPY docker/entrypoint.sh entrypoint.sh 0.0s - => [seed_marquez] exporting to image 0.0s - => => exporting layers 0.0s - => => writing image sha256:2b5190cc7c89dacadc5a6c0c8466994dc10aef24ef7fae4046498c80bc5823c3 0.0s - => => naming to docker.io/library/marquez-seed_marquez 0.0s - => [api] exporting to image 0.0s - => => exporting layers 0.0s - => => writing image sha256:e4241ae334d3c967583e3af6691f21eb387005e1e8663048803ef590ae8d0127 0.0s - => => naming to docker.io/marquezproject/marquez:0.42.0 0.0s - => [web internal] load build definition from Dockerfile 0.0s - => => transferring dockerfile: 317B 0.0s - => [web internal] load .dockerignore 0.0s - => => transferring context: 2B 0.0s - => [web internal] load metadata for docker.io/library/node:18-alpine 0.1s - => [web 1/9] FROM docker.io/library/node:18-alpine@sha256:435dcad253bb5b7f347ebc69c8cc52de7c912eb7241098b920f2fc2d7843183d 0.0s - => [web internal] load build context 0.0s - => => transferring context: 9.54kB 0.0s - => CACHED [web 2/9] WORKDIR /usr/src/app 0.0s - => CACHED [web 3/9] RUN apk update && apk add --virtual bash coreutils 0.0s - => CACHED [web 4/9] RUN apk add --no-cache git 0.0s - => CACHED [web 5/9] COPY package*.json ./ 0.0s - => CACHED [web 6/9] RUN npm install 0.0s - => CACHED [web 7/9] COPY . . 
0.0s
- => CACHED [web 8/9] RUN npm run build 0.0s
- => CACHED [web 9/9] COPY docker/entrypoint.sh entrypoint.sh 0.0s
- => [web] exporting to image 0.0s
- => => exporting layers 0.0s
- => => writing image sha256:e5d43892ccac4e77b0193fe6fb59418000a71f0046eb77abeac9d239079bfb8f 0.0s
- => => naming to docker.io/marquezproject/marquez-web:0.42.0 0.0s
-[+] Running 5/5
- ✔ Container marquez-seed_marquez-1 Recreated 0.2s
- ✔ Container marquez-db Recreated 0.2s
- ✔ Container pghero Recreated 0.2s
- ✔ Container marquez-api Recreated 0.1s
- ✔ Container marquez-web Recreated 0.1s
-Attaching to marquez-api, marquez-db, marquez-seed_marquez-1, marquez-web, pghero
-marquez-seed_marquez-1 | WARNING 'MARQUEZ_CONFIG' not set, using development configuration.
-marquez-db |
-marquez-db | PostgreSQL Database directory appears to contain a database; Skipping initialization
-marquez-db |
-marquez-db | 2023-10-27 20:43:22.700 GMT [1] LOG: starting PostgreSQL 14.9 (Debian 14.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
-marquez-db | 2023-10-27 20:43:22.711 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
-marquez-db | 2023-10-27 20:43:22.717 GMT [1] LOG: could not create IPv6 socket for address "::": Address family not supported by protocol
-marquez-db | 2023-10-27 20:43:22.726 GMT [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
-marquez-db | 2023-10-27 20:43:22.747 GMT [27] LOG: database system was shut down at 2023-10-27 20:41:26 GMT
-marquez-db | 2023-10-27 20:43:22.758 GMT [1] LOG: database system is ready to accept connections
-marquez-api | wait-for-it.sh: waiting 15 seconds for db:5432
-marquez-db | 2023-10-27 20:43:23.229 GMT [34] LOG: incomplete startup packet
-marquez-api | wait-for-it.sh: db:5432 is available after 0 seconds
-marquez-api | WARNING 'MARQUEZ_CONFIG' not set, using development configuration.
-pghero | [1] Puma starting in cluster mode...
-pghero | [1] * Puma version: 6.3.1 (ruby 3.1.4-p223) ("Mugi No Toki Itaru")
-pghero | [1] * Min threads: 1
-pghero | [1] * Max threads: 16
-pghero | [1] * Environment: production
-pghero | [1] * Master PID: 1
-pghero | [1] * Workers: 3
-pghero | [1] * Restarts: (✔) hot (✖) phased
-pghero | [1] * Preloading application
-marquez-web | [HPM] Proxy created: /api/v1 -> http://api:5000/
-marquez-web | App listening on port 3000!
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,124] org.eclipse.jetty.util.log: Logging initialized @8890ms to org.eclipse.jetty.util.log.Slf4jLog
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,497] io.dropwizard.server.DefaultServerFactory: Registering jersey handler with root path prefix: /
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,529] io.dropwizard.server.DefaultServerFactory: Registering admin handler with root path prefix: /
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,536] io.dropwizard.assets.AssetsBundle: Registering AssetBundle with name: graphql-playground for path /graphql-playground/*
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,589] marquez.MarquezApp: Running startup actions...
-pghero | [1] * Listening on http://0.0.0.0:8080
-pghero | [1] Use Ctrl-C to stop
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,777] org.flywaydb.core.internal.license.VersionPrinter: Flyway Community Edition 8.5.13 by Redgate
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,780] org.flywaydb.core.internal.license.VersionPrinter: See what's new here: https://flywaydb.org/documentation/learnmore/releaseNotes#8.5.13
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:31,781] org.flywaydb.core.internal.license.VersionPrinter:
-pghero | [1] - Worker 1 (PID: 10) booted in 0.1s, phase: 0
-pghero | [1] - Worker 2 (PID: 13) booted in 0.01s, phase: 0
-pghero | [1] - Worker 0 (PID: 9) booted in 0.14s, phase: 0
-marquez-seed_marquez-1 | ERROR [2023-10-27 20:43:32,205] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.
-marquez-seed_marquez-1 | ! java.net.UnknownHostException: postgres
-marquez-seed_marquez-1 | ! at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:572)
-marquez-seed_marquez-1 | ! at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
-marquez-seed_marquez-1 | ! at java.base/java.net.Socket.connect(Socket.java:633)
-marquez-seed_marquez-1 | ! at org.postgresql.core.PGStream.createSocket(PGStream.java:243)
-marquez-seed_marquez-1 | ! at org.postgresql.core.PGStream.<init>(PGStream.java:98)
-marquez-seed_marquez-1 | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:132)
-marquez-seed_marquez-1 | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
-marquez-seed_marquez-1 | ! ... 26 common frames omitted
-marquez-seed_marquez-1 | ! Causing: org.postgresql.util.PSQLException: The connection attempt failed.
-marquez-seed_marquez-1 | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:354)
-marquez-seed_marquez-1 | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
-marquez-seed_marquez-1 | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
-marquez-seed_marquez-1 | ! at org.postgresql.Driver.makeConnection(Driver.java:443)
-marquez-seed_marquez-1 | ! at org.postgresql.Driver.connect(Driver.java:297)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)
-marquez-seed_marquez-1 | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)
-marquez-seed_marquez-1 | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
-marquez-seed_marquez-1 | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)
-marquez-seed_marquez-1 | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)
-marquez-seed_marquez-1 | ! at org.flywaydb.core.Flyway.info(Flyway.java:190)
-marquez-seed_marquez-1 | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:78)
-marquez-seed_marquez-1 | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:33)
-marquez-seed_marquez-1 | ! at marquez.MarquezApp.run(MarquezApp.java:107)
-marquez-seed_marquez-1 | ! at marquez.MarquezApp.run(MarquezApp.java:49)
-marquez-seed_marquez-1 | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)
-marquez-seed_marquez-1 | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
-marquez-seed_marquez-1 | ! at io.dropwizard.cli.Cli.run(Cli.java:78)
-marquez-seed_marquez-1 | ! at io.dropwizard.Application.run(Application.java:94)
-marquez-seed_marquez-1 | ! at marquez.MarquezApp.main(MarquezApp.java:61)
-marquez-seed_marquez-1 | INFO [2023-10-27 20:43:32,218] marquez.MarquezApp: Stopping app...
-marquez-api | INFO [2023-10-27 20:43:32,286] org.eclipse.jetty.util.log: Logging initialized @9023ms to org.eclipse.jetty.util.log.Slf4jLog
-marquez-seed_marquez-1 exited with code 1
-marquez-api | INFO [2023-10-27 20:43:32,518] io.dropwizard.server.DefaultServerFactory: Registering jersey handler with root path prefix: /
-marquez-api | INFO [2023-10-27 20:43:32,521] io.dropwizard.server.DefaultServerFactory: Registering admin handler with root path prefix: /
-marquez-api | INFO [2023-10-27 20:43:32,522] io.dropwizard.assets.AssetsBundle: Registering AssetBundle with name: graphql-playground for path /graphql-playground/*
-marquez-api | INFO [2023-10-27 20:43:32,533] marquez.MarquezApp: Running startup actions...
-marquez-api | INFO [2023-10-27 20:43:32,586] org.flywaydb.core.internal.license.VersionPrinter: Flyway Community Edition 8.5.13 by Redgate
-marquez-api | INFO [2023-10-27 20:43:32,586] org.flywaydb.core.internal.license.VersionPrinter: See what's new here: https://flywaydb.org/documentation/learnmore/releaseNotes#8.5.13
-marquez-api | INFO [2023-10-27 20:43:32,586] org.flywaydb.core.internal.license.VersionPrinter:
-marquez-db | 2023-10-27 20:43:32.972 GMT [35] FATAL: password authentication failed for user "marquez"
-marquez-db | 2023-10-27 20:43:32.972 GMT [35] DETAIL: Role "marquez" does not exist.
-marquez-db | Connection matched pg_hba.conf line 100: "host marquez marquez 172.18.0.5/32 md5"
-marquez-api | ERROR [2023-10-27 20:43:32,983] org.apache.tomcat.jdbc.pool.ConnectionPool: Unable to create initial connections of pool.
-marquez-api | ! org.postgresql.util.PSQLException: FATAL: password authentication failed for user "marquez"
-marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
-marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
-marquez-api | ! at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
-marquez-api | ! at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
-marquez-api | ! at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
-marquez-api | ! at org.postgresql.Driver.makeConnection(Driver.java:443)
-marquez-api | ! at org.postgresql.Driver.connect(Driver.java:297)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:346)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:227)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:768)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:696)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:495)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:153)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:118)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:107)
-marquez-api | ! at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:131)
-marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
-marquez-api | ! at org.flywaydb.core.internal.jdbc.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:75)
-marquez-api | ! at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:147)
-marquez-api | ! at org.flywaydb.core.Flyway.info(Flyway.java:190)
-marquez-api | ! at marquez.db.DbMigration.hasPendingDbMigrations(DbMigration.java:78)
-marquez-api | ! at marquez.db.DbMigration.migrateDbOrError(DbMigration.java:33)
-marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:107)
-marquez-api | ! at marquez.MarquezApp.run(MarquezApp.java:49)
-marquez-api | ! at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:67)
-marquez-api | ! at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:98)
-marquez-api | ! at io.dropwizard.cli.Cli.run(Cli.java:78)
-marquez-api | ! at io.dropwizard.Application.run(Application.java:94)
-marquez-api | ! at marquez.MarquezApp.main(MarquezApp.java:61)
-marquez-api | INFO [2023-10-27 20:43:32,989] marquez.MarquezApp: Stopping app...
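The traces above record two distinct failures, not one. The seed container runs without 'MARQUEZ_CONFIG' and its development configuration evidently resolves the database host as "postgres" (hence java.net.UnknownHostException: postgres), even though this compose stack exposes Postgres as the "db" service that marquez-api waits on successfully. The marquez-api container then fails for a different reason: Postgres reports that role "marquez" does not exist, and its earlier "database directory appears to contain a database; Skipping initialization" line shows the data volume survived an earlier run, so the init scripts that would have created the role never re-ran. A minimal recovery sketch, assuming the stock Marquez docker-compose layout; the POSTGRES_HOST variable name is an assumption about what the dev config reads, so verify it against marquez.dev.yml before relying on it:

# Drop the containers plus named volumes so Postgres re-initializes and the
# init scripts recreate the "marquez" role and database on the next start.
docker compose down -v

# Restart; point anything that defaults to a "postgres" host at the "db"
# service instead (assumed variable name -- check marquez.dev.yml).
POSTGRES_HOST=db docker compose up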
-marquez-api exited with code 1 \ No newline at end of file diff --git a/slack-archive/html/files/C01CK9T7HKR/F063M06MWBZ.json b/slack-archive/html/files/C01CK9T7HKR/F063M06MWBZ.json deleted file mode 100644 index 12f45f4..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F063M06MWBZ.json +++ /dev/null @@ -1,531 +0,0 @@ -{ - "eventTime":"2023-11-02T18:42:00.619Z", - "producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent", - "eventType":"START", - "run":{ - "runId":"957e5191-a1bf-4c65-b08a-b7d125da2ff3", - "facets":{ - "spark.logicalPlan":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", - "plan":[ - { - "class":"org.apache.spark.sql.catalyst.plans.logical.ReplaceTableAsSelect", - "num-children":0, - "name":[ - { - "class":"org.apache.spark.sql.catalyst.analysis.ResolvedIdentifier", - "num-children":0, - "catalog":null, - "identifier":null - } - ], - "partitioning":[ - - ], - "query":[ - { - "class":"org.apache.spark.sql.execution.datasources.LogicalRelation", - "num-children":0, - "relation":null, - "output":[ - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"household_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":131, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"basket_id", - "dataType":"long", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":132, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"day", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":133, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"product_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":134, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"quantity", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":135, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"sales_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":136, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - 
"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"store_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":137, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"discount_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":138, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"transaction_time", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":139, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"week_no", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":140, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":141, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount_match", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":142, - "jvmId":"ae8ac2ed-0395-48ec-8418-2fc62a0fd2b4" - }, - "qualifier":[ - - ] - } - ] - ], - "isStreaming":false - } - ], - "tableSpec":null, - "writeOptions":null, - "orCreate":true, - "isAnalyzed":true - } - ] - }, - "debug":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", - "classpath":{ - "openLineageVersion":"1.5.0", - "sparkVersion":"3.4.1", - "scalaVersion":"2.12.15", - "jars":[ - - ], - "classDetails":[ - { - "className":"org.apache.spark.sql.delta.catalog.DeltaCatalog", - "onClasspath":true - }, - { - "className":"org.apache.iceberg.catalog.Catalog", - "onClasspath":false - }, - { - "className":"com.google.cloud.spark.bigquery.BigQueryRelation", - "packageVersion":"0.22.2-SNAPSHOT", - "onClasspath":true - } - ] - }, - "system":{ - "sparkDeployMode":"client", - "javaVersion":"1.8.0_372", - "javaVendor":"Azul Systems, Inc.", - "osArch":"amd64", - "osName":"Linux", - "osVersion":"5.15.0-1049-azure", - "userLanguage":"en", - "userTimezone":"Etc/UTC" - }, - "config":{ - "extraListeners":"io.openlineage.spark.agent.OpenLineageSparkListener", - "openLineageConfig":{ - "endpoint":"api/v1/lineage", - "debugFacet":"enabled", - "url.param.code":"8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==", - "facets.disabled":"[spark_unknown;]", - 
"namespace":"adb-4679476628690204.4#default", - "transport.type":"http", - "transport.url":"https://490f-2607-fb90-c13e-509-2854-a7e-2d53-a767.ngrok-free.app" - }, - "catalogClass":"org.apache.spark.sql.internal.CatalogImpl" - }, - "logicalPlan":{ - "nodes":[ - { - "id":"ReplaceTableAsSelect@1968008258", - "desc":"ReplaceTableAsSelect TableSpec(Map(),Some(delta),Map(),Some(wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions),None,None,false,Set(),None,None,None), [overwriteSchema=true, path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions], true, true\n :- ResolvedIdentifier com.databricks.sql.managedcatalog.UnityCatalogV2Proxy@698ca05e, journey.transactions\n +- Relation [household_id#131,basket_id#132L,day#133,product_id#134,quantity#135,sales_amount#136,store_id#137,discount_amount#138,transaction_time#139,week_no#140,coupon_discount#141,coupon_discount_match#142] csv\n", - "children":[ - - ] - } - ] - } - }, - "spark_version":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", - "spark-version":"3.4.1", - "openlineage-spark-version":"1.5.0" - }, - "processing_engine":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet", - "version":"3.4.1", - "name":"spark", - "openlineageAdapterVersion":"1.5.0" - } - } - }, - "job":{ - "namespace":"adb-4679476628690204.4#default", - "name":"adb-4679476628690204.4.azuredatabricks.net.atomic_replace_table_as_select.journey_db_transactions", - "facets":{ - - } - }, - "inputs":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "facets":{ - "dataSource":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net" - }, - "schema":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields":[ - { - "name":"household_id", - "type":"integer" - }, - { - "name":"basket_id", - "type":"long" - }, - { - "name":"day", - "type":"integer" - }, - { - "name":"product_id", - "type":"integer" - }, - { - "name":"quantity", - "type":"integer" - }, - { - "name":"sales_amount", - "type":"float" - }, - { - "name":"store_id", - "type":"integer" - }, - { - "name":"discount_amount", - "type":"float" - }, - { - "name":"transaction_time", - "type":"integer" - }, - { - "name":"week_no", - "type":"integer" - }, - { - "name":"coupon_discount", - "type":"float" - }, - { - "name":"coupon_discount_match", - "type":"float" - } - ] - } - }, - "inputFacets":{ - - } - } - ], - "outputs":[ - { - "namespace":"dbfs", - "name":"/user/hive/warehouse/journey.db/transactions", - "facets":{ - "dataSource":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - 
"_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name":"dbfs", - "uri":"dbfs" - }, - "schema":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields":[ - { - "name":"household_id", - "type":"integer" - }, - { - "name":"basket_id", - "type":"long" - }, - { - "name":"day", - "type":"integer" - }, - { - "name":"product_id", - "type":"integer" - }, - { - "name":"quantity", - "type":"integer" - }, - { - "name":"sales_amount", - "type":"float" - }, - { - "name":"store_id", - "type":"integer" - }, - { - "name":"discount_amount", - "type":"float" - }, - { - "name":"transaction_time", - "type":"integer" - }, - { - "name":"week_no", - "type":"integer" - }, - { - "name":"coupon_discount", - "type":"float" - }, - { - "name":"coupon_discount_match", - "type":"float" - } - ] - }, - "storage":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet", - "storageLayer":"unity", - "fileFormat":"parquet" - }, - "symlinks":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers":[ - { - "namespace":"/user/hive/warehouse/journey.db", - "name":"journey.transactions", - "type":"TABLE" - } - ] - }, - "lifecycleStateChange":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.5.0/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/LifecycleStateChangeDatasetFacet.json#/$defs/LifecycleStateChangeDatasetFacet", - "lifecycleStateChange":"OVERWRITE" - } - }, - "outputFacets":{ - - } - } - ] -} \ No newline at end of file diff --git a/slack-archive/html/files/C01CK9T7HKR/F0663EAEL0Y.json b/slack-archive/html/files/C01CK9T7HKR/F0663EAEL0Y.json deleted file mode 100644 index 2cf3c3e..0000000 --- a/slack-archive/html/files/C01CK9T7HKR/F0663EAEL0Y.json +++ /dev/null @@ -1,1137 +0,0 @@ -{ - "eventTime":"2023-11-13T07:49:59.575Z", - "producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent", - "eventType":"COMPLETE", - "run":{ - "runId":"dc25990e-163c-4a84-9935-ff743afbcf66", - "facets":{ - "spark.logicalPlan":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", - "plan":[ - { - "class":"org.apache.spark.sql.catalyst.plans.logical.AppendData", - "num-children":1, - "table":[ - { - "class":"org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation", - "num-children":0, - "table":null, - "output":[ - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"household_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":187, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"basket_id", - "dataType":"long", - "nullable":true, - 
"metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":188, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"day", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":189, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"product_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":190, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"quantity", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":191, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"sales_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":192, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"store_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":193, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"discount_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":194, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"transaction_time", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":195, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"week_no", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":196, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":197, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - 
"class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount_match", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":198, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ] - ], - "catalog":null, - "identifier":null, - "options":null - } - ], - "query":0, - "writeOptions":null, - "isByName":false, - "write":null, - "analyzedQuery":[ - { - "class":"org.apache.spark.sql.execution.datasources.LogicalRelation", - "num-children":0, - "relation":null, - "output":[ - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"household_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":131, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"basket_id", - "dataType":"long", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":132, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"day", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":133, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"product_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":134, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"quantity", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":135, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"sales_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":136, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"store_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":137, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"discount_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":138, - 
"jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"transaction_time", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":139, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"week_no", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":140, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":141, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount_match", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":142, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ] - ], - "isStreaming":false - } - ], - "requireImplicitCasting":false - }, - { - "class":"org.apache.spark.sql.execution.datasources.LogicalRelation", - "num-children":0, - "relation":null, - "output":[ - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"household_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":131, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"basket_id", - "dataType":"long", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":132, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"day", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":133, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"product_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":134, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"quantity", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":135, - 
"jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"sales_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":136, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"store_id", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":137, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"discount_amount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":138, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"transaction_time", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":139, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"week_no", - "dataType":"integer", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":140, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":141, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ], - [ - { - "class":"org.apache.spark.sql.catalyst.expressions.AttributeReference", - "num-children":0, - "name":"coupon_discount_match", - "dataType":"float", - "nullable":true, - "metadata":{ - - }, - "exprId":{ - "product-class":"org.apache.spark.sql.catalyst.expressions.ExprId", - "id":142, - "jvmId":"c2853996-7342-46e3-989b-5f71d5a6d5b5" - }, - "qualifier":[ - - ] - } - ] - ], - "isStreaming":false - } - ] - }, - "debug":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", - "classpath":{ - "openLineageVersion":"1.4.1", - "sparkVersion":"3.4.1", - "scalaVersion":"2.12.15", - "jars":[ - - ], - "classDetails":[ - { - "className":"org.apache.spark.sql.delta.catalog.DeltaCatalog", - "onClasspath":true - }, - { - "className":"org.apache.iceberg.catalog.Catalog", - "onClasspath":false - }, - { - "className":"com.google.cloud.spark.bigquery.BigQueryRelation", - "packageVersion":"0.22.2-SNAPSHOT", - "onClasspath":true - } - ] - }, - "system":{ - "sparkDeployMode":"client", - "javaVersion":"1.8.0_372", - "javaVendor":"Azul Systems, Inc.", - "osArch":"amd64", - "osName":"Linux", - 
"osVersion":"5.15.0-1049-azure", - "userLanguage":"en", - "userTimezone":"Etc/UTC" - }, - "config":{ - "extraListeners":"io.openlineage.spark.agent.OpenLineageSparkListener", - "openLineageConfig":{ - "endpoint":"api/v1/lineage", - "debugFacet":"enabled", - "url.param.code":"8kZl0bo2TJfnbpFxBv-R2v7xBDj-PgWMol3yUm5iP1vaAzFu9kIZGg==", - "facets.disabled":"[spark_unknown;]", - "namespace":"adb-4679476628690204.4#default", - "transport.type":"http", - "transport.url":"https://8e0a-50-35-69-138.ngrok-free.app" - }, - "catalogClass":"org.apache.spark.sql.internal.CatalogImpl" - }, - "logicalPlan":{ - "nodes":[ - { - "id":"AppendData@-520955523", - "desc":"AppendData RelationV2[household_id#187, basket_id#188L, day#189, product_id#190, quantity#191, sales_amount#192, store_id#193, discount_amount#194, transaction_time#195, week_no#196, coupon_discount#197, coupon_discount_match#198] spark_catalog.journey.transactions transactions, [overwriteSchema=true, path=wasbs://studio@clororetaildevadls.blob.core.windows.net/examples/data/csv/completejourney/silver/transactions], false, com.databricks.sql.transaction.tahoe.catalog.DeltaCatalog$StagedDeltaTableV2$DeltaV1WriteBuilder$$anon$1@d3b3679, false\n+- Relation [household_id#131,basket_id#132L,day#133,product_id#134,quantity#135,sales_amount#136,store_id#137,discount_amount#138,transaction_time#139,week_no#140,coupon_discount#141,coupon_discount_match#142] csv\n", - "children":[ - "LogicalRelation@-1939131325" - ] - }, - { - "id":"LogicalRelation@-1939131325", - "desc":"Relation [household_id#131,basket_id#132L,day#133,product_id#134,quantity#135,sales_amount#136,store_id#137,discount_amount#138,transaction_time#139,week_no#140,coupon_discount#141,coupon_discount_match#142] csv\n", - "children":[ - - ] - } - ] - } - }, - "spark_version":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet", - "spark-version":"3.4.1", - "openlineage-spark-version":"1.4.1" - }, - "processing_engine":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-1-0/ProcessingEngineRunFacet.json#/$defs/ProcessingEngineRunFacet", - "version":"3.4.1", - "name":"spark", - "openlineageAdapterVersion":"1.4.1" - } - } - }, - "job":{ - "namespace":"adb-4679476628690204.4#default", - "name":"adb-4679476628690204.4.azuredatabricks.net.append_data_exec_v1.silver_transactions", - "facets":{ - - } - }, - "inputs":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "facets":{ - "dataSource":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net" - }, - "schema":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields":[ - { - "name":"household_id", - "type":"integer" - }, - { - "name":"basket_id", - "type":"long" - }, - { - "name":"day", - "type":"integer" - }, - { - "name":"product_id", - "type":"integer" - }, - { - "name":"quantity", - "type":"integer" - }, - 
{ - "name":"sales_amount", - "type":"float" - }, - { - "name":"store_id", - "type":"integer" - }, - { - "name":"discount_amount", - "type":"float" - }, - { - "name":"transaction_time", - "type":"integer" - }, - { - "name":"week_no", - "type":"integer" - }, - { - "name":"coupon_discount", - "type":"float" - }, - { - "name":"coupon_discount_match", - "type":"float" - } - ] - } - }, - "inputFacets":{ - - } - } - ], - "outputs":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/silver/transactions", - "facets":{ - "dataSource":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/DatasourceDatasetFacet.json#/$defs/DatasourceDatasetFacet", - "name":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "uri":"wasbs://studio@clororetaildevadls.blob.core.windows.net" - }, - "schema":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json#/$defs/SchemaDatasetFacet", - "fields":[ - { - "name":"household_id", - "type":"integer" - }, - { - "name":"basket_id", - "type":"long" - }, - { - "name":"day", - "type":"integer" - }, - { - "name":"product_id", - "type":"integer" - }, - { - "name":"quantity", - "type":"integer" - }, - { - "name":"sales_amount", - "type":"float" - }, - { - "name":"store_id", - "type":"integer" - }, - { - "name":"discount_amount", - "type":"float" - }, - { - "name":"transaction_time", - "type":"integer" - }, - { - "name":"week_no", - "type":"integer" - }, - { - "name":"coupon_discount", - "type":"float" - }, - { - "name":"coupon_discount_match", - "type":"float" - } - ] - }, - "storage":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/StorageDatasetFacet.json#/$defs/StorageDatasetFacet", - "storageLayer":"unity", - "fileFormat":"parquet" - }, - "columnLineage":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet", - "fields":{ - "household_id":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"household_id" - } - ] - }, - "basket_id":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"basket_id" - } - ] - }, - "day":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"day" - } - ] - }, - "product_id":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"product_id" - } - ] - }, - "quantity":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"quantity" - } - ] - }, - "sales_amount":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - 
"name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"sales_amount" - } - ] - }, - "store_id":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"store_id" - } - ] - }, - "discount_amount":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"discount_amount" - } - ] - }, - "transaction_time":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"transaction_time" - } - ] - }, - "week_no":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"week_no" - } - ] - }, - "coupon_discount":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"coupon_discount" - } - ] - }, - "coupon_discount_match":{ - "inputFields":[ - { - "namespace":"wasbs://studio@clororetaildevadls.blob.core.windows.net", - "name":"/examples/data/csv/completejourney/transaction_data.csv", - "field":"coupon_discount_match" - } - ] - } - } - }, - "symlinks":{ - "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.4.1/integration/spark", - "_schemaURL":"https://openlineage.io/spec/facets/1-0-0/SymlinksDatasetFacet.json#/$defs/SymlinksDatasetFacet", - "identifiers":[ - { - "namespace":"/examples/data/csv/completejourney/silver", - "name":"journey.transactions", - "type":"TABLE" - } - ] - } - }, - "outputFacets":{ - - } - } - ] -} \ No newline at end of file diff --git a/slack-archive/html/files/C056YHEU680/F05QC8C72TY.jpg b/slack-archive/html/files/C056YHEU680/F05QC8C72TY.jpg deleted file mode 100644 index e3403d0..0000000 Binary files a/slack-archive/html/files/C056YHEU680/F05QC8C72TY.jpg and /dev/null differ diff --git a/slack-archive/html/files/C056YHEU680/F05QJQD2SE7.jpg b/slack-archive/html/files/C056YHEU680/F05QJQD2SE7.jpg deleted file mode 100644 index e5b1f34..0000000 Binary files a/slack-archive/html/files/C056YHEU680/F05QJQD2SE7.jpg and /dev/null differ diff --git a/slack-archive/html/files/C065PQ4TL8K/F065PUT9SRL.png b/slack-archive/html/files/C065PQ4TL8K/F065PUT9SRL.png deleted file mode 100644 index 337d981..0000000 Binary files a/slack-archive/html/files/C065PQ4TL8K/F065PUT9SRL.png and /dev/null differ diff --git a/slack-archive/html/fonts/Lato-Bold.ttf b/slack-archive/html/fonts/Lato-Bold.ttf deleted file mode 100644 index b63a14d..0000000 Binary files a/slack-archive/html/fonts/Lato-Bold.ttf and /dev/null differ diff --git a/slack-archive/html/fonts/Lato-Regular.ttf b/slack-archive/html/fonts/Lato-Regular.ttf deleted file mode 100644 index 33eba8b..0000000 Binary files a/slack-archive/html/fonts/Lato-Regular.ttf and /dev/null differ diff --git a/slack-archive/html/scroll.js b/slack-archive/html/scroll.js deleted file mode 100644 index 46bab56..0000000 --- a/slack-archive/html/scroll.js +++ /dev/null @@ -1,5 +0,0 @@ -if (window.location.hash) { - document.getElementById(window.location.hash).scrollTo(); -} else { - scrollBy({ top: 99999999 }); -} diff --git a/slack-archive/html/search.html b/slack-archive/html/search.html deleted file mode 100644 index b6c4de9..0000000 --- 
a/slack-archive/html/search.html +++ /dev/null @@ -1,232 +0,0 @@ - - - - - - - Message Search - - - - - - - - - - - - - - diff --git a/slack-archive/html/style.css b/slack-archive/html/style.css deleted file mode 100644 index 2256710..0000000 --- a/slack-archive/html/style.css +++ /dev/null @@ -1,321 +0,0 @@ -/* Reset */ - -/* Box sizing rules */ -*, -*::before, -*::after { - box-sizing: border-box; -} - -/* Remove default margin */ -body, -h1, -h2, -h3, -h4, -p, -figure, -blockquote, -dl, -dd { - margin: 0; -} - -/* Remove list styles on ul, ol elements with a list role, which suggests default styling will be removed */ -ul[role='list'], -ol[role='list'] { - list-style: none; -} - -/* Set core root defaults */ -html:focus-within { - scroll-behavior: smooth; -} - -/* Set core body defaults */ -body { - min-height: 100vh; - text-rendering: optimizeSpeed; - line-height: 1.5; -} - -/* A elements that don't have a class get default styles */ -a:not([class]) { - text-decoration-skip-ink: auto; -} - -/* Make images easier to work with */ -img, -picture { - max-width: 100%; - display: block; -} - -/* Inherit fonts for inputs and buttons */ -input, -button, -textarea, -select { - font: inherit; -} - -/* Remove all animations, transitions and smooth scroll for people that prefer not to see them */ -@media (prefers-reduced-motion: reduce) { - html:focus-within { - scroll-behavior: auto; - } - - *, - *::before, - *::after { - animation-duration: 0.01ms !important; - animation-iteration-count: 1 !important; - transition-duration: 0.01ms !important; - scroll-behavior: auto !important; - } -} - -@font-face { - font-family: "Lato"; - src: url('fonts/Lato-Regular.ttf') format('truetype'); - font-weight: normal; - font-style: normal; -} - -@font-face { - font-family: "Lato"; - src: url('fonts/Lato-Bold.ttf') format('truetype'); - font-weight: bold; - font-style: normal; -} - -body, html { - font-family: 'Lato', sans-serif; - font-size: 14px; - color: rgb(29, 28, 29); -} - -a { - color: rgb(18, 100, 163); -} - -audio, video { - max-width: 400px; -} - -.messages-list { - padding-bottom: 20px; -} - -.messages-list .avatar { - height: 36px; - width: 36px; - border-radius: 7px; - margin-right: 10px; - background: #c1c1c1; -} - -.message-gutter { - display: flex; - margin: 10px; - scroll-margin-top: 120px; -} - -.message-gutter:target { - background-color: #fafafa; - border: 2px solid #39113E; - padding: 10px; - border-radius: 5px; -} - -.message-gutter div:first-of-type { - flex-shrink: 0; -} - -.message-gutter > .message-gutter { - /** i.e. replies in thread. 
Just here to be easily findable */ -} - -.sender { - font-weight: 800; - margin-right: 10px; -} - -.timestamp { - font-weight: 200; - font-size: 13px; - color: rgb(97, 96, 97); -} - -.header { - position: sticky; - background: #fff; - color: #616061; - top: 0; - left: 0; - padding: 10px; - min-height: 70px; - border-bottom: 1px solid #E2E2E2; - box-sizing: border-box; -} - -.header h1 { - font-size: 16px; - color: #1D1C1D; - display: inline-block; -} - -.header a { - color: #616061; -} - -.header a:active, .header a.current { - color: #000; -} - -.header .created { - float: right; -} - -.jumper { - display: inline-block; -} - -.jumper a { - margin: 2px; -} - -.text { - overflow-wrap: break-word; -} - -.file { - max-height: 270px; - margin-right: 10px; - margin-top: 10px; - border-radius: 4px; - border: 1px solid #80808045; - outline: none; -} - -.reaction { - background-color: #eaeaea; - display: inline-block; - border-radius: 10px; - font-size: .7em; - padding-left: 6px; - padding-right: 6px; - padding-bottom: 4px; - margin-right: 5px; - padding-top: 4px; -} - -.reaction img { - height: 16px; - width: 16px; - margin-right: 3px; - vertical-align: middle; - display: inline-block; -} - -.reaction span { - position: relative; - top: 1px; -} - -#index { - display: flex; - height: calc(100vh - 4px); -} - -#channels { - background: #39113E; - width: 250px; - color: #CDC3CE; - padding-top: 10px; - overflow: scroll; - padding-bottom: 20px; -} - -#channels ul { - margin: 0; - padding: 0; - list-style: none; -} - -#channels p { - padding-left: 20px; -} - -#channels .section { - font-weight: 800; - color: #fff; - margin-top: 10px; -} - -#channels .section:first-of-type { - margin-top: 0; -} - -#channels a { - padding: 5px; - display: block; - color: #CDC3CE; - text-decoration: none; - padding-left: 20px; - display: flex; - max-height: 28px; - white-space: pre; - text-overflow: ellipsis; - overflow: hidden; -} - -#channels a .avatar { - height: 20px; - width: 20px; - border-radius: 3px; - margin-right: 10px; - object-fit: contain; -} - -#channels a:hover { - background: #301034; - color: #edeced; -} - -#messages { - flex-grow: 1; -} - -#messages iframe { - height: 100%; - width: calc(100vw - 250px); - border: none; -} - -#search { - margin: 10px; - text-align: center; -} - -#search ul { - list-style: none; - display: flex; - flex-direction: column; - align-items: center; -} - -#search li { - padding: 5px; - border-bottom: 1px solid #E2E2E2; - background: hsl(0deg 0% 98%); - border-radius: 5px; - width: 600px; - text-align: left; - margin-bottom: 5px; -} - -#search a { - text-decoration: none; - color: unset; -} \ No newline at end of file diff --git a/slack-archive/index.html b/slack-archive/index.html deleted file mode 100644 index a480ca2..0000000 --- a/slack-archive/index.html +++ /dev/null @@ -1,10 +0,0 @@ -Slack

-Public Channels
-Private Channels
-DMs
-Group DMs
-Bots
-Archived Public Channels
-Archived Private Channels
-DMs (Deleted Users)
\ No newline at end of file
diff --git a/slack-archive/search.html b/slack-archive/search.html deleted file mode 100644 index 506aea0..0000000 --- a/slack-archive/search.html +++ /dev/null @@ -1,232 +0,0 @@ - - - - - - - Message Search - - - - - - - - - - - - - -
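For reference, the two deleted .json attachments above are complete OpenLineage RunEvents emitted by the Spark integration: a START event produced by openlineage-spark 1.5.0 and a COMPLETE event produced by 1.4.1. Events of this shape are plain JSON documents POSTed to whatever collector the openLineageConfig block names (transport.url plus endpoint, here an ngrok tunnel in front of a Marquez API). A minimal sketch for replaying one such event against a local Marquez, assuming the API listens on port 5000 as in the compose logs above and that an event has been saved to event.json (a placeholder file name):

# POST one OpenLineage RunEvent to a local Marquez collector endpoint.
curl -X POST http://localhost:5000/api/v1/lineage \
  -H 'Content-Type: application/json' \
  -d @event.json

A 2xx response means the collector accepted the event; the run should then appear under the job namespace carried in the event (adb-4679476628690204.4#default).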