Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide simple implementation of one-level lineage optimized for parent jobs #2657

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

julienledem
Copy link
Member

Problem

The main lineage graph API focuses on individual jobs and is not easy to use when one wants coverage of a all the children of a parent job.

Solution

This new endpoint provides a non-recursive one level of lineage for all the children of a given parent job.
This will facilitate for example if someone wants to retrieve all the lineage of a given Airflow DAG.
It will return all its children (tasks) and all the datasets they consume or produce as well as the other tasks and DAGs producing and consuming them.

Example:
GET /api/v1/lineage/simple?nodeId=job:default:order_analysis

{
  "parent": {
    "namespace": "default",
    "name": "order_analysis"
  },
  "children": [
    {
      "job": {
        "namespace": "default",
        "name": "order_analysis.find_popular_products"
      },
      "inputs": [
        {
          "dataset": {
            "namespace": "postgres://host.docker.internal:5435",
            "name": "postgres.public.orders"
          },
          "consumers": null,
          "producers": [
            {
               "job": {
                "namespace": "default",
                "name": "order_analysis.import_orders"
               },
               "parent": {
                  "namespace": "default", 
                  "name": "order_analysis"
               }
            }
          ]
        }
      ],
     "outputs": [
         ...   
      ]
    ),
    ...
  ]
}

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Oct 23, 2023
@netlify
Copy link

netlify bot commented Oct 23, 2023

Deploy Preview for peppy-sprite-186812 ready!

Name Link
🔨 Latest commit e7af696
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/6542e889e1c70e00082b2cc3
😎 Deploy Preview https://deploy-preview-2657--peppy-sprite-186812.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@@ -165,6 +165,9 @@ public void testGetLineage() {
.containsAll(
expected.getOutput().map(ds -> ds.getDatasetRow().getUuid()).stream()::iterator);
}


lineageDao.getDirectLineageFromParent("foo", "bar");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, I will clean up that test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: Julien Le Dem <[email protected]>
@codecov
Copy link

codecov bot commented Oct 23, 2023

Codecov Report

Merging #2657 (1354701) into main (f6be002) will decrease coverage by 0.14%.
The diff coverage is 72.60%.

❗ Current head 1354701 differs from pull request most recent head e7af696. Consider uploading reports for the commit e7af696 to get more accurate results

@@             Coverage Diff              @@
##               main    #2657      +/-   ##
============================================
- Coverage     83.35%   83.22%   -0.14%     
- Complexity     1295     1304       +9     
============================================
  Files           244      246       +2     
  Lines          5948     6020      +72     
  Branches        279      282       +3     
============================================
+ Hits           4958     5010      +52     
- Misses          844      859      +15     
- Partials        146      151       +5     
Files Coverage Δ
api/src/main/java/marquez/db/Columns.java 79.74% <100.00%> (ø)
...java/marquez/service/models/EventTypeResolver.java 96.15% <100.00%> (+0.15%) ⬆️
api/src/main/java/marquez/db/LineageDao.java 50.00% <50.00%> (ø)
...src/main/java/marquez/api/OpenLineageResource.java 86.11% <50.00%> (-4.52%) ⬇️
...va/marquez/db/mappers/DirectLineageEdgeMapper.java 80.00% <80.00%> (ø)
.../src/main/java/marquez/service/LineageService.java 78.48% <70.00%> (-2.58%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Copy link
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the feature. I put some questions in comments as I would like to understand more why do we need separate DAO methods to support this API call.

api/src/main/java/marquez/db/LineageDao.java Outdated Show resolved Hide resolved
api/src/main/java/marquez/db/LineageDao.java Outdated Show resolved Hide resolved
api/src/main/java/marquez/db/LineageDao.java Outdated Show resolved Hide resolved
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Copy link
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments added.
I think that SQL query and tests are already fine.


public record DirectLineageEdge(
JobId job1,
String direction,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using existing IOType enum? It took me some time to understand what direction is.
Would it make sense to replace job1,job2 with job, upstreamJob?

@Consumes(APPLICATION_JSON)
@Produces(APPLICATION_JSON)
@Path("/lineage/direct")
public Response getDirectLineage(@QueryParam("parentJobNodeId") @NotNull NodeId parentJobNodeId) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mind updating openapi.yaml and changeling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API layer changes client/java
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants