provide simple implementation of one-level lineage optimized for parent jobs #2657
base: main
…nt jobs Signed-off-by: Julien Le Dem <[email protected]>
✅ Deploy Preview for peppy-sprite-186812 ready!
@@ -165,6 +165,9 @@ public void testGetLineage() {
        .containsAll(
            expected.getOutput().map(ds -> ds.getDatasetRow().getUuid()).stream()::iterator);
  }

  lineageDao.getDirectLineageFromParent("foo", "bar");
oops, I will clean up that test
fixed
Codecov Report
@@             Coverage Diff              @@
##               main    #2657      +/-   ##
============================================
- Coverage     83.35%   83.22%   -0.14%
- Complexity     1295     1304       +9
============================================
  Files           244      246       +2
  Lines          5948     6020      +72
  Branches        279      282       +3
============================================
+ Hits           4958     5010      +52
- Misses          844      859      +15
- Partials        146      151       +5
I like the feature. I left some questions in comments because I would like to understand better why we need separate DAO methods to support this API call.
api/src/main/java/marquez/db/mappers/SimpleLineageEdgeMapper.java (outdated, resolved)
…ith name Signed-off-by: Julien Le Dem <[email protected]>
Updated from 1354701 to 53ff595
I added two minor comments.
I think the SQL query and tests are already fine.
public record DirectLineageEdge(
    JobId job1,
    String direction,
Why not use the existing `IOType` enum? It took me some time to understand what `direction` is.
Would it make sense to replace `job1`, `job2` with `job`, `upstreamJob`?
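A minimal sketch of what the reviewer's suggestion could look like. `IOType`, `JobId`, and the three-component record below are local stand-ins written for illustration, not Marquez's actual types; the original record has further components after `direction` that are elided in the diff, so they are omitted here too. Only the names `job` and `upstreamJob` come from the comment.

```java
import java.util.Objects;

public class DirectLineageEdgeSketch {
  // Hypothetical stand-in for an input/output enum; Marquez's own IOType may differ.
  enum IOType { INPUT, OUTPUT }

  // Hypothetical job identifier (namespace + name).
  record JobId(String namespace, String name) {}

  // Sketch of the edge using the reviewer's suggested names and a typed direction.
  // Components that follow `direction` in the real record are intentionally left out.
  record DirectLineageEdge(JobId job, IOType direction, JobId upstreamJob) {
    DirectLineageEdge {
      Objects.requireNonNull(job, "job");
      Objects.requireNonNull(direction, "direction");
      // Assumption: upstreamJob may be null when no other job is connected.
    }
  }

  public static void main(String[] args) {
    JobId task = new JobId("default", "order_analysis.transform");
    JobId producer = new JobId("default", "orders_ingest.load");
    // With an enum, a misspelled direction is a compile error instead of a silent bad string.
    DirectLineageEdge edge = new DirectLineageEdge(task, IOType.INPUT, producer);
    System.out.println(edge);
  }
}
```

The design benefit is that readers see `IOType.INPUT` at every construction site, so the meaning of `direction` no longer has to be inferred from the SQL.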
@Consumes(APPLICATION_JSON)
@Produces(APPLICATION_JSON)
@Path("/lineage/direct")
public Response getDirectLineage(@QueryParam("parentJobNodeId") @NotNull NodeId parentJobNodeId) {
Please remember to update `openapi.yaml` and the changelog.
Problem
The main lineage graph API focuses on individual jobs and is not easy to use when one wants coverage of all the children of a parent job.
Solution
This new endpoint provides non-recursive, one-level lineage for all the children of a given parent job.
For example, this makes it easier to retrieve all the lineage of a given Airflow DAG.
It returns all of the DAG's children (tasks), all the datasets they consume or produce, and the other tasks and DAGs that produce or consume those datasets.
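The one-level expansion described above can be sketched in memory as follows. This is a hypothetical illustration, not the PR's SQL query or DAO method: `Job`, `Edge`, and `directLineage` are invented names, and jobs are assumed to be identified by plain strings with an optional parent name. The property it shows is that jobs reached through a dataset (such as `ingest.load` below) are reported but not expanded further.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OneLevelLineageSketch {
  enum IOType { INPUT, OUTPUT }

  // Hypothetical in-memory job: name, optional parent name (null if none), datasets read and written.
  record Job(String name, String parent, Set<String> inputs, Set<String> outputs) {}

  // `child` reads (INPUT) or writes (OUTPUT) `dataset`; `otherJob` is a job on the far side
  // of that dataset (a producer for INPUT, a consumer for OUTPUT), or null if there is none.
  record Edge(String child, IOType direction, String dataset, String otherJob) {}

  static List<Edge> directLineage(String parent, List<Job> jobs) {
    List<Edge> edges = new ArrayList<>();
    for (Job child : jobs) {
      if (!parent.equals(child.parent())) continue; // only direct children of the parent
      for (String ds : child.inputs()) addEdges(edges, child, IOType.INPUT, ds, jobs);
      for (String ds : child.outputs()) addEdges(edges, child, IOType.OUTPUT, ds, jobs);
    }
    return edges; // no recursion: other jobs are listed, never expanded
  }

  private static void addEdges(List<Edge> edges, Job child, IOType dir, String ds, List<Job> jobs) {
    boolean found = false;
    for (Job other : jobs) {
      if (other.name().equals(child.name())) continue;
      // For an input, look for jobs that write the dataset; for an output, jobs that read it.
      Set<String> side = dir == IOType.INPUT ? other.outputs() : other.inputs();
      if (side.contains(ds)) {
        edges.add(new Edge(child.name(), dir, ds, other.name()));
        found = true;
      }
    }
    if (!found) edges.add(new Edge(child.name(), dir, ds, null));
  }

  public static void main(String[] args) {
    List<Job> jobs = List.of(
        new Job("dag.extract", "dag", Set.of("raw.orders"), Set.of("staging.orders")),
        new Job("dag.report", "dag", Set.of("staging.orders"), Set.of("mart.order_summary")),
        new Job("ingest.load", null, Set.of(), Set.of("raw.orders")),
        new Job("bi.dashboard", null, Set.of("mart.order_summary"), Set.of()));
    directLineage("dag", jobs).forEach(System.out::println);
  }
}
```

Keeping the expansion to a single hop bounds the work to the parent's children and their datasets, instead of walking the graph recursively.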
Example:
GET /api/v1/lineage/simple?nodeId=job:default:order_analysis
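A minimal client-side sketch of the example request using the JDK's `java.net.http` client. The path and `nodeId` parameter are copied from the example above; note that the resource excerpt in the review shows `@Path("/lineage/direct")` with a `parentJobNodeId` query parameter, so the final path and parameter name may differ. The base URL `http://localhost:5000` is an assumption about where the API server runs.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class DirectLineageRequestSketch {
  // Builds the request URI; the node ID is URL-encoded, so ':' becomes %3A.
  static URI lineageUri(String baseUrl, String nodeId) {
    String encoded = URLEncoder.encode(nodeId, StandardCharsets.UTF_8);
    return URI.create(baseUrl + "/api/v1/lineage/simple?nodeId=" + encoded);
  }

  public static void main(String[] args) throws Exception {
    // Assumption: an API server is listening on localhost:5000.
    URI uri = lineageUri("http://localhost:5000", "job:default:order_analysis");
    HttpRequest request = HttpRequest.newBuilder(uri)
        .header("Accept", "application/json")
        .GET()
        .build();
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```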
Checklist
- CHANGELOG.md (depending on the change, this may not be necessary)
- .sql database schema migration according to Flyway's naming convention (if relevant)