Currently, the Marquez API for OpenLineage events (/api/v1/lineage) accepts one event per request, as seen in OpenLineageResource.java#L67. While this is suitable for real-time ingestion, it becomes inefficient when we need to ingest multiple events simultaneously.
Use Cases:
Database Migration or Restoration: When changing the database or restoring from backups, we may need to re-ingest a large number of events to rebuild the lineage graph.
Bulk Event Replay: In scenarios like system recovery or batch processing, ingesting events one by one is not practical.
Performance Optimization: Reducing the number of HTTP requests can significantly improve ingestion performance.
Proposal:
New Endpoint: Introduce a batch ingestion endpoint (e.g., /api/v1/lineage/batch) that accepts an array of OpenLineage events.
Batch Processing: Update the OpenLineageResource class to handle a list of events in a single request.
Response Format: Provide a response that indicates the success or failure of each event within the batch.
Alternative: Instead of adding a new endpoint, update the existing /api/v1/lineage endpoint to accept either a single event or an array of events.
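To make the proposed response format concrete, here is a minimal sketch of per-event result reporting in plain Java. It is not Marquez code: the class `BatchLineageSketch`, the records `EventResult` and `BatchResponse`, and the `ingestOne` callback are hypothetical names used for illustration, and events are modeled as simple maps rather than real OpenLineage types. The idea is that one failing event is recorded and does not abort the rest of the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names exist in Marquez.
public class BatchLineageSketch {

  /** Outcome for one event in the batch; error is null on success. */
  public record EventResult(int index, boolean success, String error) {}

  /** Aggregate response: one result per submitted event, in request order. */
  public record BatchResponse(int succeeded, int failed, List<EventResult> results) {}

  /**
   * Applies ingestOne to every event and records the outcome of each.
   * A runtime failure on one event is captured and the loop continues.
   */
  public static BatchResponse ingestBatch(
      List<Map<String, Object>> events, Consumer<Map<String, Object>> ingestOne) {
    List<EventResult> results = new ArrayList<>();
    int succeeded = 0;
    for (int i = 0; i < events.size(); i++) {
      try {
        ingestOne.accept(events.get(i));
        results.add(new EventResult(i, true, null));
        succeeded++;
      } catch (RuntimeException e) {
        results.add(new EventResult(i, false, e.getMessage()));
      }
    }
    return new BatchResponse(succeeded, events.size() - succeeded, List.copyOf(results));
  }

  public static void main(String[] args) {
    // Stand-in for the existing single-event ingestion path (assumption):
    // rejects any event that lacks an "eventType" field.
    Consumer<Map<String, Object>> ingestOne = event -> {
      if (!event.containsKey("eventType")) {
        throw new IllegalArgumentException("missing eventType");
      }
    };

    List<Map<String, Object>> events = List.of(
        Map.of("eventType", "START", "job", "etl_a"),
        Map.of("job", "etl_b"));

    BatchResponse response = ingestBatch(events, ingestOne);
    System.out.println(response);
  }
}
```

A real endpoint would also need to decide on the HTTP status for partially failed batches (for example, 207 Multi-Status versus 200 with a per-event body) and whether to cap the batch size; the sketch leaves both open.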
Benefits:
Efficiency: Streamlines the ingestion process for multiple events.
Scalability: Enhances Marquez's ability to handle large-scale data operations.
User Convenience: Simplifies workflows that require bulk event ingestion.
Thanks for the suggestion, @algorithmy1! We couldn't agree more on the benefits you outlined. The good news is that we've been prototyping such an endpoint for OpenLineage batch events, see v2.LineageResource.collectBatchOf(BatchOfEvents). The endpoint will be available in Marquez 0.51.0.