Evaluate metadata management services for integrating with the data platform #855

Open
9 tasks
blarghmatey opened this issue Oct 4, 2023 · 1 comment
Assignees
Labels
product:data-platform Issues related to the Data Platform product

Comments

@blarghmatey
Member

User Story

  • As a data platform owner, I want to provide a single location for discovering, documenting, and understanding the lineage of data
  • As a data platform consumer, I want to find data relevant to my needs and understand its context

Description/Context

To provide a cohesive view of our data platform and the data sets available in it, we want to implement a cross-cutting metadata platform. This will provide visibility into the full lineage graph of a given data asset (e.g. database table, dashboard report, report export delivered via Dagster). There are numerous open source and commercial options available, so the purpose of this issue is to establish a set of evaluation criteria and select a solution to implement.

Acceptance Criteria

  • Integrations are available for consuming metadata from our various platform components
    • Dagster
    • Trino
    • dbt
    • Superset
  • Supports column level lineage
  • Has search/discovery functionality for data sets
  • Provides a means of documenting data sets
  • Supports tagging and tag propagation based on lineage (e.g. tag a column in raw with pii and have the tag propagate to downstream mart tables/dashboards)
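
To make the tag propagation criterion concrete for the proof of concept, here is a minimal sketch of the behavior we would expect from a candidate. It assumes lineage is available as (upstream, downstream) column pairs. The `propagate_tags` function and all column identifiers are hypothetical and not taken from any particular product.

```python
from collections import defaultdict, deque


def propagate_tags(edges, seed_tags):
    """Propagate tags downstream along column-level lineage edges.

    edges: iterable of (upstream, downstream) column identifiers.
    seed_tags: dict mapping a column identifier to the tags applied by hand.
    Returns a dict mapping each tagged column to its full set of tags.
    """
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)

    tags = {col: set(col_tags) for col, col_tags in seed_tags.items()}
    queue = deque(tags)
    while queue:
        col = queue.popleft()
        for child in downstream.get(col, ()):
            missing = tags[col] - tags.get(child, set())
            # Only revisit a column when it gained a tag, so cycles terminate.
            if missing:
                tags.setdefault(child, set()).update(missing)
                queue.append(child)
    return tags


# Hypothetical column identifiers, for illustration only.
LINEAGE = [
    ("raw.users.email", "staging.users.email"),
    ("staging.users.email", "mart.user_summary.email"),
    ("mart.user_summary.email", "dashboard.user_report.email"),
    ("raw.users.id", "mart.user_summary.user_id"),
]

result = propagate_tags(LINEAGE, {"raw.users.email": {"pii"}})
```

In this sketch the `pii` tag reaches the staging, mart, and dashboard columns derived from `raw.users.email`, while `mart.user_summary.user_id` stays untagged because it has no tagged ancestor. A real product may also support stopping propagation or removing tags, which this sketch does not cover.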

Plan/Design

Review the relevant documentation and pricing information for each platform, then build a simple proof-of-concept implementation for each of the top contenders.
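
One possible way to record the review results is a small checklist keyed to the acceptance criteria above. This is only a sketch: the `Candidate` class, the feature keys, and the `candidate-a` entry are placeholders made up for illustration, not findings about any real product.

```python
from dataclasses import dataclass, field

# Taken from the acceptance criteria in this issue.
REQUIRED_INTEGRATIONS = ("Dagster", "Trino", "dbt", "Superset")
# Hypothetical short keys for the remaining criteria.
REQUIRED_FEATURES = (
    "column_level_lineage",
    "search_discovery",
    "documentation",
    "tag_propagation",
)


@dataclass
class Candidate:
    name: str
    integrations: set = field(default_factory=set)
    features: set = field(default_factory=set)

    def gaps(self):
        """Return the acceptance criteria this candidate does not meet."""
        missing = [i for i in REQUIRED_INTEGRATIONS if i not in self.integrations]
        missing += [f for f in REQUIRED_FEATURES if f not in self.features]
        return missing

    def meets_criteria(self):
        return not self.gaps()


# Placeholder entry, not a real evaluation result.
example = Candidate(
    name="candidate-a",
    integrations={"Dagster", "Trino", "dbt"},
    features=set(REQUIRED_FEATURES),
)
print(example.name, "gaps:", example.gaps())
```

Pricing and operational effort are not modeled here and would need to be tracked alongside the checklist.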

@blarghmatey blarghmatey transferred this issue from mitodl/ol-infrastructure Oct 4, 2023
@blarghmatey blarghmatey added the product:data-platform Issues related to the Data Platform product label Oct 4, 2023
@blarghmatey blarghmatey self-assigned this Oct 4, 2023
@blarghmatey
Member Author

Products to be considered:
