[#551] feat(doc): Add overview doc (#558)
### What changes were proposed in this pull request?

This PR adds an overview doc to explain Gravitino at a high level.

### Why are the changes needed?

Fix: #551 

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No
jerryshao authored Oct 23, 2023
1 parent 38e57af commit 2225587
Showing 3 changed files with 43 additions and 0 deletions.
Binary file added docs/assets/gravitino-model-arch.png
Binary file added docs/assets/metadata-model.png
43 changes: 43 additions & 0 deletions docs/overview.md
@@ -0,0 +1,43 @@
---
title: "Overview"
date: 2023-10-19T15:33:00-08:00
license: "Copyright 2023 Datastrato.
This software is licensed under the Apache License version 2."
---

# Overview

## Introduction

Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, and provides users with unified access to the metadata for both data and AI assets.

![Gravitino Architecture](assets/gravitino-architecture.png)

Gravitino aims to provide several key features:

* SSOT (Single Source of Truth) for multi-regional data with geo-distributed architecture support.
* Unified management of data and AI assets for both users and engines.
* Security in one place: centralized security across different sources.
* Built-in data management and data access management.

## Architecture

![Gravitino Model and Arch](assets/gravitino-model-arch.png)

* **Functionality Layer**: Gravitino provides a set of APIs for users to manage and govern the metadata, including standard metadata creation, update, and delete operations. It also provides the ability to govern the metadata in a unified way, including access control, discovery, and others.
* **Interface Layer**: Gravitino provides standard REST APIs as the interface layer for users. It will also provide Thrift and JDBC interfaces in the future.
* **Core Object Model**: Gravitino defines a generic metadata model to represent the metadata in different sources and types, and manages them in a unified way.
* **Connection Layer**: In the connection layer, Gravitino provides a set of connectors to connect to different metadata sources, including Hive, MySQL, PostgreSQL, and others. It also provides the ability to connect to and manage heterogeneous metadata other than tabular data.
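
To make the layering more concrete, here is a minimal Python sketch of how a connection layer could sit beneath a unified service. It is an illustration only and does not reflect Gravitino's actual API: the names `Connector`, `InMemoryConnector`, and `MetadataService` are hypothetical, and the in-memory connector stands in for a real source such as Hive or MySQL.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Connector(ABC):
    """Hypothetical connector interface: one implementation per metadata source."""

    @abstractmethod
    def list_tables(self, schema: str) -> List[str]:
        """Return the table names found in the given schema."""


class InMemoryConnector(Connector):
    """Stand-in for a real source; serves metadata from a plain dict."""

    def __init__(self, tables_by_schema: Dict[str, List[str]]) -> None:
        self._tables_by_schema = tables_by_schema

    def list_tables(self, schema: str) -> List[str]:
        return sorted(self._tables_by_schema.get(schema, []))


class MetadataService:
    """Hypothetical unified entry point that dispatches to per-catalog connectors."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Connector] = {}

    def register(self, catalog: str, connector: Connector) -> None:
        self._connectors[catalog] = connector

    def list_tables(self, catalog: str, schema: str) -> List[str]:
        if catalog not in self._connectors:
            raise KeyError(f"unknown catalog: {catalog}")
        return self._connectors[catalog].list_tables(schema)


# Two catalogs backed by different (simulated) sources, queried the same way.
service = MetadataService()
service.register("hive_prod", InMemoryConnector({"sales": ["orders", "customers"]}))
service.register("mysql_app", InMemoryConnector({"app": ["users"]}))

hive_tables = service.list_tables("hive_prod", "sales")
mysql_tables = service.list_tables("mysql_app", "app")
```

The point of the sketch is the shape of the design: callers go through one service, and source-specific behavior is isolated behind a connector per catalog.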

## Terminology

The metadata model of Gravitino:

![Gravitino Model](assets/metadata-model.png)

* **Metalake**: The top-level container for metadata. Typically, one group has one metalake to manage all the metadata in it. Each metalake exposes a three-level namespace (catalog.schema.table) to organize the data.
* **Catalog**: A catalog is a collection of metadata from a specific metadata source. Each catalog has a related connector to connect to that metadata source.
* **Schema**: A schema is equivalent to a database. Schemas only exist in catalogs that support relational metadata sources, such as Hive, MySQL, PostgreSQL, and others.
* **Table**: The lowest level in the object hierarchy for catalogs that support relational metadata sources. Tables are created in specific schemas in those catalogs.
* **Model**: A model represents the metadata in catalogs that support model management.
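
The hierarchy above can be sketched as nested containers. The following Python example is a hedged illustration, not Gravitino's object model: the class names mirror the terminology, but the fields, the `resolve` helper, and the sample names (`demo_lake`, `hive_prod`, `sales`, `orders`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Table:
    name: str


@dataclass
class Schema:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    source: str  # hypothetical field naming the backing metadata source
    schemas: Dict[str, Schema] = field(default_factory=dict)


@dataclass
class Metalake:
    name: str
    catalogs: Dict[str, Catalog] = field(default_factory=dict)

    def resolve(self, qualified_name: str) -> Table:
        """Look up a table by its three-level name: catalog.schema.table."""
        parts = qualified_name.split(".")
        if len(parts) != 3:
            raise ValueError(
                f"expected catalog.schema.table, got {qualified_name!r}"
            )
        catalog_name, schema_name, table_name = parts
        return self.catalogs[catalog_name].schemas[schema_name].tables[table_name]


# Build a tiny metalake: one catalog, one schema, one table.
orders = Table("orders")
sales = Schema("sales", tables={"orders": orders})
hive_prod = Catalog("hive_prod", source="hive", schemas={"sales": sales})
metalake = Metalake("demo_lake", catalogs={"hive_prod": hive_prod})

resolved = metalake.resolve("hive_prod.sales.orders")
```

In this sketch a missing catalog, schema, or table surfaces as a `KeyError`, while a name that does not have exactly three parts raises a `ValueError`.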
