Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#299] Added E2E tests for Hive schemas and tables. #536

Closed
wants to merge 184 commits into from

Conversation

justinmclean
Copy link
Member

What changes were proposed in this pull request?

Added E2E tests. Note Currently, testTableColumnSwap fails due to a bug.

Why are the changes needed?

To test E2E.

Fix: #299

Does this PR introduce any user-facing change?

None.

How was this patch tested?

Test pass locally.

jerryshao and others added 30 commits April 23, 2023 10:09
### What changes were proposed in this pull request?

This PR propose the schema and type spec for Unified Catalog. This spec
is used to describe how metadata is organized in the system.

### Why are the changes needed?

This PR defines the basic metadata schema model, which will be used in
the system for memory structure, on-wire protocol and serialization
protocol.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

N/A
…ndecy and plugin version (#18)

### What changes were proposed in this pull request?

This PR improves the gradle build file to use gradle catalog mechanism
to centralize the dependency and plugin versions. Also removes some
redundant codes.

### Why are the changes needed?

Currently there's no centralized version control place for project, this
will easily lead to dependency chaos. After some investigation, we
choose to use gradle's default catalog mechanism to manage all the
versions in a central place.

Fix: #17

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Local manual test.
### What changes were proposed in this pull request?

This PR proposes to implement the schema metadata and type system in
Java and Protobuf.

### Why are the changes needed?

This PR defines a core metadata system for unified catalog.

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

This PR adds UT to cover the schema definitions.
…ema (#20)

### What changes were proposed in this pull request?

This PR adds JSON serde support for schema system.

### Why are the changes needed?

The adds of JSON serde support will help to support REST API for Unified
Catalog.

Fix: #16 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

And new UTs.
…pport for schema system (#26)

### What changes were proposed in this pull request?

This PR proposes to add protobuf SerDe support for schema support. The
protobuf SerDe support will be mainly used in schema persistence and RPC
communication.

### Why are the changes needed?

The adds of protobuf SerDe support will be used in schema persistence
and RPC communication.

Fix: #21

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Adds new UT to cover the codes.
…#30)

### What changes were proposed in this pull request?

As discussed in #22 , we planned to rename the project from Unified
Catalog to Graviton. This PR aims to change all the affected items from
Unified Catalog to Graviton.


Fix: #22 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Existing UTs.
### What changes were proposed in this pull request?

This PR introduces a Spark-style like strong typed config system for the
project.

### Why are the changes needed?

The config system is a cornerstone of the project. By comparing
different config system implementations, Spark's one is strong type
verified, and also can extend to other functions, so introducing a
Spark-style like config system for the project.

Fix: #27 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Add new UTs to cover the test
…tifier (#33)

### What changes were proposed in this pull request?

This PR is a preconditional PR to support REST API for Graviton. This PR
defines:

1. Entity's name identifier to distinguish between entities.
2. Entity operation interfaces. We will later on implement this
interface to manipulate the entities.

### Why are the changes needed?

This PR is a preconditional PR to support REST API for Graviton.

Fix: #32 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

New UTs to cover the test.
### What changes were proposed in this pull request?

This PR proposes to add Jetty server support for Graviton.

### Why are the changes needed?

The purpose of introducing Jetty as embedded web server is that:

1. Jetty is a light-weighted web server that can be easily embedded into
our project compared to Tomcat and other services.
2. We basically don't want to introduce a bunch of Springboot related
code to build our REST API. In that case, Jersey + Jetty would be the
best choice.
3. If later on the performance of Jetty cannot meet our requirements, we
can shift to use other web servers instead.

Fix: #5

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Local manual test
…APIs (#38)

### What changes were proposed in this pull request?

This PR adds Jersey support with Jetty for REST APIs, also adding tenant
operation REST APIs.

### Why are the changes needed?

The REST API is mainly exposed to users and compute engines to
manipulate the metadata. Adding a basic Jersey support as well as
referenced tenant operation implementations.

Fix: #34 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Manual e2e tests. Jersey test framework will be added later.
…(#40)

### What changes were proposed in this pull request?

This PR proposes to introduce Jersey test framework and mock tool, so we
could add jersey UTs later on. Also complement the left tenant operation
UTs for last PR.

### Why are the changes needed?

This PR introduces jersey test framework, which could be used later on
when we add more rest interfaces.

Fix: #39

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

UTs
… (#45)

### What changes were proposed in this pull request?

This PR changes include:

1. refactors the whole metadata system, which simplifies the current
tenant/lakehouse/zone/table schema structure to
lakehouse/catalog/<meta-structure> (refer to what metacat did).
2. Add catalog interfaces and define a series of catalog behaviors. the
specific catalog implementation could inherit these interfaces to
achieve its own one (refer to what Spark connector/catalog did).
3. rearchitect the code structure.

### Why are the changes needed?

As #43 described, the current schema system is a bit difficult to manage
(to pursue virtual semantics), we needs to simplify the current design.

Fix: #44

### Does this PR introduce _any_ user-facing change?

We're still in the early stage of the project, the change is unavoidable
and acceptable.

### How was this patch tested?

UTs to cover the codes.
### What changes were proposed in this pull request?

This PR proposes to add the entity store interface for graviton. The
implementation of this interface will store the entities to the
underlying storage system.

### Why are the changes needed?

This is the basic interface for entity store, which defines the
supported behavior of underlying storage.

Fix: #48 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Add UTs to cover the code.
…stem (#47)

### What changes were proposed in this pull request?

This PR tracks the work of #46 to update the rfc-1 to match the
refactoring work in #45 .

### Why are the changes needed?

This is the subtask for the schema system refactoring work.

Fix: #46 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

NA
### What changes were proposed in this pull request?

This PR adds the `EntitySerDe` interface and Protobuf implementation for
Graviton.

### Why are the changes needed?

The SerDe interface will be used for storage system to serialize and
deserialize entity objects when interacting with underlying storage.

Fix: #51 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Existing UTs to cover the code.
### What changes were proposed in this pull request?

1. This PR adds the HiveCatalog implementation for Graviton.
2. This HiveCatalog includes a hive client. 
3. This hive client is created by reflection and can support hive1,
hive2, and hive3.

### Why are the changes needed?

With this change, we could add maintain namespace for Graviton as the
next step.

Fix: #60 

### Does this PR introduce _any_ user-facing change?
As the early stage of the project, the change is unavoidable.

### How was this patch tested?

UTs to cover the test.
### What changes were proposed in this pull request?

This PR propose to do several reafactoring works:

1. Introduce a API module and extract all the interfaces to this module.
This module is mainly for users and graviton internally to manipulate
metadata.
2. Introduce a comment module and implement all the DTOs for graviton.
These DTOs will be mainly used to transmit objects between client and
server.
3. Add Lakehouse update request to the service.

### Why are the changes needed?

With this change, we could add a common client for Graviton as the next
step, and implement a Spark/trino connector later on.

Fix: #56 

### Does this PR introduce _any_ user-facing change?

As the early stage of the project, the change is unavoidable.

### How was this patch tested?

UTs to cover the test.
…raviton (#68)

### What changes were proposed in this pull request?

This PR propose to change the Graviton terminology `Lakehouse` to
`Metalake`.

### Why are the changes needed?

Compared to `Lakehouse`, `Metalake` is more closer to the goal of
Graviton as a unified metadata repository (lake), so proposing to change
the name.

Fix: #67 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Existing tests.
### What changes were proposed in this pull request?

As a cornerstone work of building Graviton client, this PR proposes to
add REST client support for Graviton Client. This work is mainly
referred from Apache Iceberg.

### Why are the changes needed?

This is the cornerstone work of building Graviton Client.

Fix: #64 

### Does this PR introduce _any_ user-facing change?

NA

### How was this patch tested?

Add UTs to cover the code.
…rt (#70)

### What changes were proposed in this pull request?

This PR proposes to add Graviton client metalake manipulation support.

### Why are the changes needed?

This PR is a part of the work to build a Graviton client.

Fix: #66 

### Does this PR introduce _any_ user-facing change?

This PR introduces new `GravitonClient` and `GravitonMetalake`
interfaces.

### How was this patch tested?

Add new UTs to cover the code.
Add a minimal NOTICE file so any content from 3rd party ALvL NOTICE
files or required 3rd party notices can be placed here.
### What changes were proposed in this pull request?

This PR adds the support of catalog rest implementation for Graviton.

### Why are the changes needed?

With this, users could issue REST requests to manipulate catalogs.

Fix: #72 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Add UTs to test.
Set contributing expectations for external contributors.
…talog (#74)

### What changes were proposed in this pull request?

This HiveCatalog will be used for `SupportsNamespaces` interface to
implement create/drop/alter namespaces in the hive.

### Why are the changes needed?

1. Creating a graviton namespace equivalent to creating a hive database.
2. Dropping a graviton namespace equivalent to dropping a hive database.
3. Altering a graviton namespace properties equivalent to Altering a
hive database properties.


### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Added `HiveNamespaceTest` unit test.
mchades and others added 17 commits October 12, 2023 15:43
### What changes were proposed in this pull request?
Update the documentation in `deleteColumn` to clarify the usage of
`ifExists`.

### Why are the changes needed?
It's unclear how `ifExists` works

Fix: #460 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
UT added
### What changes were proposed in this pull request?

Change the `TestGravitonServer` to use random port instead of a default
one to avoid port conflicts.

### Why are the changes needed?

The port conflicts issue makes the test quite flaky, see #498

Fix: #498 

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
…ention (#502)

### What changes were proposed in this pull request?
Rename some Hive table properties' name to follow the property name
convention

### Why are the changes needed?
Follow the unified standard for Graviton property

Fix: #490 

### Does this PR introduce _any_ user-facing change?
Yes, some names of Hive table properties changed

### How was this patch tested?
Existing UTs
### What changes were proposed in this pull request?
logs are located at `integration-test/build/integration-test.log`, while
github cat logs from `integration-test/build/graviton-server.log`

### Why are the changes needed?

Fix: #503 

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
### What changes were proposed in this pull request?

In this PR, we introduce the following points:
- Remove the catalog-related configuration from `Graviton` server
configuration file.
- Support setting catalog configuration with a separate config file
located in `catalogs` module
- Provide catalog-related configurations for a specific engine with an
unique prefix.

### Why are the changes needed?

Fix: #478 
Fix: #196 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

New UTs
…eberg REST server (#468)

### What changes were proposed in this pull request?
support load HDFS config files for Iceberg REST server, support multiple
classpath for `graviton.auxService.iceberg-rest.classpath`, such as
`catalogs/lakehouse-iceberg/libs, catalogs/lakehouse-iceberg/conf`

### Why are the changes needed?
HDFS configs are complicated, we should put `core-site.xml` and
`hdfs-site.xml` to the classpath

Fix: #445 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
1. setup local HDFS cluster.
2. start the graviton server with HDFS config files.
…in Trino (#506)

### What changes were proposed in this pull request?

Implementing data insertion and selection in Trino using the Graviton
Connector.

### Why are the changes needed?

Fix: #385 Implement data insertion and selection in Trino using the
Graviton Connector.

### Does this PR introduce _any_ user-facing change?


### How was this patch tested?
1. UT
### What changes were proposed in this pull request?
Throw an exception when encountering a non-nullable column type in Hive
catalog

### Why are the changes needed?
Currently, we only support Hive 2.x for Hive catalog, but 'NOT NULL'
column constraint was supported since Hive3.0

Fix: #462 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
UTs added
…473)

### What changes were proposed in this pull request?

Implement `catalogPropertiesMetadata` for `HiveCatalogOperations`

### Why are the changes needed?

Currently `catalogPropertiesMetadata` will return an empty map, and we
need to implement it to return a real value for the hive property
metadata.


Fix: #470 
Fix: #362 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Add test `testCatalogProperty` in `TestHiveCatalog`
### What changes were proposed in this pull request?
1. MySQL bind 0.0.0.0
2. create iceberg user and grant all priviges

### Why are the changes needed?
Iceberg JDBC IT need access mysql

Fix: #508

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. MySQL client out of docker could connect MySQL server using `iceberg`
user
…nfigurations that will be passed by to specific engines (#510)

### What changes were proposed in this pull request?

Make sure Gravition configuration like `a.b` will overwrite
configurations(`gravition.passby.a.b`) that will be passby to `Hive`

### Why are the changes needed?

The Graviton configuration has a higher priority, so it should overwrite
the pass-by configurations.

Fix: #509 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

UT
…ata (#486)

### What changes were proposed in this pull request?
Implement catalogPropertiesMetadata、tablePropertiesMetadata for
IcebergCatalogOperations

### Why are the changes needed?
Currently catalogPropertiesMetadatatablePropertiesMetadata will return
an empty map, and we need to implement it to return a real value for the
iceberg property metadata.

Fix: #446 

### Does this PR introduce any user-facing change?
N/A

### How was this patch tested?
Add test testCatalogProperty in TestIcebergCatalog
Add test testTableProperty in TestIcebergTable
… (#515)

### What changes were proposed in this pull request?

Add UT to test restart and reopen for `RocksDBBackend`

### Why are the changes needed?

Fully test `RocksDBBackend` to verify if it works well.

Fix: #514 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

UT
…eIT (#493)

### What changes were proposed in this pull request?

add hive catalog to IcebergRESTServiceIT
1. custom graviton config file with different Iceberg catalog types
2. The hive catalog warehouse location is using localfs to bypass HDFS
3. unify test namespace to `iceberg_rest_` prefix, to drop all test
namespace and tables before each test.

### Why are the changes needed?


Part of: #480

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. existing UTs
4. HiveCatalog UTs
@@ -15,7 +15,7 @@ public class HiveCatalogPropertiesMeta extends BaseCatalogPropertiesMetadata {
public static final String CLIENT_POOL_SIZE = "client.pool-size";
public static final int DEFAULT_CLIENT_POOL_SIZE = 1;

public static final String METASTORE_URIS = "metastore.uris";
public static final String METASTORE_URIS = "hive.metastore.uris";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you change the value of METASTORE_URIS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it has been change elsewhere and tests fail unless it has this value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to change it as it's designed as the key of the configuration entry for gravitino that will be translated into hive.metastore.uris for Hive.

Copy link
Member Author

@justinmclean justinmclean Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can explain why all of the tests fail without the change and how to fix the tests I can do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to 'CatalogHiveIT#createCatalog' and add the following entires to the properties map when creating catalog:

 
    properties.put(METASTORE_URIS, HIVE_METASTORE_URIS);
    properties.put(
        CATALOG_BYPASS_PREFIX + HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES.varname, "30");
    properties.put(
        CATALOG_BYPASS_PREFIX + HiveConf.ConfVars.METASTORETHRIFTFAILURERETRIES.varname, "30");
    properties.put(
        CATALOG_BYPASS_PREFIX + HiveConf.ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY.varname,
        "5");

And it works well with your test cases;

image

What I 'm trying to say is that properties passed to the metalake.createCatalog were defined by user and we have clear the fact that Hive configuration keys and Gravitino keys are not necessarily the same. If you have any further problem, please let me know, thanks.

Copy link
Member Author

@justinmclean justinmclean Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is allready done in this code:

  public static HiveConf hiveConfig() {
    HiveConf hiveConf = new HiveConf();
    hiveConf.set(HiveConf.ConfVars.METASTOREURIS.varname, HIVE_METASTORE_URIS);

    return hiveConf;
  }

  public static Map<String, String> hiveConfigProperties() {
    Map<String, String> catalogProps = Maps.newHashMap();
    catalogProps.put("provider", "hive");
    catalogProps.put(HiveConf.ConfVars.METASTOREURIS.varname, HIVE_METASTORE_URIS);
    catalogProps.put(HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES.varname, "30");
    catalogProps.put(HiveConf.ConfVars.METASTORETHRIFTFAILURERETRIES.varname, "30");
    catalogProps.put(HiveConf.ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY.varname, "5");

    return catalogProps;
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's common code in GravitonITUtils.

@yuqi1129
Copy link
Contributor

@justinmclean, could you please rebase your PR and resolve the conflict?

@justinmclean
Copy link
Member Author

@mchades is looking into it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest move this to packge com.datastrato.graviton.integration.test.catalog.hive

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should it move when it's dealing with E2E schema tests and not catalog tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's schema tests about the Catalog-Hive, right? Catalog-Hive including Hive catalog, Hive schema, and Hive table, Therefore, I believe the file should be moved under com.datastrato.graviton.integration.test.catalog.hive. The directory layout will then look like:

integration/test/catalog/hive
├── CatalogIT.java
├── schema
│   └── SchemaIT.java
└── table
    └── TableIT.java

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep Hive in the name as per the WET vs DRY principle for tests

@justinmclean justinmclean self-assigned this Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Supplemental e2e coverage test
8 participants