[apache#2287] feat(hadoop-catalog): Enable Hadoop catalog module in Gravitino project (apache#2291)

### What changes were proposed in this pull request?

This PR enables the Hadoop catalog in Gravitino's build, so users can
use it in end-to-end tests and local verification.

### Why are the changes needed?

The build script must change so that the Hadoop catalog module is packaged into the distribution.

Fix: apache#2287

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Local verification.
jerryshao authored Feb 21, 2024
1 parent 6d5bb64 commit bb86d40
Showing 8 changed files with 56 additions and 4 deletions.
4 changes: 2 additions & 2 deletions build.gradle.kts
@@ -612,8 +612,8 @@ tasks {
  ":catalogs:catalog-hive:copyLibAndConfig",
  ":catalogs:catalog-lakehouse-iceberg:copyLibAndConfig",
  ":catalogs:catalog-jdbc-mysql:copyLibAndConfig",
- ":catalogs:catalog-jdbc-postgresql:copyLibAndConfig"
- // TODO. add fileset catalog to the distribution when ready.
+ ":catalogs:catalog-jdbc-postgresql:copyLibAndConfig",
+ ":catalogs:catalog-hadoop:copyLibAndConfig"
  )
  }

4 changes: 3 additions & 1 deletion catalogs/catalog-hadoop/build.gradle.kts
@@ -43,7 +43,9 @@ tasks {
  from("src/main/resources")
  into("$rootDir/distribution/package/catalogs/hadoop/conf")

- // TODO. add configuration file later on.
+ include("hadoop.conf")
+ include("core-site.xml.template")
+ include("hdfs-site.xml.template")

  rename { original ->
  if (original.endsWith(".template")) {
@@ -0,0 +1,5 @@
#
# Copyright 2024 Datastrato Pvt Ltd.
# This software is licensed under the Apache License version 2.
#
com.datastrato.gravitino.catalog.hadoop.HadoopCatalog
@@ -0,0 +1,7 @@
<!--
~ Copyright 2024 Datastrato Pvt Ltd.
~ This software is licensed under the Apache License version 2.
-->
<configuration>
</configuration>

16 changes: 16 additions & 0 deletions catalogs/catalog-hadoop/src/main/resources/hadoop.conf
@@ -0,0 +1,16 @@
#
# Copyright 2024 Datastrato Pvt Ltd.
# This software is licensed under the Apache License version 2.
#

# This file holds common properties for Hadoop catalog. All the created Hadoop catalog will
# leverage this conf file as default configuration. In the meantime, user could specify catalog
# properties to override the default configuration.

# location = hdfs://localhost:9000/<path-to-catalog>

# If we want to specify Hadoop catalog-related configuration like 'fs.defaultFS', we can do it like this:
# gravitino.bypass.fs.defaultFS = hdfs://localhost:9000, and 'gravitino.bypass' is the prefix that
# the configuration will be directly by pass to backend engine, and 'fs.defaultFS' is the configuration key.

# gravitino.bypass.fs.defaultFS = hdfs://localhost:9000
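
The `gravitino.bypass` convention described in the comments above can be illustrated with a minimal Java sketch. It assumes the prefix is followed by a dot and is stripped before the remainder is used as the backend key. The class `BypassPrefixSketch`, the method `extractBypassProperties`, and the sample `location` value are hypothetical and are not taken from Gravitino's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Gravitino's actual implementation.
final class BypassPrefixSketch {
  // Prefix named in the hadoop.conf comments; the trailing dot is an assumption.
  static final String BYPASS_PREFIX = "gravitino.bypass.";

  // Returns only the prefixed entries, with the prefix stripped from each key.
  static Map<String, String> extractBypassProperties(Map<String, String> catalogProperties) {
    Map<String, String> backend = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : catalogProperties.entrySet()) {
      String key = entry.getKey();
      // Skip keys without the prefix, and a bare prefix with no backend key after it.
      if (key.startsWith(BYPASS_PREFIX) && key.length() > BYPASS_PREFIX.length()) {
        backend.put(key.substring(BYPASS_PREFIX.length()), entry.getValue());
      }
    }
    return backend;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("location", "hdfs://localhost:9000/demo"); // hypothetical sample path
    props.put("gravitino.bypass.fs.defaultFS", "hdfs://localhost:9000");
    System.out.println(extractBypassProperties(props));
  }
}
```

In this sketch only `fs.defaultFS` survives the filter, while `location` stays a catalog-level property and is not handed to the backend.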
@@ -0,0 +1,6 @@
<!--
~ Copyright 2024 Datastrato Pvt Ltd.
~ This software is licensed under the Apache License version 2.
-->
<configuration>
</configuration>
@@ -132,6 +132,22 @@ public void testEntitiesSerDe() throws IOException {
  catalogBytes, com.datastrato.gravitino.meta.CatalogEntity.class);
  Assertions.assertEquals(catalogEntity, catalogEntityFromBytes);

+ // Test Fileset catalog
+ com.datastrato.gravitino.meta.CatalogEntity filesetCatalogEntity =
+ com.datastrato.gravitino.meta.CatalogEntity.builder()
+ .withId(catalogId)
+ .withName(catalogName)
+ .withComment(comment)
+ .withType(com.datastrato.gravitino.Catalog.Type.FILESET)
+ .withProvider(provider)
+ .withAuditInfo(auditInfo)
+ .build();
+ byte[] filesetCatalogBytes = protoEntitySerDe.serialize(filesetCatalogEntity);
+ com.datastrato.gravitino.meta.CatalogEntity filesetCatalogEntityFromBytes =
+ protoEntitySerDe.deserialize(
+ filesetCatalogBytes, com.datastrato.gravitino.meta.CatalogEntity.class);
+ Assertions.assertEquals(filesetCatalogEntity, filesetCatalogEntityFromBytes);
+
  // Test SchemaEntity
  Long schemaId = 1L;
  String schemaName = "schema";
2 changes: 1 addition & 1 deletion meta/src/main/proto/gravitino_meta.proto
@@ -44,7 +44,7 @@ message Metalake {
  message Catalog {
  enum Type {
  RELATIONAL = 0; // Catalog Type for Relational Data Structure, like db.table.
- FILE = 1; // Catalog Type for File System (including HDFS, S3, etc.), like path/to/file.
+ FILESET = 1; // Catalog Type for File System (including HDFS, S3, etc.), like path/to/file.
  STREAM = 2; // Catalog Type for Streaming Data, like kafka://topic.
  }

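The rename from `FILE` to `FILESET` keeps the enum number 1. Protobuf's binary encoding stores enum values by number rather than by name, so bytes written before the rename should still decode to the renamed constant. The Java sketch below illustrates the idea with a hand-written stand-in for the generated enum. `EnumRenameSketch`, `CatalogType`, and this `forNumber` are illustrative only and are not the classes generated from `gravitino_meta.proto`.

```java
// Hypothetical stand-in for illustration; the real enum is generated by protoc.
final class EnumRenameSketch {
  enum CatalogType {
    RELATIONAL(0),
    FILESET(1), // named FILE before this commit; the number 1 is unchanged
    STREAM(2);

    private final int number;

    CatalogType(int number) {
      this.number = number;
    }

    int getNumber() {
      return number;
    }

    // Maps a stored number back to a constant, or null if the number is unknown.
    static CatalogType forNumber(int number) {
      for (CatalogType type : values()) {
        if (type.number == number) {
          return type;
        }
      }
      return null;
    }
  }

  public static void main(String[] args) {
    // A writer built before the rename would have stored the number 1 for FILE.
    int storedByOldWriter = 1;
    CatalogType decoded = CatalogType.forNumber(storedByOldWriter);
    System.out.println(decoded);
  }
}
```

Name-based representations, such as source code that references the constant, do see the new name, which is why the test above switches to `Catalog.Type.FILESET`.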