## Creating a Hive Catalog

Example JSON:

```json
{
  "name": "test_hive_catalog",
  "comment": "my test Hive catalog",
  "type": "RELATIONAL",
  "provider": "hive",
  "properties": {
    "metastore.uris": "thrift://127.0.0.1:9083"
  }
}
```

* `name`: The name of the Hive catalog to be created.
* `comment`: Optional, a user-defined comment for the catalog.
* `provider`: Must be set to "hive" to use Hive as the catalog provider.
* `type`: Must be set to "RELATIONAL" because Hive organizes data relationally, as `db.table`.
* `properties`: The properties of the Hive catalog. See the catalog properties table below for details.
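
The creation flow can be sketched as a small script. The snippet below only builds the request URL and body from the example above; the host, port, and metalake name are placeholders, and the actual HTTP POST (for example via `urllib.request` or `requests`) is left to you.

```python
import json

def build_create_catalog_request(host, port, metalake):
    # URL format taken from this document; all arguments are placeholders.
    url = f"http://{host}:{port}/api/metalakes/{metalake}/catalogs"
    payload = {
        "name": "test_hive_catalog",
        "comment": "my test Hive catalog",
        "type": "RELATIONAL",
        "provider": "hive",
        "properties": {"metastore.uris": "thrift://127.0.0.1:9083"},
    }
    return url, json.dumps(payload)

url, body = build_create_catalog_request("localhost", 8090, "my_metalake")
```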

### Catalog properties

| Property name       | Description                                                                                 | Example value                                                                                                     | Since version |
|---------------------|---------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|---------------|
| `metastore.uris`    | Required. The Hive metastore service URIs; separate multiple addresses with commas.         | `thrift://127.0.0.1:9083`                                                                                         | 0.2.0         |
| `client.pool-size`  | The maximum number of Hive metastore clients in the pool for Gravitino. Defaults to 1.      | 1                                                                                                                 | 0.2.0         |
| `gravitino.bypass.` | Properties with this prefix are passed down to the underlying HMS client. Empty by default. | `gravitino.bypass.hive.metastore.failure.retries = 3` configures 3 retries upon failure of Thrift metastore calls | 0.2.0         |
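
The `gravitino.bypass.` convention can be illustrated with a short sketch: properties carrying the prefix are selected and forwarded to the underlying HMS client. This is an illustration of the convention only, not Gravitino's actual implementation.

```python
BYPASS_PREFIX = "gravitino.bypass."

def extract_bypass_properties(catalog_properties):
    # Keep only the prefixed entries, stripping the prefix before handing
    # them to the underlying HMS client configuration.
    return {
        key[len(BYPASS_PREFIX):]: value
        for key, value in catalog_properties.items()
        if key.startswith(BYPASS_PREFIX)
    }

props = {
    "metastore.uris": "thrift://127.0.0.1:9083",
    "gravitino.bypass.hive.metastore.failure.retries": "3",
}
hms_conf = extract_bypass_properties(props)
# hms_conf == {"hive.metastore.failure.retries": "3"}
```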

## Creating a Hive Schema

After the catalog is created, you can create a schema under it by submitting a schema JSON to the Gravitino server using the following URL format:

```shell
http://{GravitinoServerHost}:{GravitinoServerPort}/api/metalakes/{metalake}/catalogs/{catalog}/schemas
```

Example JSON:

```json
{
  "name": "test_schema",
  "comment": "my test schema",
  "properties": {
    "location": "/user/hive/warehouse"
  }
}
```

* `name`: The name of the Hive database to be created.
* `comment`: Optional, a user-defined comment for the Hive database.
* `properties`: The properties of the Hive database. See the schema properties table below for details.
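
As with the catalog, the schema request can be sketched programmatically. The snippet below only assembles the URL and body from the example above; host, port, metalake, and catalog names are placeholders, and the HTTP POST itself is left to the caller.

```python
import json

def build_create_schema_request(host, port, metalake, catalog):
    # URL format taken from this document; all arguments are placeholders.
    url = f"http://{host}:{port}/api/metalakes/{metalake}/catalogs/{catalog}/schemas"
    payload = {
        "name": "test_schema",
        "comment": "my test schema",
        "properties": {"location": "/user/hive/warehouse"},
    }
    return url, json.dumps(payload)

url, body = build_create_schema_request(
    "localhost", 8090, "my_metalake", "test_hive_catalog"
)
```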

### Schema properties

| Property name       | Description                                                                                                                                         | Example value                            | Since version |
|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|---------------|
| `location`          | The storage directory of the Hive database. Optional; if unspecified, HMS falls back to the value of `hive.metastore.warehouse.dir` in hive-site.xml. | `/user/hive/warehouse`                   | 0.1.0         |
| `gravitino.bypass.` | Properties with this prefix are passed down to the Hive database parameters with the prefix stripped.                                                 | `"gravitino.bypass.my-key" = "my-value"` | 0.2.0         |


## Creating a Hive Table

After the schema is created, you can create a table by submitting a table JSON to the Gravitino server using the following URL format:

```shell
http://{GravitinoServerHost}:{GravitinoServerPort}/api/metalakes/{metalake}/catalogs/{catalog}/schemas/{schema}/tables
```

Example JSON:

```json
{
  "name": "test_table",
  "comment": "my test table",
  "columns": [
    {
      "name": "id",
      "type": "int",
      "comment": "id column comment"
    },
    {
      "name": "name",
      "type": "string",
      "comment": "name column comment"
    },
    {
      "name": "age",
      "type": "int",
      "comment": "age column comment"
    },
    {
      "name": "dt",
      "type": "date",
      "comment": "dt column comment"
    }
  ],
  "partitions": [
    {
      "strategy": "identity",
      "fieldName": ["dt"]
    }
  ],
  "distribution": {
    "strategy": "hash",
    "number": 32,
    "expressions": [
      {
        "expressionType": "field",
        "fieldName": ["id"]
      }
    ]
  },
  "sortOrders": [
    {
      "expression": {
        "expressionType": "field",
        "fieldName": ["age"]
      },
      "direction": "asc",
      "nullOrdering": "first"
    }
  ],
  "properties": {
    "format": "ORC"
  }
}
```

* `name`: The name of the Hive table to be created.
* `comment`: Optional, a user-defined comment for the Hive table.
* `columns`: The columns of the Hive table.
* `partitions`: Optional, the partitions of the Hive table; the example above partitions the table by the `dt` column.
* `distribution`: Optional, equivalent to the `CLUSTERED BY` clause in Hive DDL; the example above buckets the table by the `id` column.
* `sortOrders`: Optional, equivalent to the `SORTED BY` clause in Hive DDL; the example above sorts the data within each bucket in ascending order of `age`.
* `properties`: The properties of the Hive table. See the table properties table below for details; other properties are passed down to the underlying Hive table parameters.
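
Because the table payload nests several structures (columns, partitioning, distribution, sort orders), it can help to assemble it programmatically. The sketch below rebuilds a trimmed version of the example payload; the `column` helper is hypothetical, while the field names and structure follow the JSON example above.

```python
def column(name, type_, comment):
    # Hypothetical helper for one entry of the "columns" array.
    return {"name": name, "type": type_, "comment": comment}

payload = {
    "name": "test_table",
    "comment": "my test table",
    "columns": [
        column("id", "int", "id column comment"),
        column("dt", "date", "dt column comment"),
    ],
    # Identity partitioning on `dt`, as in the example above.
    "partitions": [{"strategy": "identity", "fieldName": ["dt"]}],
    # Hash distribution into 32 buckets by `id` (Hive CLUSTERED BY).
    "distribution": {
        "strategy": "hash",
        "number": 32,
        "expressions": [{"expressionType": "field", "fieldName": ["id"]}],
    },
    "properties": {"format": "ORC"},
}
```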

### Table properties

| Property name      | Description                                                                                                                                                 | Example value                                                                                 | Since version |
|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|---------------|
| `location`         | The storage location of the table. Optional; if unspecified, HMS uses the database location as the parent directory.                                        | `/user/hive/warehouse/test_table`                                                             | 0.2.0         |
| `table-type`       | The type of the table. Valid values are `MANAGED_TABLE` and `EXTERNAL_TABLE`. Defaults to `MANAGED_TABLE`.                                                  | `MANAGED_TABLE`                                                                               | 0.2.0         |
| `format`           | The table file format. Valid values are `TEXTFILE`, `SEQUENCEFILE`, `RCFILE`, `ORC`, `PARQUET`, `AVRO`, `JSON`, `CSV`, and `REGEX`. Defaults to `TEXTFILE`. | `ORC`                                                                                         | 0.2.0         |
| `input-format`     | The input format class of the table. Defaults to `org.apache.hadoop.mapred.TextInputFormat`; setting `format` changes the default accordingly.              | `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`                                             | 0.2.0         |
| `output-format`    | The output format class of the table. Defaults to `org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat`; setting `format` changes the default accordingly. | `org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat`                                            | 0.2.0         |
| `serde-lib`        | The serde library class of the table. Defaults to `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`; setting `format` changes the default accordingly.   | `org.apache.hadoop.hive.ql.io.orc.OrcSerde`                                                   | 0.2.0         |
| `serde.parameter.` | Properties with this prefix are passed down as serde parameters with the prefix stripped. Empty by default.                                                 | `"serde.parameter.orc.create.index" = "true"` instructs the ORC serde to create row indexes   | 0.2.0         |

Now you can use Hive as a catalog for managing your metadata in Gravitino. If you encounter any issues or need further assistance, refer to the Gravitino documentation or seek help from the support team.