[#562] docs(hive): add user doc of Hive catalog #569

Merged 4 commits on Oct 24, 2023

mchades
Contributor

@mchades mchades commented Oct 20, 2023

What changes were proposed in this pull request?

add user doc of Hive catalog

Why are the changes needed?

Fix: #562

Does this PR introduce any user-facing change?

no

How was this patch tested?

Not needed.

@mchades mchades changed the title [#562] docs: add user doc of Hive catalog [#562] docs(hive): add user doc of Hive catalog Oct 20, 2023
@github-actions

github-actions bot commented Oct 20, 2023

Code Coverage Report

Overall Project: 66.9% 🟢

| Module | Coverage |
| --- | --- |
| catalog-hive | 67.22% 🟢 |

Files

| Module | File | Coverage |
| --- | --- | --- |
| catalog-hive | HiveCatalogPropertiesMeta.java | 100% 🟢 |

docs/Gravitino-manage-Hive.md Outdated Show resolved Hide resolved
Member

@justinmclean justinmclean left a comment

A few minor formatting and English issues need fixing.

license: "Copyright 2023 Datastrato.
This software is licensed under the Apache License version 2."
---
## Using Hive as a Catalog in Gravitino
Contributor

Maybe we can describe more of the Hive catalog's capabilities? For example, it currently works in proxy mode, supports basic namespace and table DDL operations, and does not yet support partition operations. Can it manage tables that were not created by Gravitino? How do those differ from tables created by Gravitino?

Contributor Author

I am just keeping the document format consistent with the Iceberg catalog doc (#537). Additionally, the table-related operations you mentioned should go in a subdirectory of the Hive catalog documentation, but the current documentation framework does not support a hierarchical directory structure.

}
```

* `provider`: Set this to "hive" to use Hive as the catalog provider.
Contributor

provider is immutable

Contributor Author

This sentence does not say that the provider is mutable. How would you recommend revising it?

Contributor

We should tell users that they can't change the provider when they alter catalog info. Otherwise, besides create, should we also add docs for drop, alter, load, and list?

Contributor Author

How about changing it to:

* `provider`: Must set this to "hive" in order to use Hive as the catalog provider.


## After the catalog is initialized

You can manage and operate on tables using the following URL format:
Contributor

The Hive catalog provides some custom table properties, such as format; we should tell users about them.

Contributor Author

Yeah, but it's a table property, not a catalog property.

Contributor

I think you need to list all the table properties here to tell users how to configure them.

You can manage and operate on tables using the following URL format:

```shell
http://{GravitinoServerHost}:{GravitinoServerPort}/api/metalakes/{metalake}/catalogs/{catalog}/schemas/{schema}/tables
Contributor

I suggest providing a simple example of creating a Hive table.
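
A minimal sketch of what such an example might look like, assuming the tables endpoint above accepts a POST with a JSON body. The host, port, metalake, catalog, schema, and table names are placeholders, and the payload fields (`name`, `comment`, `columns`, `properties`) and the `format` value are assumptions for illustration, not something this thread confirms. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_create_table_request(host, port, metalake, catalog, schema, table):
    """Build (but do not send) a POST request against the tables URL format."""
    url = (
        f"http://{host}:{port}/api/metalakes/{metalake}"
        f"/catalogs/{catalog}/schemas/{schema}/tables"
    )
    # Assumed payload shape; check the Gravitino API docs for the real schema.
    payload = {
        "name": table,
        "comment": "example Hive table",
        "columns": [
            {"name": "id", "type": "integer", "comment": "primary id"},
            {"name": "name", "type": "string", "comment": "display name"},
        ],
        # "format" is the custom table property mentioned in the review;
        # the value "ORC" is an assumed example.
        "properties": {"format": "ORC"},
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All argument values below are placeholders.
request = build_create_table_request(
    "localhost", 8090, "my_metalake", "hive_catalog", "my_schema", "my_table"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit it to a running server.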

@mchades
Contributor Author

mchades commented Oct 23, 2023

@yuqi1129 @justinmclean I have made corresponding modifications based on the comments. Can you help me review it again?


### configuration

| Configuration item | Description | value |
Contributor

Is the value a default value or just an example value?

Contributor

Also, can you please add a column named `Since version`, as @jerryshao suggested, to mark the version in which each configuration was introduced?

Contributor Author

added


* `provider`: Set this to "hive" to use Hive as the catalog provider.
* `metastore.uris`: This is a required configuration, and it should be the Hive metastore service URIs.
* Other configuration parameters with the `gravitino.bypass.` prefix can be added to the "properties" section and passed down to the underlying Hive metastore.
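
As a rough sketch of how the three bullets above fit together, the snippet below assembles a catalog definition with `provider` fixed to "hive" and `metastore.uris` enforced as required. The top-level field names (`name`, `type`, `comment`, `provider`, `properties`), the `RELATIONAL` type value, the helper name, and the Thrift URI are assumptions for illustration only.

```python
def build_hive_catalog_payload(name, metastore_uris, extra_properties=None):
    """Assemble a catalog definition for a Hive-backed catalog."""
    if not metastore_uris:
        # metastore.uris is described as a required configuration.
        raise ValueError("metastore.uris is required for a Hive catalog")
    properties = {"metastore.uris": metastore_uris}
    # Optional extras, e.g. keys carrying the gravitino.bypass. prefix.
    properties.update(extra_properties or {})
    return {
        "name": name,
        "type": "RELATIONAL",  # assumed value
        "comment": "",
        "provider": "hive",  # must be "hive" to use Hive as the provider
        "properties": properties,
    }


# Placeholder catalog name and metastore address.
catalog = build_hive_catalog_payload("hive_catalog", "thrift://localhost:9083")
print(catalog["provider"], catalog["properties"])
```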
Contributor

I think we'd better add an example such as `gravitino.bypass.hive.metastore.client.capability.check`, in which case `hive.metastore.client.capability.check` would be passed to the Hive metastore.

Contributor Author

What does `gravitino.bypass.hive.metastore.client.capability.check` mean? I think we could offer meaningful examples that are useful for the Hive catalog, but I don't know which ones exist.

Contributor

`hive.metastore.client.capability.check` is the key of an actual configuration in HiveConf; it's just an example of how to use the `gravitino.bypass.` prefix. If users want to override a default Hive value, they can set `gravitino.bypass.xxx` to overwrite `xxx` in Hive.
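
The prefix handling described above can be sketched as follows. This only illustrates the idea and is not Gravitino's actual implementation; the function name `extract_bypass_properties` and the sample property values are hypothetical.

```python
BYPASS_PREFIX = "gravitino.bypass."


def extract_bypass_properties(properties):
    """Return the entries to hand to Hive, with the bypass prefix stripped."""
    return {
        key[len(BYPASS_PREFIX):]: value
        for key, value in properties.items()
        if key.startswith(BYPASS_PREFIX)
    }


# Sample catalog properties; the values are placeholders.
catalog_properties = {
    "metastore.uris": "thrift://localhost:9083",
    "gravitino.bypass.hive.metastore.client.capability.check": "false",
}
hive_overrides = extract_bypass_properties(catalog_properties)
print(hive_overrides)
```

Keys without the prefix, such as `metastore.uris`, are left out of the result, so only explicitly prefixed settings override Hive defaults in this sketch.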

Contributor Author

Added to the catalog properties section.

@jerryshao jerryshao merged commit 38731aa into apache:main Oct 24, 2023
2 checks passed
jerryshao pushed a commit that referenced this pull request Oct 26, 2023
Development

Successfully merging this pull request may close these issues.

[Subtask] Add user doc for Hive-Catalog
5 participants