[#4952] feat(hudi-catalog): add implementation of HMSBackend for Hudi catalog #4942

Open. Wants to merge 2 commits into base: main

Conversation

mchades (Contributor) commented Sep 13, 2024

What changes were proposed in this pull request?

Support read operations for the Hudi catalog HMS backend.

Why are the changes needed?

Fix: #4952

Does this PR introduce any user-facing change?

no

How was this patch tested?

Unit tests added.

mchades force-pushed the hudi-hms branch 2 times, most recently from b235c0c to 17f93c8, on September 18, 2024 06:48
mchades changed the title from "implementation of HMSBackend for Hudi catalog" to "[#4952] feat(hudi-catalog): add implementation of HMSBackend for Hudi catalog" on Sep 18, 2024
mchades marked this pull request as ready for review on September 18, 2024 06:52
c ->
    c.getTables(schemaIdent.name(), "*").stream()
        .map(table -> NameIdentifier.of(namespace, table))
        .toArray(NameIdentifier[]::new));
Contributor

This will list all tables, not just Hudi tables. Should we filter out the non-Hudi tables?

Contributor Author

Fixed.

Comment on lines 64 to 74
partitioning = HiveTableConverter.getPartitioning(hmsTable);
sortOrders = HiveTableConverter.getSortOrders(hmsTable);
distribution = HiveTableConverter.getDistribution(hmsTable);
Contributor

Does Hudi store this information in HMS, and is it compatible with Hive tables? As far as I know, Iceberg requires its own APIs to get partitioning and sort orders, because Iceberg stores such information in its metadata files, not in HMS. I guess Hudi is similar. Can you please confirm this?

Contributor Author

Added comments to the code.

mchades (Contributor Author) commented Sep 27, 2024

All comments are resolved. Please review again when you have time, thanks! @jerryshao

.filter(
    t ->
        t.getSd().getInputFormat() != null
            && t.getSd().getInputFormat().startsWith("org.apache.hudi"))
Contributor

Can you please make "org.apache.hudi" a static constant to avoid hard-coding it here? Please also add a comment explaining its purpose. This is quite hacky: if the Hudi package name ever changes, the assumption here will no longer hold.
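One way to address this, sketched under assumptions: a named constant plus a small null-safe predicate. The names `HUDI_PACKAGE_PREFIX`, `isHudiInputFormat`, `keepHudiFormats`, and the class name are hypothetical, and the example works on plain input-format strings rather than the HMS `Table` type so that it runs without Hive on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class HudiTableFilterSketch {
  // Hypothetical constant name. The PR identifies Hudi tables by checking that the
  // table's input format class lives under this package. This is a heuristic: it
  // stops working if Hudi ever relocates its input format classes.
  static final String HUDI_PACKAGE_PREFIX = "org.apache.hudi";

  // Null-safe check, mirroring the filter condition shown in the diff above.
  static boolean isHudiInputFormat(String inputFormat) {
    return inputFormat != null && inputFormat.startsWith(HUDI_PACKAGE_PREFIX);
  }

  // Keeps only the input formats that look like Hudi ones.
  static List<String> keepHudiFormats(List<String> inputFormats) {
    return inputFormats.stream()
        .filter(HudiTableFilterSketch::isHudiInputFormat)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Illustrative inputs: one Hudi format, one plain Hive format, one missing value.
    List<String> formats =
        Arrays.asList(
            "org.apache.hudi.hadoop.HoodieParquetInputFormat",
            "org.apache.hadoop.mapred.TextInputFormat",
            null);
    System.out.println(keepHudiFormats(formats));
  }
}
```

In the real code the predicate would presumably take the HMS table and read `getSd().getInputFormat()`, as the diff does; keeping the check in one place also lets the load path reuse it.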

try {
  Table table =
      clientPool.run(client -> client.getTable(schemaIdent.name(), tableIdent.name()));
  return HudiHMSTable.builder().withBackendTable(table).build();
Contributor

Should we also verify that the loaded table is a Hudi table, and throw an exception otherwise?
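A minimal sketch of such a guard, assuming the same input-format prefix check that is used when listing tables. `loadHudiTable`, `NotAHudiTableException`, and the `Map`-based lookup are hypothetical stand-ins for the HMS client call and for whatever exception type the catalog would actually throw.

```java
import java.util.HashMap;
import java.util.Map;

final class HudiLoadGuardSketch {
  // Same heuristic prefix as in the listing filter (hypothetical constant name).
  static final String HUDI_PACKAGE_PREFIX = "org.apache.hudi";

  // Hypothetical exception type; the real catalog may prefer an existing one.
  static final class NotAHudiTableException extends RuntimeException {
    NotAHudiTableException(String message) {
      super(message);
    }
  }

  // The map stands in for the metastore: table name -> input format class name.
  // Returns the table name if it refers to a Hudi table, otherwise throws.
  // A missing table or a missing input format is also treated as "not Hudi".
  static String loadHudiTable(Map<String, String> inputFormatsByTable, String tableName) {
    String inputFormat = inputFormatsByTable.get(tableName);
    if (inputFormat == null || !inputFormat.startsWith(HUDI_PACKAGE_PREFIX)) {
      throw new NotAHudiTableException("Table " + tableName + " is not a Hudi table");
    }
    return tableName;
  }

  public static void main(String[] args) {
    Map<String, String> metastore = new HashMap<>();
    metastore.put("hudi_tbl", "org.apache.hudi.hadoop.HoodieParquetInputFormat");
    metastore.put("hive_tbl", "org.apache.hadoop.mapred.TextInputFormat");

    System.out.println(loadHudiTable(metastore, "hudi_tbl"));
    try {
      loadHudiTable(metastore, "hive_tbl");
    } catch (NotAHudiTableException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast at load time keeps the listing and loading paths consistent: a table that the listing filter hides cannot then be loaded by name.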

columns = HiveTableConverter.getColumns(hmsTable, HudiColumn.builder());
partitioning = HiveTableConverter.getPartitioning(hmsTable);

// should be always SortOrders.NONE since Hudi using clustering to sort data (see
Contributor

"Should always be..."

protected HudiHMSTable buildFromTable(Table hmsTable) {
  name = hmsTable.getTableName();
  comment = hmsTable.getParameters().get(COMMENT);
  columns = HiveTableConverter.getColumns(hmsTable, HudiColumn.builder());
Contributor

Are Hudi's column types exactly the same as those of a Hive table?

Contributor Author

According to the table in the Hudi documentation, Hudi supports fewer data types than Hive. I assume they are a subset of Hive's types, so I think the conversion here should not be a problem.

Development

Successfully merging this pull request may close these issues.

[Subtask] Support read operations of HMSBackend for Hudi catalog
2 participants