Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#318] refactor(core, catalog-*): Refactor the catalog operations to guarantee SSOT #403

Merged
merged 9 commits into from
Sep 24, 2023

Conversation

jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

This is the final work of #250 , with this PR there're several major refactorings:

  1. Removing all the entity store operations in HiveCatalogOperation, which makes each CatalogOperation only focus on its own logic.
  2. Processing all the additional metadata information in CatalogOperationDispatcher, also guarantees the SSOT.
  3. Refactor the BaseXXX (BaseTable, BaseSchema and BaseColumn), to separate the metadata logics from entity information.
  4. With all the above changes, changing the UTs accordingly.

Why are the changes needed?

With this PR, we have several advantages:

  1. No need to handle entity store operations in each catalog, unify all of them in core module.
  2. Remove the complex transaction semantics, using SSOT best effort mechanism.

Fix: #318

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Adding new UTs to cover the code

@jerryshao jerryshao self-assigned this Sep 15, 2023
@jerryshao jerryshao changed the title [#318] refactor(core, catalog*): Refactor the catalog operations to guarantee SSOT [#318] refactor(core, catalog-*): Refactor the catalog operations to guarantee SSOT Sep 15, 2023
@github-actions
Copy link

github-actions bot commented Sep 15, 2023

Code Coverage Report

Overall Project 63.99% -0.67% 🟢
Files changed 87.1% 🟢

Module Coverage
core 75.98% -2.07% 🟢
catalog-hive 60.7% -0.32% 🟢
Files
Module File Coverage
core SchemaEntitySerDe.java 100% 🟢
TableEntitySerde.java 100% 🟢
EntityCombinedSchema.java 100% 🟢
EntityCombinedTable.java 90.7% -9.3% 🟢
CatalogOperationDispatcher.java 90.13% -4.71% 🟢
BaseColumn.java 75.68% -12.43% 🟢
SchemaEntity.java 73.08% -26.92% 🟢
TableEntity.java 73.08% -26.92% 🟢
BaseSchema.java 64.47% -35.53% 🟢
ProtoEntitySerDe.java 64.34% 🟢
BaseTable.java 57.89% -2.26% 🔴
GravitonEnv.java 0% -7.24% 🔴
catalog-hive HiveTable.java 95.97% 🟢
HiveSchema.java 86.3% 🟢
HiveColumn.java 78.33% -21.67% 🟢
HiveCatalogOperations.java 67.76% 🟢

@yuqi1129
Copy link
Contributor

yuqi1129 commented Sep 15, 2023

I suggest we come up with a systematic solution to log error operations in graviton and use it to make some modifications in graviton. As this PR allows failed actions in graviton and only log in warn level, though we do not need to keep metadata consistent with that in external systems, but we should do our best to make them consistent and warn message is too simple to get context.

@jerryshao
Copy link
Contributor Author

@mchades @yuqi1129 can you please help to review when you have time, thanks.

identifier -> store.get(identifier, TABLE, TableEntity.class),
"GET",
stringId.id(),
true /* throwIfNotFound */);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very clear: Why here is true? As you ignore the failure when store TableEntity to Graviton store in method CreateTable, Chances are that we won't get TableEntity in Graviton store, so here should be false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, for load operation, the behavior is that if we cannot get an entity from graviton store, we fail the operation rather than giving the user a half-complete metadata object.

My thinking of why we fail the operation rather than giving the user a half-complete metadata object is that: for load operation, we could fail without side-effect, so we deliver the user a consistent behavior; but for other operations that have side effects, if hive operation (for example) is succeed, we should not fail the operation to keep SSOT, otherwise it will be misleading and inconsistent (where hive operation succeeds but entity store operation fails).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment /* throwIfNotFound */ here is inconsistent with the actual behavior since NoSuchEntityException was caught in operateOnEntity method

Copy link
Member

@xunliu xunliu Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, for load operation, the behavior is that if we cannot get an entity from graviton store, we fail the operation rather than giving the user a half-complete metadata object.

@jerryshao I know your mean.
But I have a concern, for example,
1、In the CreateTable(tab1) API, operation hive success, operation graviton's backend storage failed, The Graviton returns success.
2、 The user call loadTable(tab) API return fails.
In this case, the User saves data successfully, and load data fails. and We can't help users to fix this problem, because we also use this loadTable() API, and we also fail.

I think maybe we can return success and tell the user loss information of the graviton's backend storage in the REST Response,
In this way, the user can call alertTable() API to refill lost information to fix this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, let me change the code.

@yuqi1129
Copy link
Contributor

@xunliu @mchades Please also take some time to review this PR, as it involves significant changes to our Graviton storage system.

@xunliu
Copy link
Member

xunliu commented Sep 21, 2023

I think maybe we need to add some test cases to cover Graviton's backend storage failures.
Maybe need to add BackendStorage::stop() function only provided to test.

@xunliu
Copy link
Member

xunliu commented Sep 21, 2023

For this multi-step operation, I think we needs to provide a queue computing function.
The relationship between clear functions can be simplified and also better adapted to more complex data sources in the future.
This function maybe like this,

public Table testFunQueueComputing(NameIdentifier ident) {
    Consumer<String>[] mustFuns = new Consumer[]{
            (s) -> System.out.println("mustSuccessFun1: " + s),
            (s) -> System.out.println("mustSuccessFun2: " + s),
            (s) -> {
                throw new RuntimeException("mustSuccessFun3 failed");
            }
    };

    Consumer<String>[] maybeSuccFuns = new Consumer[]{
        (s) -> System.out.println("maybeSuccFuns1: " + s),
        (s) -> System.out.println("maybeSuccFuns2: " + s)
    };

    Consumer<String>[] exceptionFuns = new Consumer[]{
            (s) -> System.out.println("If mustSuccessFun1 failed the executing me: " + s),
            (s) -> System.out.println("If mustSuccessFun2 failed the executing me" + s),
            (s) -> System.out.println("If mustSuccessFun3 failed the executing me" + s),
    };

    funQueueComputing(mustFuns, maybeSuccFuns, exceptionFuns);
}

public void funQueueComputing(Consumer<String>[] mustSuccFuns, Consumer<String>[] maybeSuccFuns, Consumer<String>[] exceptionFuns) {
    int mustIndex = 0;
    try {
        for (Consumer<String> function : mustSuccFuns) {
            function.accept("A");
            mustIndex ++
        }
    } catch (Exception e) {
        exceptionFuns[mustIndex].accept("B");
        throw Exception("quit funQueueComputing()")
    }
    try {
        for (Consumer<String> function : maybeSuccFuns) {
            function.accept("C");
        }
    } catch (Exception e) {
        // ignore
    }
}

Use funQueueComputing() in the loadTable().

public Table loadTable(NameIdentifier ident) throws NoSuchTableException {
    Consumer<String>[] mustFuns = new Consumer[]{
        (s) -> loadDataFromHive(s)
    };

    Consumer<String>[] maybeSuccFuns = new Consumer[]{
        (s) -> loadDataFromBackendStorage(s)
    };

    Consumer<String>[] exceptionFuns = new Consumer[]{
    };

    funQueueComputing(mustFuns, maybeSuccFuns, exceptionFuns);
}

@jerryshao
Copy link
Contributor Author

For this multi-step operation, I think we needs to provide a queue computing function.
The relationship between clear functions can be simplified and also better adapted to more complex data sources in the future.
This function maybe like this,

Let me think a bit on how to address the thing, but maybe we should not block on this.

@jerryshao
Copy link
Contributor Author

@xunliu can you please review it again, I would suggest not blocking the improvements you mentioned above. Several other PRs depend on this.

Copy link
Member

@xunliu xunliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xunliu xunliu merged commit ccb2d9a into apache:main Sep 24, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Refactoring the secondary storage implementation in HiveCatalog
4 participants