[#318] refactor(core, catalog-*): Refactor the catalog operations to guarantee SSOT #403

jerryshao · 2023-09-15T09:48:54Z

What changes were proposed in this pull request?

This is the final piece of work for #250. This PR contains several major refactorings:

Removes all the entity store operations from HiveCatalogOperation, so that each CatalogOperation focuses only on its own logic.
Processes all the additional metadata information in CatalogOperationDispatcher, which also guarantees the SSOT.
Refactors the BaseXXX classes (BaseTable, BaseSchema and BaseColumn) to separate the metadata logic from the entity information.
Updates the UTs to match all the changes above.

Why are the changes needed?

With this PR, we have several advantages:

No need to handle entity store operations in each catalog; all of them are unified in the core module.
Removes the complex transaction semantics in favor of a best-effort SSOT mechanism.
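
To illustrate the best-effort flow described above, here is a minimal sketch. It is not the code in this PR: the class, the EntityStore interface and the dispatch method are hypothetical names chosen for the example. The assumption is that the catalog operation (for example, Hive) must succeed, while a failure to write the additional metadata is reported and does not fail the call.

```java
import java.util.function.Supplier;

/** Sketch only: the catalog is the source of truth, the entity store is best-effort. */
public class BestEffortDispatchSketch {

  /** Hypothetical stand-in for the entity store; put() may fail. */
  public interface EntityStore {
    void put(String id, String extraMetadata) throws Exception;
  }

  /**
   * Runs the catalog operation first. If it throws, nothing else has been changed,
   * so the exception simply propagates. If it succeeds, the extra metadata is
   * written best-effort: a store failure is logged and the result is still returned.
   */
  public static <R> R dispatch(
      Supplier<R> catalogOp, EntityStore store, String id, String extraMetadata) {
    R result = catalogOp.get();
    try {
      store.put(id, extraMetadata);
    } catch (Exception e) {
      // The catalog already changed, so failing here would mislead the caller.
      System.err.println("Failed to store metadata for " + id + ": " + e.getMessage());
    }
    return result;
  }

  public static void main(String[] args) {
    // A store that always fails, to show that the catalog result is still returned.
    EntityStore broken = (id, extra) -> {
      throw new IllegalStateException("store unavailable");
    };
    String created = dispatch(() -> "tab1", broken, "tab1", "creator=alice");
    System.out.println("created: " + created);
  }
}
```

The trade-off in this shape is that the entity store can lag behind the catalog, which is the consistency concern raised later in this conversation.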

Fix: #318

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new UTs to cover the code.

github-actions · 2023-09-15T09:59:34Z

Code Coverage Report

Overall Project	63.99% `-0.67%`	🟢
Files changed	87.1%	🟢

Module	Coverage
core	75.98% `-2.07%`	🟢
catalog-hive	60.7% `-0.32%`	🟢

Files

Module	File	Coverage
core	SchemaEntitySerDe.java	100%	🟢
	TableEntitySerde.java	100%	🟢
	EntityCombinedSchema.java	100%	🟢
	EntityCombinedTable.java	90.7% `-9.3%`	🟢
	CatalogOperationDispatcher.java	90.13% `-4.71%`	🟢
	BaseColumn.java	75.68% `-12.43%`	🟢
	SchemaEntity.java	73.08% `-26.92%`	🟢
	TableEntity.java	73.08% `-26.92%`	🟢
	BaseSchema.java	64.47% `-35.53%`	🟢
	ProtoEntitySerDe.java	64.34%	🟢
	BaseTable.java	57.89% `-2.26%`	🔴
	GravitonEnv.java	0% `-7.24%`	🔴
catalog-hive	HiveTable.java	95.97%	🟢
	HiveSchema.java	86.3%	🟢
	HiveColumn.java	78.33% `-21.67%`	🟢
	HiveCatalogOperations.java	67.76%	🟢

core/src/main/java/com/datastrato/graviton/catalog/CatalogOperationDispatcher.java

yuqi1129 · 2023-09-15T14:02:35Z

I suggest we come up with a systematic way to log failed operations in Graviton and then use it to make some improvements. This PR allows actions in Graviton to fail and only logs them at warn level. Although we do not need to keep our metadata strictly consistent with the external systems, we should do our best to keep them consistent, and a warn message is too simple to capture the context.

jerryshao · 2023-09-20T02:16:52Z

@mchades @yuqi1129 can you please help review this when you have time? Thanks.

core/src/main/java/com/datastrato/graviton/meta/TableEntity.java

yuqi1129 · 2023-09-20T08:59:54Z

core/src/main/java/com/datastrato/graviton/catalog/CatalogOperationDispatcher.java

+            identifier -> store.get(identifier, TABLE, TableEntity.class),
+            "GET",
+            stringId.id(),
+            true /* throwIfNotFound */);


I am not clear on why this is true. Since you ignore the failure when storing the TableEntity to the Graviton store in the CreateTable method, chances are that we won't find the TableEntity in the Graviton store, so shouldn't this be false?

Currently, for load operation, the behavior is that if we cannot get an entity from graviton store, we fail the operation rather than giving the user a half-complete metadata object.

My reasoning for failing the operation rather than giving the user a half-complete metadata object is this: a load operation can fail without side effects, so we give the user consistent behavior. For other operations that do have side effects, if the Hive operation (for example) succeeds, we should not fail the operation, in order to keep SSOT; otherwise it would be misleading and inconsistent (the Hive operation succeeds but the entity store operation fails).

The comment /* throwIfNotFound */ here is inconsistent with the actual behavior, since NoSuchEntityException is caught in the operateOnEntity method.

Currently, for load operation, the behavior is that if we cannot get an entity from graviton store, we fail the operation rather than giving the user a half-complete metadata object.

@jerryshao I see what you mean.
But I have a concern. For example:
1. In the CreateTable(tab1) API, the Hive operation succeeds and the operation on Graviton's backend storage fails. Graviton returns success.
2. The user calls the loadTable(tab1) API and it fails.
In this case the user saved the data successfully but loading it fails, and we cannot help the user fix the problem, because we also use the loadTable() API and it fails for us too.

I think maybe we can return success and tell the user, in the REST response, which information is missing from Graviton's backend storage.
That way, the user can call the alterTable() API to refill the lost information and fix the problem.

I see, let me change the code.
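
The behavior discussed in this thread (load succeeds even when the backend storage has no record, and the caller is told what is missing) could look roughly like the sketch below. This is an illustration under assumptions, not the change that was made in the PR: LoadedTable, its fields and the two maps standing in for Hive and the entity store are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch only: loadTable succeeds when the catalog has the table, even if the store does not. */
public class LoadTableSketch {

  /** Hypothetical result type combining the catalog table with optional stored metadata. */
  public static final class LoadedTable {
    public final String name;
    /** Empty when the entity store has no record for this table. */
    public final Optional<String> storedMetadata;

    LoadedTable(String name, Optional<String> storedMetadata) {
      this.name = name;
      this.storedMetadata = storedMetadata;
    }

    /** A response layer could use this flag to warn the user about missing metadata. */
    public boolean isComplete() {
      return storedMetadata.isPresent();
    }
  }

  /** The maps are stand-ins for the catalog (Hive) and the entity store. */
  public static LoadedTable loadTable(
      Map<String, String> catalog, Map<String, String> entityStore, String name) {
    if (!catalog.containsKey(name)) {
      // A missing table in the source of truth is still a hard failure.
      throw new IllegalArgumentException("No such table: " + name);
    }
    return new LoadedTable(name, Optional.ofNullable(entityStore.get(name)));
  }

  public static void main(String[] args) {
    Map<String, String> catalog = new HashMap<>();
    catalog.put("tab1", "columns...");
    Map<String, String> entityStore = new HashMap<>(); // the create-time write was lost

    LoadedTable table = loadTable(catalog, entityStore, "tab1");
    System.out.println(table.name + " complete=" + table.isComplete());
  }
}
```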

core/src/main/java/com/datastrato/graviton/catalog/CatalogOperationDispatcher.java

yuqi1129 · 2023-09-21T08:35:24Z

@xunliu @mchades Please also take some time to review this PR, as it involves significant changes to our Graviton storage system.

core/src/main/java/com/datastrato/graviton/catalog/CatalogOperationDispatcher.java

core/src/main/java/com/datastrato/graviton/catalog/rel/BaseSchema.java

xunliu · 2023-09-21T14:04:54Z

I think we may need to add some test cases that cover failures of Graviton's backend storage.
We may need to add a BackendStorage::stop() function that is provided only for tests.
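
One way to exercise storage failures in a unit test is sketched below. Note that this uses a different technique from the stop() hook suggested above: the store is wrapped in a decorator whose failure mode a test can toggle. KvStore, InMemoryStore and FaultInjectingStore are hypothetical names for illustration and are not Graviton classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a test-side wrapper that makes a store fail on demand. */
public class FailingStoreSketch {

  /** Hypothetical minimal store interface. */
  public interface KvStore {
    void put(String key, String value);
    String get(String key);
  }

  public static final class InMemoryStore implements KvStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) {
      data.put(key, value);
    }

    @Override
    public String get(String key) {
      return data.get(key);
    }
  }

  /** Delegates to a real store until a test switches it into failing mode. */
  public static final class FaultInjectingStore implements KvStore {
    private final KvStore delegate;
    private volatile boolean failing;

    public FaultInjectingStore(KvStore delegate) {
      this.delegate = delegate;
    }

    public void setFailing(boolean failing) {
      this.failing = failing;
    }

    @Override
    public void put(String key, String value) {
      checkAvailable();
      delegate.put(key, value);
    }

    @Override
    public String get(String key) {
      checkAvailable();
      return delegate.get(key);
    }

    private void checkAvailable() {
      if (failing) {
        throw new IllegalStateException("injected store failure");
      }
    }
  }
}
```

A test would write through the wrapper, call setFailing(true), and then assert that the operation under test still behaves as intended when the store throws.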

xunliu · 2023-09-21T14:34:59Z

For this kind of multi-step operation, I think we need to provide a queue computing function.
It would make the relationship between the functions clearer and simpler, and it would also adapt better to more complex data sources in the future.
The function might look like this:

// Requires: import java.util.function.Consumer;
// Returns void because this demo only exercises funQueueComputing().
public void testFunQueueComputing(NameIdentifier ident) {
    // Steps that must all succeed; the third one fails on purpose.
    Consumer<String>[] mustFuns = new Consumer[]{
            (s) -> System.out.println("mustSuccessFun1: " + s),
            (s) -> System.out.println("mustSuccessFun2: " + s),
            (s) -> {
                throw new RuntimeException("mustSuccessFun3 failed");
            }
    };

    // Steps that are allowed to fail.
    Consumer<String>[] maybeSuccFuns = new Consumer[]{
        (s) -> System.out.println("maybeSuccFuns1: " + s),
        (s) -> System.out.println("maybeSuccFuns2: " + s)
    };

    // Handler i runs when must-succeed step i fails.
    Consumer<String>[] exceptionFuns = new Consumer[]{
            (s) -> System.out.println("If mustSuccessFun1 failed then execute me: " + s),
            (s) -> System.out.println("If mustSuccessFun2 failed then execute me: " + s),
            (s) -> System.out.println("If mustSuccessFun3 failed then execute me: " + s),
    };

    funQueueComputing(mustFuns, maybeSuccFuns, exceptionFuns);
}

public void funQueueComputing(Consumer<String>[] mustSuccFuns, Consumer<String>[] maybeSuccFuns, Consumer<String>[] exceptionFuns) {
    int mustIndex = 0;
    try {
        for (Consumer<String> function : mustSuccFuns) {
            function.accept("A");
            mustIndex++;
        }
    } catch (Exception e) {
        // Run the handler paired with the failed step, if one was supplied.
        if (mustIndex < exceptionFuns.length) {
            exceptionFuns[mustIndex].accept("B");
        }
        throw new RuntimeException("quit funQueueComputing()", e);
    }
    try {
        for (Consumer<String> function : maybeSuccFuns) {
            function.accept("C");
        }
    } catch (Exception e) {
        // Ignored: a failure stops the remaining optional steps but is not propagated.
    }
}

Use funQueueComputing() in loadTable():

public Table loadTable(NameIdentifier ident) throws NoSuchTableException {
    Consumer<String>[] mustFuns = new Consumer[]{
        (s) -> loadDataFromHive(s)
    };

    Consumer<String>[] maybeSuccFuns = new Consumer[]{
        (s) -> loadDataFromBackendStorage(s)
    };

    Consumer<String>[] exceptionFuns = new Consumer[]{
    };

    funQueueComputing(mustFuns, maybeSuccFuns, exceptionFuns);
}
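
Because Consumer cannot return a value, the loadTable() outline above has no way to hand back the loaded Table. A typed variant of the same idea is sketched below as one possible shape, not as part of the proposal: a required step produces the result, optional steps enrich it, and optional failures are collected as warnings instead of being thrown. StepRunnerSketch and run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch only: one required step that yields a value, followed by optional steps. */
public class StepRunnerSketch {

  /**
   * Runs the required step; any exception from it propagates to the caller.
   * Each optional step is then attempted independently. A failing optional step
   * adds its message to warnings and does not stop the remaining steps.
   */
  public static <T> T run(
      Supplier<T> required, List<Consumer<T>> optional, List<String> warnings) {
    T result = required.get();
    for (Consumer<T> step : optional) {
      try {
        step.accept(result);
      } catch (RuntimeException e) {
        warnings.add(e.getMessage());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> warnings = new ArrayList<>();
    List<Consumer<StringBuilder>> optional = new ArrayList<>();
    // Stand-in for loading extra metadata from the backend storage; fails on purpose.
    optional.add(table -> {
      throw new IllegalStateException("backend storage unavailable");
    });
    // Stand-in for a second, unrelated enrichment step that succeeds.
    optional.add(table -> table.append(" +comment"));

    // Stand-in for loading the table from Hive.
    StringBuilder table = run(() -> new StringBuilder("tab1"), optional, warnings);
    System.out.println(table + " warnings=" + warnings);
  }
}
```

Unlike the array-based outline, this variant keeps running the remaining optional steps after one of them fails, which is a deliberate difference.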

jerryshao · 2023-09-22T07:48:56Z

For this multi-step operation, I think we needs to provide a queue computing function.
The relationship between clear functions can be simplified and also better adapted to more complex data sources in the future.
This function maybe like this,

Let me think a bit about how to address this, but maybe we should not block on it.

jerryshao · 2023-09-23T01:09:47Z

@xunliu can you please review it again? I would suggest not blocking on the improvements you mentioned above, since several other PRs depend on this one.

xunliu

LGTM

jerryshao requested review from xunliu, yuqi1129 and mchades September 15, 2023 09:48

jerryshao self-assigned this Sep 15, 2023

jerryshao changed the title ~~[#318] refactor(core, catalog*): Refactor the catalog operations to guarantee SSOT~~ [#318] refactor(core, catalog-*): Refactor the catalog operations to guarantee SSOT Sep 15, 2023

yuqi1129 reviewed Sep 15, 2023

jerryshao force-pushed the issue-318 branch from 23f6f60 to 301b552 Compare September 19, 2023 01:49

yuqi1129 reviewed Sep 20, 2023

jerryshao force-pushed the issue-318 branch from ad01aff to f076e33 Compare September 21, 2023 03:33

jerryshao added 8 commits September 21, 2023 17:10

Refactor catalog operations to guarantee the SSOT schemantics

0307e9b

Add UTs for CatalogOperationDispatcher

6ee175c

Remove the flaky test

870f1da

Address the comments

216971c

Change from warning to error log

c64d95e

fix rebase issue

d52b93e

Fix merge issue

f1a230f

Address comments

4228339

jerryshao force-pushed the issue-318 branch from c050c00 to 4228339 Compare September 21, 2023 09:16

mchades reviewed Sep 21, 2023

core/src/main/java/com/datastrato/graviton/catalog/CatalogOperationDispatcher.java

mchades reviewed Sep 21, 2023

core/src/main/java/com/datastrato/graviton/catalog/rel/BaseSchema.java (outdated)

Address comments

9658ee1

mchades mentioned this pull request Sep 22, 2023

[#412] feat(core): Generic property system support #441

Merged

xunliu approved these changes Sep 24, 2023

xunliu merged commit ccb2d9a into apache:main Sep 24, 2023
2 checks passed

yuqi1129 mentioned this pull request Oct 23, 2023

[Bug report] alterSchema throw NoSuchSchemaException #316

Closed
