[improvement](statistics)Sync stats cache while task finished, doesn't need to query column_statistics table. #30609
clang-tidy review says "All clean, LGTM! 👍"
run buildall
TPC-H: Total hot run time: 37429 ms
TPC-DS: Total hot run time: 175923 ms
ClickBench: Total hot run time: 30.47 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
The code does what it set out to do, so LGTM.
PR approved by anyone and no changes requested.
gensrc/thrift/FrontendService.thrift
@@ -1179,7 +1179,7 @@ struct TGetBinlogLagResult {

 struct TUpdateFollowerStatsCacheRequest {
     1: optional string key;
-    2: list<string> statsRows;
+    2: optional string colStatsData;
Should not reuse the same field number.
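To illustrate the concern about field numbers, here is a minimal Java sketch. It is a toy model, not Thrift's generated code: `FieldIdSketch`, `WireField`, and both reader methods are hypothetical names, and the model assumes that a reader silently skips any field whose id or type it does not expect. Under that assumption, reusing id 2 with a new type means a frontend on the old schema and one on the new schema quietly drop each other's data during a rolling upgrade, while a fresh id keeps the old field readable.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of tagged-field decoding; hypothetical names, not Thrift's generated code.
final class FieldIdSketch {

    // One field on the wire: a numeric id plus a value whose Java type stands in for the wire type.
    static final class WireField {
        final int id;
        final Object value;

        WireField(int id, Object value) {
            this.id = id;
            this.value = value;
        }
    }

    // Reader built against the old schema, where field 2 was list<string> statsRows.
    static List<String> oldReaderStatsRows(List<WireField> message) {
        List<String> rows = new ArrayList<>();
        for (WireField f : message) {
            if (f.id == 2 && f.value instanceof List) {
                for (Object o : (List<?>) f.value) {
                    rows.add(String.valueOf(o));
                }
            }
            // Assumption: any other id/type combination is skipped without an error.
        }
        return rows;
    }

    // Reader built against the new schema: colStatsData is a string at the given field id.
    static String newReaderColStatsData(List<WireField> message, int fieldId) {
        for (WireField f : message) {
            if (f.id == fieldId && f.value instanceof String) {
                return (String) f.value;
            }
        }
        return null; // field absent, or present with an unexpected type
    }
}
```

In this model, a message carrying a string at id 2 yields an empty list from the old reader, and a message carrying a list at id 2 yields null from the new reader at id 2. Moving the new field to an unused id (3 in the sketch's terms) avoids both collisions.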
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 36804 ms
TPC-DS: Total hot run time: 175272 ms
ClickBench: Total hot run time: 30.84 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
PR approved by at least one committer and no changes requested.
Previously, when an analyze job finished, the last task to finish queried the column_statistics table to fetch the latest stats for each column and then updated the stats cache in all FEs. That query can be slow, and it is unnecessary.
This PR removes the query logic and moves the cache update into each task. When a task finishes, it already holds the latest stats for its column in memory, so it simply updates the cache from that in-memory data.
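The per-task update described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `StatsCacheSketch`, `ColStats`, `onTaskFinished`, and the string key format are all hypothetical, and the step that forwards the same data to the other FEs (presumably via TUpdateFollowerStatsCacheRequest) is left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins for illustration; not the actual Doris classes.
final class StatsCacheSketch {

    // Statistics one analyze task has just computed for a single column (assumed fields).
    static final class ColStats {
        final long rowCount;
        final long ndv;
        final long numNulls;

        ColStats(long rowCount, long ndv, long numNulls) {
            this.rowCount = rowCount;
            this.ndv = ndv;
            this.numNulls = numNulls;
        }
    }

    private final Map<String, ColStats> cache = new ConcurrentHashMap<>();

    // Assumed key shape: table id and column name joined into one string.
    private static String key(long tableId, String column) {
        return tableId + "-" + column;
    }

    // Called by each task as it finishes: publish the stats it already holds in memory,
    // with no read back from a statistics table.
    void onTaskFinished(long tableId, String column, ColStats stats) {
        cache.put(key(tableId, column), stats);
    }

    // Cache lookup, as a planner-side consumer might do.
    Optional<ColStats> lookup(long tableId, String column) {
        return Optional.ofNullable(cache.get(key(tableId, column)));
    }
}
```

Because each task writes only the entry for its own column, no task needs to wait for the whole job to finish or to re-read persisted statistics before the cache is current.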