[improvement](statistics) Sync stats cache when a task finishes, without querying the column_statistics table. #30609

Merged: 1 commit, Jan 31, 2024

Jibing-Li
Contributor

Previously, when an analyze job finished, the last task to finish queried the column_statistics table to fetch the latest stats for each column and then updated the stats cache on all FEs. That query could be slow, and it was unnecessary.
This PR removes the query logic and moves the cache update into each task. When a task finishes, it already holds the latest stats for its column in memory, so it simply updates the cache from that in-memory data.
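
The flow described above can be sketched roughly as follows. This is a minimal Python illustration, not the Doris FE code (which is Java): `ColumnStats`, `FrontendCache`, `StatsTable`, and `run_analyze_task` are hypothetical names, and the assumption that the task also persists its stats before syncing the caches is ours.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Every name in this sketch is hypothetical; it only illustrates the idea
# of updating caches from in-memory stats instead of re-reading a table.

StatsKey = Tuple[str, str]  # (table name, column name)


@dataclass(frozen=True)
class ColumnStats:
    row_count: int
    ndv: int
    null_count: int


@dataclass
class FrontendCache:
    """Stand-in for the stats cache held by one FE."""
    name: str
    entries: Dict[StatsKey, ColumnStats] = field(default_factory=dict)

    def update(self, key: StatsKey, stats: ColumnStats) -> None:
        self.entries[key] = stats


class StatsTable:
    """Stand-in for the column_statistics table; counts how often it is read."""

    def __init__(self) -> None:
        self.rows: Dict[StatsKey, ColumnStats] = {}
        self.query_count = 0

    def write(self, key: StatsKey, stats: ColumnStats) -> None:
        self.rows[key] = stats

    def query(self, key: StatsKey) -> ColumnStats:
        self.query_count += 1
        return self.rows[key]


def run_analyze_task(key: StatsKey, stats: ColumnStats,
                     table: StatsTable,
                     frontends: List[FrontendCache]) -> None:
    """Persist the freshly computed stats (assumed), then sync every FE cache.

    The cache update uses the in-memory `stats` object directly, so the
    table is never read back.
    """
    table.write(key, stats)
    for fe in frontends:
        fe.update(key, stats)


table = StatsTable()
frontends = [FrontendCache("fe1"), FrontendCache("fe2")]
run_analyze_task(("orders", "o_custkey"), ColumnStats(1000, 200, 0),
                 table, frontends)
```

Each task pays for its own cache sync as soon as it finishes, so no task has to issue a read against the statistics table once the job ends.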

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Jibing-Li Jibing-Li marked this pull request as ready for review January 31, 2024 06:46
@Jibing-Li
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37429 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9846dc1475091586bcc56f2d0a69c144604f7003, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17654	4824	4602	4602
q2	2049	136	127	127
q3	10605	919	904	904
q4	4680	725	785	725
q5	7684	2968	2942	2942
q6	187	127	124	124
q7	1134	726	715	715
q8	9261	2020	1973	1973
q9	7213	6390	6344	6344
q10	8110	2483	2460	2460
q11	420	211	212	211
q12	815	278	295	278
q13	18019	3329	3331	3329
q14	266	252	257	252
q15	530	505	503	503
q16	475	421	414	414
q17	945	534	470	470
q18	6933	5994	5912	5912
q19	1586	1463	1397	1397
q20	590	322	336	322
q21	6608	3131	3132	3131
q22	810	295	294	294
Total cold run time: 106574 ms
Total hot run time: 37429 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4506	4440	4449	4440
q2	335	224	229	224
q3	2982	2884	2887	2884
q4	1861	1628	1675	1628
q5	5228	5261	5242	5242
q6	200	117	119	117
q7	2139	1827	1818	1818
q8	3160	3307	3290	3290
q9	8381	8325	8397	8325
q10	5782	3604	3631	3604
q11	539	471	470	470
q12	725	564	577	564
q13	12951	3105	3186	3105
q14	269	243	256	243
q15	537	490	489	489
q16	530	465	470	465
q17	1863	1667	1708	1667
q18	8121	7669	7566	7566
q19	8649	1564	1548	1548
q20	2141	1906	1935	1906
q21	4784	4518	4669	4518
q22	557	460	483	460
Total cold run time: 76240 ms
Total hot run time: 54573 ms

@doris-robot

TPC-DS: Total hot run time: 175923 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9846dc1475091586bcc56f2d0a69c144604f7003, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	944	334	333	333
query2	6549	1997	1774	1774
query3	6716	212	206	206
query4	31650	22129	22244	22129
query5	4454	351	424	351
query6	250	165	155	155
query7	4608	264	261	261
query8	261	173	172	172
query9	9066	2269	2262	2262
query10	411	207	207	207
query11	18441	15452	15277	15277
query12	117	67	63	63
query13	1610	369	375	369
query14	9957	7460	7390	7390
query15	238	180	184	180
query16	7337	271	253	253
query17	1811	487	461	461
query18	1927	258	252	252
query19	282	129	128	128
query20	72	66	69	66
query21	198	133	131	131
query22	4973	4824	4895	4824
query23	31271	30388	30375	30375
query24	10620	2773	2766	2766
query25	511	309	311	309
query26	727	140	143	140
query27	2196	284	286	284
query28	5685	1858	1808	1808
query29	911	620	602	602
query30	282	130	135	130
query31	917	716	737	716
query32	87	52	55	52
query33	588	205	203	203
query34	803	450	468	450
query35	842	743	743	743
query36	1262	1201	1204	1201
query37	89	57	62	57
query38	3318	3220	3184	3184
query39	1296	1249	1256	1249
query40	191	82	79	79
query41	37	38	35	35
query42	89	81	81	81
query43	503	477	471	471
query44	1038	686	696	686
query45	195	181	175	175
query46	1042	642	637	637
query47	1605	1553	1501	1501
query48	381	310	298	298
query49	1123	286	277	277
query50	693	319	317	317
query51	5339	5090	5165	5090
query52	101	79	73	73
query53	328	258	256	256
query54	269	185	194	185
query55	91	79	77	77
query56	182	174	165	165
query57	973	895	913	895
query58	175	151	156	151
query59	2447	2247	2366	2247
query60	209	175	176	175
query61	89	81	83	81
query62	657	366	357	357
query63	284	260	280	260
query64	4844	3713	3601	3601
query65	3273	3249	3211	3211
query66	937	335	311	311
query67	14820	14153	14323	14153
query68	5429	490	510	490
query69	468	300	311	300
query70	1482	1551	1550	1550
query71	288	231	211	211
query72	6075	3123	2822	2822
query73	686	329	313	313
query74	6716	6237	6284	6237
query75	3013	2288	2345	2288
query76	3491	973	992	973
query77	388	234	236	234
query78	9550	9046	8640	8640
query79	2667	505	509	505
query80	2199	319	320	319
query81	538	195	205	195
query82	845	80	77	77
query83	252	117	121	117
query84	287	69	74	69
query85	2328	324	326	324
query86	510	370	390	370
query87	3495	3308	3277	3277
query88	3905	2280	2262	2262
query89	463	351	356	351
query90	1917	194	188	188
query91	171	116	117	116
query92	61	44	46	44
query93	3682	428	408	408
query94	1387	160	156	156
query95	493	461	443	443
query96	645	322	320	320
query97	4238	4118	4134	4118
query98	212	201	182	182
query99	1154	666	739	666
Total cold run time: 284227 ms
Total hot run time: 175923 ms

@doris-robot

ClickBench: Total hot run time: 30.47 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9846dc1475091586bcc56f2d0a69c144604f7003, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.06	0.02	0.02
query3	0.23	0.06	0.07
query4	1.64	0.11	0.10
query5	0.52	0.51	0.51
query6	1.19	0.65	0.64
query7	0.02	0.01	0.01
query8	0.04	0.02	0.02
query9	0.56	0.51	0.49
query10	0.56	0.56	0.57
query11	0.12	0.09	0.09
query12	0.12	0.09	0.09
query13	0.60	0.60	0.61
query14	0.78	0.82	0.81
query15	0.78	0.78	0.78
query16	0.39	0.36	0.38
query17	1.06	1.00	1.02
query18	0.20	0.24	0.23
query19	1.84	1.74	1.76
query20	0.01	0.00	0.01
query21	15.41	0.55	0.57
query22	2.36	2.83	1.93
query23	17.52	0.85	0.89
query24	2.69	1.19	0.39
query25	0.31	0.18	0.09
query26	0.53	0.15	0.16
query27	0.06	0.05	0.05
query28	12.06	0.86	0.83
query29	12.55	3.03	3.14
query30	0.62	0.58	0.53
query31	2.79	0.34	0.35
query32	3.35	0.49	0.50
query33	3.20	3.24	3.22
query34	15.96	4.30	4.23
query35	4.33	4.32	4.32
query36	1.11	1.06	1.06
query37	0.07	0.05	0.05
query38	0.05	0.03	0.03
query39	0.03	0.02	0.01
query40	0.19	0.13	0.13
query41	0.07	0.02	0.02
query42	0.03	0.01	0.02
query43	0.03	0.02	0.02
Total cold run time: 106.07 s
Total hot run time: 30.47 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 9846dc1475091586bcc56f2d0a69c144604f7003 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
Insert into select:       13.5 seconds inserted 10000000 Rows, about 740K ops/s

Contributor

@freemandealer freemandealer left a comment

The code does what it set out to do, so LGTM.

Contributor

PR approved by anyone and no changes requested.

@@ -1179,7 +1179,7 @@ struct TGetBinlogLagResult {

 struct TUpdateFollowerStatsCacheRequest {
     1: optional string key;
-    2: list<string> statsRows;
+    2: optional string colStatsData;
Contributor

This should not reuse the same order number (field ID).
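
Some background on the comment above: Thrift identifies struct fields on the wire by their numeric ID, so giving ID 2 to a new field with a different type can cause a peer still on the old definition to skip or misread it, for example in a cluster where FEs run different versions. The check below is a minimal Python sketch of how such reuse could be detected by comparing two versions of a struct body. `parse_fields` and `reused_ids` are hypothetical helpers, the regex handles only simple one-line fields, and the ID 3 variant is an illustrative fix rather than what the PR necessarily did.

```python
import re
from typing import Dict, List, Tuple

# Matches simple one-line Thrift fields such as "2: optional string name;".
# Hypothetical helper pattern; it does not cover defaults or annotations.
FIELD_RE = re.compile(
    r"^\s*(\d+)\s*:\s*(?:(?:optional|required)\s+)?(.+?)\s+(\w+)\s*;"
)


def parse_fields(struct_body: str) -> Dict[int, Tuple[str, str]]:
    """Map each field ID to its (type, name) pair."""
    fields: Dict[int, Tuple[str, str]] = {}
    for line in struct_body.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[int(match.group(1))] = (match.group(2), match.group(3))
    return fields


def reused_ids(old_body: str, new_body: str) -> List[int]:
    """Return IDs present in both versions whose type or name changed."""
    old, new = parse_fields(old_body), parse_fields(new_body)
    return sorted(i for i in old.keys() & new.keys() if old[i] != new[i])


OLD = """
1: optional string key;
2: list<string> statsRows;
"""

# The version under review: ID 2 is reused for a different field.
NEW_REUSED = """
1: optional string key;
2: optional string colStatsData;
"""

# Illustrative alternative that takes a fresh ID instead.
NEW_FRESH = """
1: optional string key;
3: optional string colStatsData;
"""

print(reused_ids(OLD, NEW_REUSED))
print(reused_ids(OLD, NEW_FRESH))
```

Retiring an old ID and allocating a new one keeps both definitions readable side by side, at the cost of a gap in the numbering.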

@Jibing-Li
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 36804 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea76f22fe0d8b075c30adab3ef64e186e96c0808, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17663	4610	4355	4355
q2	2046	137	128	128
q3	10678	930	940	930
q4	4647	771	695	695
q5	7682	2871	2790	2790
q6	182	119	118	118
q7	1132	719	715	715
q8	9263	1993	2015	1993
q9	7255	6325	6307	6307
q10	8080	2454	2440	2440
q11	415	200	211	200
q12	823	277	295	277
q13	18038	3310	3287	3287
q14	267	259	240	240
q15	528	496	488	488
q16	465	408	400	400
q17	935	561	495	495
q18	6751	5935	5996	5935
q19	1563	1378	1406	1378
q20	587	320	315	315
q21	6613	3162	3026	3026
q22	797	293	292	292
Total cold run time: 106410 ms
Total hot run time: 36804 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4482	4439	4439	4439
q2	336	241	232	232
q3	2969	2912	2830	2830
q4	1875	1684	1626	1626
q5	5211	5210	5251	5210
q6	191	112	113	112
q7	2153	1803	1783	1783
q8	3091	3234	3267	3234
q9	8354	8388	8295	8295
q10	5907	3534	3567	3534
q11	543	461	445	445
q12	742	582	596	582
q13	15014	3088	3069	3069
q14	288	262	262	262
q15	525	494	487	487
q16	522	473	490	473
q17	1840	1681	1706	1681
q18	8090	7670	7458	7458
q19	8587	1498	1560	1498
q20	2143	1904	1907	1904
q21	4724	4603	4565	4565
q22	543	441	473	441
Total cold run time: 78130 ms
Total hot run time: 54160 ms

@doris-robot

TPC-DS: Total hot run time: 175272 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ea76f22fe0d8b075c30adab3ef64e186e96c0808, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	934	338	322	322
query2	6573	1927	1977	1927
query3	6703	199	196	196
query4	32040	22037	22026	22026
query5	4457	419	399	399
query6	259	160	160	160
query7	4615	262	254	254
query8	242	169	183	169
query9	9245	2261	2244	2244
query10	424	206	217	206
query11	18319	15431	15459	15431
query12	125	64	65	64
query13	1614	394	373	373
query14	9226	7339	7353	7339
query15	249	180	184	180
query16	7343	264	243	243
query17	1848	511	464	464
query18	1937	254	251	251
query19	239	131	128	128
query20	74	72	68	68
query21	195	133	134	133
query22	4994	4775	4757	4757
query23	31184	30378	30398	30378
query24	9951	2755	2780	2755
query25	529	318	315	315
query26	733	142	137	137
query27	2220	280	278	278
query28	5939	1855	1833	1833
query29	903	635	604	604
query30	283	131	137	131
query31	926	722	717	717
query32	91	54	51	51
query33	571	219	209	209
query34	833	464	464	464
query35	841	755	735	735
query36	1283	1170	1203	1170
query37	100	57	60	57
query38	3316	3174	3180	3174
query39	1311	1268	1253	1253
query40	200	92	80	80
query41	38	35	35	35
query42	89	80	81	80
query43	508	474	495	474
query44	993	681	698	681
query45	196	186	172	172
query46	1032	621	603	603
query47	1610	1479	1526	1479
query48	407	312	304	304
query49	1123	283	280	280
query50	689	305	314	305
query51	5296	5104	5181	5104
query52	90	88	86	86
query53	323	253	262	253
query54	236	174	185	174
query55	78	76	74	74
query56	179	161	160	160
query57	950	915	903	903
query58	181	152	154	152
query59	2409	2382	2283	2283
query60	201	174	175	174
query61	87	86	83	83
query62	617	380	356	356
query63	277	250	258	250
query64	4687	3750	3378	3378
query65	3269	3244	3227	3227
query66	955	319	312	312
query67	14424	14553	13968	13968
query68	4044	492	500	492
query69	436	298	303	298
query70	1561	1582	1486	1486
query71	285	210	210	210
query72	5652	3105	2818	2818
query73	695	313	310	310
query74	6648	6382	6236	6236
query75	2989	2278	2323	2278
query76	2546	952	960	952
query77	370	228	228	228
query78	9118	8854	8559	8559
query79	3081	490	494	490
query80	2192	322	307	307
query81	512	195	194	194
query82	801	90	77	77
query83	260	121	114	114
query84	284	73	72	72
query85	1950	350	340	340
query86	528	397	383	383
query87	3482	3292	3243	3243
query88	4066	2159	2148	2148
query89	433	340	355	340
query90	1934	185	179	179
query91	152	117	119	117
query92	61	41	42	41
query93	4620	431	440	431
query94	1232	161	157	157
query95	497	447	452	447
query96	628	307	315	307
query97	4257	4154	4113	4113
query98	230	186	204	186
query99	1179	694	709	694
Total cold run time: 280536 ms
Total hot run time: 175272 ms

@doris-robot

ClickBench: Total hot run time: 30.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ea76f22fe0d8b075c30adab3ef64e186e96c0808, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.06	0.02	0.02
query3	0.22	0.06	0.06
query4	1.65	0.10	0.11
query5	0.54	0.51	0.51
query6	1.19	0.65	0.63
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.55	0.50	0.50
query10	0.55	0.55	0.55
query11	0.13	0.09	0.09
query12	0.12	0.09	0.09
query13	0.61	0.62	0.62
query14	0.79	0.80	0.79
query15	0.80	0.78	0.78
query16	0.39	0.39	0.40
query17	1.03	1.04	1.04
query18	0.20	0.27	0.23
query19	1.91	1.78	1.72
query20	0.02	0.01	0.01
query21	15.43	0.59	0.58
query22	2.56	2.29	1.89
query23	16.96	0.91	0.86
query24	2.68	1.43	0.47
query25	0.36	0.18	0.21
query26	0.46	0.15	0.14
query27	0.05	0.06	0.06
query28	12.10	0.85	0.86
query29	12.51	3.23	3.24
query30	0.62	0.55	0.59
query31	2.79	0.35	0.35
query32	3.34	0.48	0.47
query33	3.26	3.22	3.28
query34	15.74	4.27	4.24
query35	4.26	4.26	4.28
query36	1.10	1.04	1.07
query37	0.07	0.05	0.05
query38	0.05	0.03	0.03
query39	0.02	0.01	0.02
query40	0.17	0.12	0.14
query41	0.07	0.01	0.01
query42	0.02	0.02	0.01
query43	0.02	0.03	0.02
Total cold run time: 105.5 s
Total hot run time: 30.84 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit ea76f22fe0d8b075c30adab3ef64e186e96c0808 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.2 seconds inserted 10000000 Rows, about 757K ops/s

@github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Jan 31, 2024
Contributor

PR approved by at least one committer and no changes requested.

@Jibing-Li Jibing-Li merged commit a9b93ed into apache:master Jan 31, 2024
24 of 26 checks passed
@Jibing-Li Jibing-Li deleted the loadcache branch January 31, 2024 13:33
yiguolei pushed a commit that referenced this pull request Jan 31, 2024
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Jan 31, 2024
yiguolei pushed a commit that referenced this pull request Jan 31, 2024
Jibing-Li added a commit that referenced this pull request Feb 1, 2024
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Labels: approved, dev/2.0.5-merged, reviewed
5 participants