[improvement](iceberg)Parallelize splits for count(*) #41169

Open
wants to merge 10 commits into
base: master

wuwenchi
Contributor

Proposed changes

  1. Parallelize count(*) across multiple splits so that a single split does not become the bottleneck.
  2. Resize only a single column for count(*); the other columns do not need to be resized.
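
As a rough illustration of the first point, here is a minimal sketch of one way a table-level row count could be spread across several splits so that each split reports only its share. The function name `distribute_row_count` and the even-split policy are assumptions made for this example; they are not taken from the PR's code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): spread a non-negative table-level
// row count over `num_splits` splits so the per-split counts sum to
// `total_rows` exactly.
std::vector<int64_t> distribute_row_count(int64_t total_rows, int num_splits) {
    std::vector<int64_t> counts;
    if (num_splits <= 0) {
        return counts; // nothing to distribute to
    }
    const int64_t base = total_rows / num_splits;
    const int64_t remainder = total_rows % num_splits;
    counts.reserve(num_splits);
    for (int i = 0; i < num_splits; ++i) {
        // The first `remainder` splits each take one extra row.
        counts.push_back(base + (i < remainder ? 1 : 0));
    }
    return counts;
}
```

With this policy, every split carries roughly the same count, so no single reader has to emit all rows by itself.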

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@wuwenchi
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41872 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1e4d7958f740efbc3a7cc16293db85c6c92e1485, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17624	7442	7233	7233
q2	2023	283	283	283
q3	12178	1078	1188	1078
q4	10585	732	769	732
q5	7778	3153	3122	3122
q6	237	149	145	145
q7	1039	612	615	612
q8	9447	2074	2064	2064
q9	6790	6454	6407	6407
q10	7058	2303	2288	2288
q11	428	238	250	238
q12	411	217	209	209
q13	17768	3018	3036	3018
q14	250	221	217	217
q15	579	523	537	523
q16	686	644	618	618
q17	979	850	847	847
q18	7347	6782	6755	6755
q19	1410	997	1082	997
q20	581	298	285	285
q21	4021	3210	3250	3210
q22	1142	1013	991	991
Total cold run time: 110361 ms
Total hot run time: 41872 ms
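
The totals in this report appear to follow from the per-query rows: the cold total matches the sum of the first timing column, and the hot total matches the sum of the smaller of the two hot runs. The sketch below reproduces that arithmetic; the struct and function names are made up for this example and are not part of the benchmark scripts.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical row type: one cold run and two hot runs, in milliseconds.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Sum of the cold-run column.
int64_t total_cold_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const QueryTiming& row : rows) {
        total += row.cold_ms;
    }
    return total;
}

// Sum of the better (smaller) of the two hot runs for each query.
int64_t total_hot_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const QueryTiming& row : rows) {
        total += std::min(row.hot1_ms, row.hot2_ms);
    }
    return total;
}
```

For example, feeding in only the q6 and q11 rows above (237/149/145 and 428/238/250) gives a cold total of 665 ms and a hot total of 383 ms.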

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7263	7207	7267	7207
q2	337	237	252	237
q3	3122	2972	2971	2971
q4	2129	1860	1730	1730
q5	5689	5590	5683	5590
q6	229	147	143	143
q7	2206	1857	1783	1783
q8	3326	3472	3435	3435
q9	8828	8838	8912	8838
q10	3441	3458	3484	3458
q11	577	497	485	485
q12	828	652	607	607
q13	10134	3196	3194	3194
q14	322	283	282	282
q15	576	530	531	530
q16	705	685	681	681
q17	1826	1622	1619	1619
q18	8238	7803	7834	7803
q19	1749	1548	1555	1548
q20	2136	1926	1882	1882
q21	5484	5396	5407	5396
q22	1167	1053	1072	1053
Total cold run time: 70312 ms
Total hot run time: 60472 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.27% (9606/25774)
Line Coverage: 28.69% (79530/277191)
Region Coverage: 28.13% (41127/146194)
Branch Coverage: 24.77% (20961/84630)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1e4d7958f740efbc3a7cc16293db85c6c92e1485_1e4d7958f740efbc3a7cc16293db85c6c92e1485/report/index.html

@doris-robot

TPC-DS: Total hot run time: 191948 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1e4d7958f740efbc3a7cc16293db85c6c92e1485, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	955	394	400	394
query2	6245	2152	2118	2118
query3	8684	192	200	192
query4	33683	23850	23520	23520
query5	3381	473	460	460
query6	268	160	162	160
query7	4183	314	289	289
query8	279	220	218	218
query9	9501	2627	2621	2621
query10	487	338	292	292
query11	17883	15321	15138	15138
query12	149	100	96	96
query13	1540	401	393	393
query14	9729	7664	7498	7498
query15	309	171	168	168
query16	8096	435	445	435
query17	1750	616	599	599
query18	2134	316	315	315
query19	360	168	154	154
query20	126	113	112	112
query21	206	105	107	105
query22	4857	4456	4387	4387
query23	35106	34515	34515	34515
query24	11273	2975	2989	2975
query25	611	413	433	413
query26	1130	165	162	162
query27	2467	285	285	285
query28	7685	2442	2431	2431
query29	814	428	416	416
query30	265	160	172	160
query31	1014	805	774	774
query32	102	55	56	55
query33	748	299	290	290
query34	895	490	472	472
query35	873	739	738	738
query36	1084	919	951	919
query37	148	90	91	90
query38	4083	3870	3875	3870
query39	1507	1405	1532	1405
query40	204	98	99	98
query41	49	48	46	46
query42	125	95	100	95
query43	532	487	502	487
query44	1274	824	806	806
query45	195	163	163	163
query46	1147	774	740	740
query47	1905	1820	1838	1820
query48	464	361	374	361
query49	894	423	402	402
query50	832	408	409	408
query51	7129	6901	6818	6818
query52	100	91	90	90
query53	255	204	174	174
query54	1032	449	454	449
query55	74	79	79	79
query56	276	244	269	244
query57	1191	1115	1087	1087
query58	234	231	229	229
query59	3316	2905	2920	2905
query60	289	269	257	257
query61	103	107	103	103
query62	792	673	672	672
query63	220	190	181	181
query64	3868	650	634	634
query65	3279	3185	3204	3185
query66	790	297	300	297
query67	15957	15614	15511	15511
query68	4885	590	554	554
query69	580	288	295	288
query70	1196	1134	1141	1134
query71	360	276	280	276
query72	7626	4097	4020	4020
query73	750	324	327	324
query74	10461	9096	8984	8984
query75	4359	2641	2693	2641
query76	3596	931	892	892
query77	702	296	288	288
query78	11257	9699	9197	9197
query79	2190	544	542	542
query80	1789	444	444	444
query81	582	247	241	241
query82	519	137	142	137
query83	309	130	136	130
query84	290	85	77	77
query85	762	287	287	287
query86	476	304	296	296
query87	4513	4298	4428	4298
query88	3368	2320	2328	2320
query89	397	290	281	281
query90	2088	182	188	182
query91	185	148	148	148
query92	64	51	47	47
query93	1445	538	539	538
query94	1046	306	295	295
query95	362	258	256	256
query96	606	279	274	274
query97	3258	3205	3122	3122
query98	231	202	190	190
query99	1536	1301	1270	1270
Total cold run time: 302924 ms
Total hot run time: 191948 ms

@wuwenchi
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.27% (9605/25774)
Line Coverage: 28.68% (79493/277191)
Region Coverage: 28.12% (41115/146194)
Branch Coverage: 24.77% (20961/84630)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e926872a20b83f8d318ea2dfdf1aae313c15b532_e926872a20b83f8d318ea2dfdf1aae313c15b532/report/index.html

@doris-robot

TPC-H: Total hot run time: 41613 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e926872a20b83f8d318ea2dfdf1aae313c15b532, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17953	7907	7338	7338
q2	2027	285	289	285
q3	12191	1074	1192	1074
q4	10573	780	762	762
q5	7760	3083	3042	3042
q6	235	152	149	149
q7	1037	628	624	624
q8	9429	2028	2015	2015
q9	6840	6452	6409	6409
q10	7014	2292	2303	2292
q11	435	242	243	242
q12	411	213	212	212
q13	17774	3009	2977	2977
q14	238	219	224	219
q15	600	528	514	514
q16	669	607	610	607
q17	966	795	805	795
q18	7406	6665	6721	6665
q19	1402	1000	967	967
q20	580	288	283	283
q21	3992	3178	3180	3178
q22	1067	964	1015	964
Total cold run time: 110599 ms
Total hot run time: 41613 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7261	7238	7247	7238
q2	337	235	241	235
q3	3072	2947	2969	2947
q4	2061	1916	1783	1783
q5	5623	5569	5696	5569
q6	238	145	147	145
q7	2196	1797	1804	1797
q8	3296	3477	3438	3438
q9	8752	8841	8816	8816
q10	3426	3444	3492	3444
q11	583	501	479	479
q12	840	654	610	610
q13	9698	3156	3200	3156
q14	300	288	264	264
q15	592	534	530	530
q16	711	686	673	673
q17	1805	1598	1602	1598
q18	8229	7909	7825	7825
q19	1740	1492	1575	1492
q20	2123	1894	1900	1894
q21	5449	5344	5536	5344
q22	1162	1046	1032	1032
Total cold run time: 69494 ms
Total hot run time: 60309 ms

@doris-robot

TPC-DS: Total hot run time: 191560 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e926872a20b83f8d318ea2dfdf1aae313c15b532, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	924	392	383	383
query2	6333	2009	2042	2009
query3	8693	193	203	193
query4	33867	23668	23516	23516
query5	3690	458	463	458
query6	277	163	167	163
query7	4198	305	308	305
query8	299	231	230	230
query9	9557	2663	2663	2663
query10	476	276	273	273
query11	18112	15199	15075	15075
query12	158	103	99	99
query13	1536	416	404	404
query14	9556	7352	7360	7352
query15	256	165	177	165
query16	7950	454	501	454
query17	1592	571	556	556
query18	2120	311	309	309
query19	309	151	149	149
query20	126	109	108	108
query21	211	111	103	103
query22	4618	4417	4380	4380
query23	35320	34424	35514	34424
query24	11849	2962	2930	2930
query25	545	399	404	399
query26	1137	162	160	160
query27	2665	288	277	277
query28	7752	2461	2433	2433
query29	666	433	423	423
query30	259	155	153	153
query31	1004	788	793	788
query32	101	54	58	54
query33	754	288	292	288
query34	935	503	486	486
query35	864	744	739	739
query36	1088	933	944	933
query37	215	88	85	85
query38	4058	3927	3871	3871
query39	1504	1422	1409	1409
query40	260	96	99	96
query41	51	51	49	49
query42	116	95	93	93
query43	519	488	491	488
query44	1292	849	810	810
query45	192	164	164	164
query46	1136	757	768	757
query47	1921	1796	1869	1796
query48	460	362	366	362
query49	1025	427	398	398
query50	831	410	414	410
query51	7131	6882	6811	6811
query52	106	92	84	84
query53	254	180	178	178
query54	1007	472	468	468
query55	78	75	78	75
query56	308	260	281	260
query57	1207	1105	1120	1105
query58	230	245	244	244
query59	3264	2946	3136	2946
query60	311	275	277	275
query61	123	126	126	126
query62	810	681	695	681
query63	231	193	211	193
query64	4890	649	618	618
query65	3308	3185	3219	3185
query66	1167	291	305	291
query67	16110	15603	15512	15512
query68	4532	585	570	570
query69	610	288	301	288
query70	1172	1102	1067	1067
query71	421	268	281	268
query72	7539	4054	4043	4043
query73	757	316	324	316
query74	10444	8966	9087	8966
query75	4286	2668	2687	2668
query76	3681	915	920	915
query77	631	299	300	299
query78	9976	9248	9106	9106
query79	1660	535	543	535
query80	977	439	439	439
query81	589	240	238	238
query82	650	147	144	144
query83	309	132	136	132
query84	280	77	90	77
query85	1352	290	277	277
query86	428	298	303	298
query87	4498	4317	4302	4302
query88	3201	2313	2350	2313
query89	396	285	280	280
query90	1955	182	187	182
query91	194	142	160	142
query92	62	49	49	49
query93	2366	541	541	541
query94	992	281	293	281
query95	361	255	254	254
query96	639	282	278	278
query97	3287	3195	3128	3128
query98	214	203	191	191
query99	1626	1322	1299	1299
Total cold run time: 303997 ms
Total hot run time: 191560 ms

@wuwenchi
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40848 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 035275d728823db5d317f33a127c007ebeefa893, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17692	8261	7293	7293
q2	2027	289	270	270
q3	11550	1064	1166	1064
q4	10594	734	767	734
q5	7795	2890	2769	2769
q6	243	152	150	150
q7	997	616	617	616
q8	9368	1955	1966	1955
q9	6655	6462	6402	6402
q10	6998	2327	2307	2307
q11	446	247	243	243
q12	411	220	221	220
q13	17811	2987	3005	2987
q14	233	212	213	212
q15	573	537	527	527
q16	638	594	573	573
q17	977	634	595	595
q18	7276	6657	6792	6657
q19	1403	948	976	948
q20	497	205	206	205
q21	4142	3140	3251	3140
q22	1096	981	1004	981
Total cold run time: 109422 ms
Total hot run time: 40848 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7266	7243	7763	7243
q2	336	227	230	227
q3	3101	2962	2997	2962
q4	2180	1834	1852	1834
q5	5737	5726	5745	5726
q6	241	140	140	140
q7	2240	1833	1802	1802
q8	3412	3419	3523	3419
q9	8902	8985	8877	8877
q10	3587	3518	3585	3518
q11	586	499	491	491
q12	838	623	618	618
q13	10595	3171	3171	3171
q14	317	280	290	280
q15	590	525	521	521
q16	683	647	654	647
q17	1885	1640	1601	1601
q18	8436	7700	7692	7692
q19	1718	1430	1427	1427
q20	2107	1886	1880	1880
q21	5600	5529	5228	5228
q22	1133	1046	1117	1046
Total cold run time: 71490 ms
Total hot run time: 60350 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.31% (9635/25823)
Line Coverage: 28.71% (79723/277669)
Region Coverage: 28.14% (41228/146512)
Branch Coverage: 24.77% (20994/84770)
Coverage Report: http://coverage.selectdb-in.cc/coverage/035275d728823db5d317f33a127c007ebeefa893_035275d728823db5d317f33a127c007ebeefa893/report/index.html

@doris-robot

TPC-DS: Total hot run time: 192051 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 035275d728823db5d317f33a127c007ebeefa893, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	956	393	403	393
query2	6257	2040	2008	2008
query3	8684	196	202	196
query4	33541	23562	23449	23449
query5	3867	470	468	468
query6	279	167	182	167
query7	4186	310	307	307
query8	293	210	214	210
query9	9639	2683	2688	2683
query10	469	280	279	279
query11	17768	15230	15331	15230
query12	146	102	96	96
query13	1526	428	410	410
query14	9927	7431	6757	6757
query15	251	172	171	171
query16	8016	407	467	407
query17	1631	625	603	603
query18	2155	324	331	324
query19	342	164	153	153
query20	121	112	113	112
query21	227	108	106	106
query22	5028	4869	4752	4752
query23	34883	33998	33998	33998
query24	10978	2852	2884	2852
query25	630	419	414	414
query26	1190	165	167	165
query27	2366	300	308	300
query28	7530	2439	2431	2431
query29	850	439	435	435
query30	258	149	151	149
query31	1036	804	818	804
query32	102	60	55	55
query33	773	302	304	302
query34	951	508	519	508
query35	901	748	740	740
query36	1115	944	954	944
query37	151	98	91	91
query38	4004	3989	3995	3989
query39	1475	1409	1405	1405
query40	213	100	99	99
query41	53	47	50	47
query42	122	100	95	95
query43	526	497	481	481
query44	1251	818	805	805
query45	201	171	171	171
query46	1164	722	709	709
query47	1978	1868	1874	1868
query48	459	379	369	369
query49	947	416	411	411
query50	842	411	426	411
query51	6944	6992	6905	6905
query52	102	90	91	90
query53	261	179	182	179
query54	1160	467	483	467
query55	79	82	78	78
query56	277	285	264	264
query57	1228	1124	1092	1092
query58	237	232	244	232
query59	3087	2901	2959	2901
query60	291	263	267	263
query61	106	98	105	98
query62	893	681	670	670
query63	215	191	184	184
query64	4001	641	620	620
query65	3269	3195	3196	3195
query66	844	321	299	299
query67	16027	15565	15708	15565
query68	4378	567	569	567
query69	499	293	310	293
query70	1144	1179	1141	1141
query71	373	269	272	269
query72	7387	4071	4099	4071
query73	776	343	354	343
query74	10497	9058	9002	9002
query75	3484	2671	2694	2671
query76	3098	984	851	851
query77	613	289	299	289
query78	10489	9672	9538	9538
query79	3611	618	601	601
query80	1413	459	439	439
query81	587	238	239	238
query82	871	136	136	136
query83	303	133	136	133
query84	280	77	74	74
query85	2141	293	282	282
query86	483	303	272	272
query87	4543	4419	4503	4419
query88	4208	2413	2390	2390
query89	418	292	289	289
query90	2012	187	184	184
query91	177	151	153	151
query92	61	48	48	48
query93	3507	554	550	550
query94	1063	292	294	292
query95	362	253	255	253
query96	632	290	282	282
query97	3247	3093	3132	3093
query98	224	202	194	194
query99	1555	1329	1265	1265
Total cold run time: 305715 ms
Total hot run time: 192051 ms

@doris-robot

ClickBench: Total hot run time: 32.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 035275d728823db5d317f33a127c007ebeefa893, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.04	0.05
query2	0.09	0.05	0.04
query3	0.22	0.05	0.05
query4	1.68	0.08	0.07
query5	0.51	0.49	0.51
query6	1.14	0.75	0.74
query7	0.01	0.02	0.01
query8	0.05	0.05	0.04
query9	0.54	0.49	0.50
query10	0.55	0.56	0.55
query11	0.17	0.12	0.12
query12	0.15	0.12	0.12
query13	0.60	0.60	0.60
query14	2.82	2.82	2.75
query15	0.90	0.83	0.84
query16	0.37	0.38	0.37
query17	1.06	1.00	1.04
query18	0.18	0.17	0.18
query19	1.94	1.80	1.98
query20	0.02	0.01	0.01
query21	15.35	0.66	0.65
query22	4.16	8.41	1.44
query23	18.29	1.42	1.31
query24	2.10	0.22	0.22
query25	0.17	0.08	0.08
query26	0.25	0.18	0.18
query27	0.07	0.07	0.09
query28	13.29	1.16	1.14
query29	12.63	3.38	3.38
query30	0.24	0.07	0.06
query31	2.86	0.42	0.41
query32	3.22	0.48	0.48
query33	3.00	3.01	3.08
query34	17.11	4.47	4.50
query35	4.59	4.58	4.53
query36	0.66	0.49	0.51
query37	0.19	0.17	0.15
query38	0.15	0.15	0.14
query39	0.05	0.04	0.04
query40	0.16	0.13	0.13
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 111.79 s
Total hot run time: 32.81 s

@@ -95,6 +94,9 @@ IcebergTableReader::IcebergTableReader(std::unique_ptr<GenericReader> file_forma
            ADD_CHILD_TIMER(_profile, "DeleteFileReadTime", iceberg_profile);
    _iceberg_profile.delete_rows_sort_time =
            ADD_CHILD_TIMER(_profile, "DeleteRowsSortTime", iceberg_profile);
    if (range.table_format_params.iceberg_params.__isset.row_count) {
        _remaining_push_down_count = range.table_format_params.iceberg_params.row_count;
Contributor


_remaining_push_down_count is left uninitialized when row_count is not set.
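
One common way to address a concern like this is a default member initializer, so the field has a defined value even when the optional Thrift field is absent. The sketch below is a simplified stand-in, not the PR's actual fix: `IcebergParamsSketch`, `ReaderSketch`, `has_row_count`, and the `-1` sentinel are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Thrift-generated Iceberg parameters; only
// the fields needed for this illustration are modeled.
struct IcebergParamsSketch {
    bool has_row_count = false; // plays the role of __isset.row_count
    int64_t row_count = 0;
};

class ReaderSketch {
public:
    explicit ReaderSketch(const IcebergParamsSketch& params) {
        // Only overwrite the default when the optional field is present.
        if (params.has_row_count) {
            _remaining_push_down_count = params.row_count;
        }
    }

    bool count_is_pushed_down() const { return _remaining_push_down_count >= 0; }
    int64_t remaining_push_down_count() const { return _remaining_push_down_count; }

private:
    // Default member initializer; -1 is an assumed sentinel for "no count".
    int64_t _remaining_push_down_count = -1;
};
```

With the initializer in place, a reader built from parameters without a row count reports that no count is pushed down instead of reading an indeterminate value.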

@wuwenchi
Contributor Author

wuwenchi commented Oct 8, 2024

run buildall

Contributor

github-actions bot commented Oct 8, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.27% (9629/25834)
Line Coverage: 28.67% (79860/278589)
Region Coverage: 28.09% (41273/146934)
Branch Coverage: 24.71% (21029/85088)
Coverage Report: http://coverage.selectdb-in.cc/coverage/39b217cb21dee7d5491124a8fc45ddee421fc0fc_39b217cb21dee7d5491124a8fc45ddee421fc0fc/report/index.html

Contributor

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 10, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Labels
approved Indicates a PR has been approved by one committer. reviewed

3 participants