[opt](routine load) optimize routine load timeout logic (#40818) #41137

Merged
3 commits merged into apache:branch-2.0 on Sep 26, 2024


sollhui
Contributor

@sollhui sollhui commented Sep 23, 2024

pick (#40818)

When IO/CPU resources are tight, a routine load task is likely to time out. The current approach is the self-adaptive backoff from #32227, but it does some wasted work before it arrives at a suitable timeout. For a single routine load task, it is better to let the task finish executing than to retry it when resources are tight. Therefore, this PR increases the timeout so that a task always finishes, even if slowly, when resources are tight.
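
To illustrate the trade-off described here, the following is a minimal sketch, not the Doris implementation. The function names, the doubling factor, and all numbers are hypothetical. It assumes that an attempt which hits its timeout is discarded, so the time spent on it counts as wasted work.

```python
def run_with_backoff(task_seconds, initial_timeout, factor=2, max_attempts=10):
    """Retry with a growing timeout; return (attempts, wasted_seconds).

    Hypothetical model: each timed-out attempt is thrown away and retried
    with a timeout multiplied by `factor`.
    """
    timeout = initial_timeout
    wasted = 0
    for attempt in range(1, max_attempts + 1):
        if task_seconds <= timeout:
            return attempt, wasted
        wasted += timeout  # work done before the timeout is lost
        timeout *= factor
    return max_attempts, wasted


def run_with_long_timeout(task_seconds, timeout):
    """Single attempt with a generous timeout; return (attempts, wasted_seconds)."""
    wasted = 0 if task_seconds <= timeout else timeout
    return 1, wasted


if __name__ == "__main__":
    # A task that needs 50 s under load, starting from an assumed 10 s timeout.
    print(run_with_backoff(50, 10))        # (4, 70)
    print(run_with_long_timeout(50, 100))  # (1, 0)
```

In this toy model the backoff path spends 70 seconds on three discarded attempts before the fourth succeeds, while the single long-timeout attempt finishes with no discarded work.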

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@sollhui
Contributor Author

sollhui commented Sep 23, 2024

run buildall

@github-actions github-actions bot added the area/load Issues or PRs related to all kinds of load label Sep 23, 2024
@sollhui
Contributor Author

sollhui commented Sep 23, 2024

run buildall

@sollhui
Contributor Author

sollhui commented Sep 23, 2024

run buildall

@sollhui
Contributor Author

sollhui commented Sep 23, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 49147 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2ceda26dc6a02cb375ae25d26abc56e2097acfc7, data reload: false

------ Round 1 ----------------------------------
q1	17854	4456	4403	4403
q2	2069	151	148	148
q3	10435	1912	1958	1912
q4	10249	1255	1330	1255
q5	8449	3902	3924	3902
q6	237	122	122	122
q7	2054	1622	1599	1599
q8	9329	2754	2713	2713
q9	10442	9966	9698	9698
q10	8628	3529	3546	3529
q11	413	256	254	254
q12	470	295	299	295
q13	18352	4002	4047	4002
q14	350	354	344	344
q15	510	465	463	463
q16	560	463	467	463
q17	1134	982	929	929
q18	7338	6805	6762	6762
q19	1698	1516	1545	1516
q20	542	316	311	311
q21	4501	4163	4139	4139
q22	515	388	398	388
Total cold run time: 116129 ms
Total hot run time: 49147 ms
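
The report does not label its columns. The numbers are consistent with the layout query, cold run, hot run 1, hot run 2, best hot run, with the cold total summing the first numeric column and the hot total summing the last. That layout is an inference from the data, not something the bot states. A small sketch that checks it against the Round 1 rows above:

```python
# Round 1 rows copied from the report; the column meaning is an assumption:
# query, cold run, hot run 1, hot run 2, best (minimum) hot run, all in ms.
ROUND_1 = """
q1 17854 4456 4403 4403
q2 2069 151 148 148
q3 10435 1912 1958 1912
q4 10249 1255 1330 1255
q5 8449 3902 3924 3902
q6 237 122 122 122
q7 2054 1622 1599 1599
q8 9329 2754 2713 2713
q9 10442 9966 9698 9698
q10 8628 3529 3546 3529
q11 413 256 254 254
q12 470 295 299 295
q13 18352 4002 4047 4002
q14 350 354 344 344
q15 510 465 463 463
q16 560 463 467 463
q17 1134 982 929 929
q18 7338 6805 6762 6762
q19 1698 1516 1545 1516
q20 542 316 311 311
q21 4501 4163 4139 4139
q22 515 388 398 388
"""

# Drop the query name and keep the four numeric columns of each row.
rows = [[int(v) for v in line.split()[1:]] for line in ROUND_1.strip().splitlines()]

cold_total = sum(r[0] for r in rows)  # first numeric column
hot_total = sum(r[3] for r in rows)   # last numeric column
best_is_min = all(r[3] == min(r[1], r[2]) for r in rows)

if __name__ == "__main__":
    print(cold_total, hot_total, best_is_min)  # 116129 49147 True
```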

----- Round 2, with runtime_filter_mode=off -----
q1	4312	4377	4323	4323
q2	327	231	225	225
q3	4185	4152	4155	4152
q4	2808	2761	2775	2761
q5	7203	7136	7174	7136
q6	237	119	120	119
q7	3298	2808	2802	2802
q8	4365	4463	4474	4463
q9	13677	13823	13687	13687
q10	4262	4293	4235	4235
q11	733	690	702	690
q12	1031	842	832	832
q13	6904	3738	3742	3738
q14	458	424	421	421
q15	513	453	457	453
q16	651	595	581	581
q17	3886	3796	3857	3796
q18	8693	8697	8800	8697
q19	1723	1691	1633	1633
q20	2373	2140	2088	2088
q21	8574	8559	8386	8386
q22	1019	918	939	918
Total cold run time: 81232 ms
Total hot run time: 76136 ms

@doris-robot

TPC-DS: Total hot run time: 211239 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2ceda26dc6a02cb375ae25d26abc56e2097acfc7, data reload: false

query1	932	419	386	386
query2	6529	2226	2187	2187
query3	6926	203	206	203
query4	23274	21202	21499	21202
query5	19769	6494	6563	6494
query6	296	225	239	225
query7	4337	309	312	309
query8	255	268	257	257
query9	3075	2673	2620	2620
query10	466	310	304	304
query11	15865	14957	14931	14931
query12	127	81	73	73
query13	1023	442	446	442
query14	17165	13089	13098	13089
query15	378	224	224	224
query16	5806	275	264	264
query17	1749	925	897	897
query18	889	324	320	320
query19	214	153	161	153
query20	100	101	95	95
query21	191	104	97	97
query22	5257	4896	5051	4896
query23	34314	33446	33494	33446
query24	7815	6258	6212	6212
query25	517	441	421	421
query26	1340	166	163	163
query27	2469	297	295	295
query28	6079	2282	2244	2244
query29	2951	2768	2732	2732
query30	245	166	169	166
query31	966	732	760	732
query32	70	63	49	49
query33	450	243	260	243
query34	868	482	471	471
query35	1136	914	920	914
query36	1483	1103	1173	1103
query37	177	59	60	59
query38	3075	2995	2995	2995
query39	1364	1312	1314	1312
query40	313	92	93	92
query41	40	37	37	37
query42	82	83	86	83
query43	595	597	576	576
query44	1167	721	723	721
query45	245	235	231	231
query46	1234	967	966	966
query47	1856	1952	1839	1839
query48	509	423	417	417
query49	658	370	382	370
query50	860	591	602	591
query51	4725	4691	4723	4691
query52	93	74	85	74
query53	234	190	193	190
query54	2669	2480	2485	2480
query55	85	84	82	82
query56	253	201	205	201
query57	1285	1184	1156	1156
query58	216	212	212	212
query59	3712	3293	3213	3213
query60	228	218	193	193
query61	98	95	122	95
query62	822	487	435	435
query63	208	182	173	173
query64	3609	1591	1520	1520
query65	3576	3574	3510	3510
query66	796	402	410	402
query67	15225	15537	14795	14795
query68	9389	679	661	661
query69	500	253	263	253
query70	1502	1366	1404	1366
query71	396	312	314	312
query72	6882	4720	4729	4720
query73	767	332	328	328
query74	6307	5809	5767	5767
query75	5224	3578	3647	3578
query76	5279	1143	1154	1143
query77	914	261	262	261
query78	12639	11676	11727	11676
query79	8318	642	640	640
query80	1724	395	378	378
query81	489	237	245	237
query82	1698	93	98	93
query83	174	134	134	134
query84	256	71	69	69
query85	916	313	319	313
query86	332	308	286	286
query87	3241	3032	3032	3032
query88	4823	2314	2300	2300
query89	478	280	287	280
query90	1935	210	220	210
query91	173	131	125	125
query92	55	54	49	49
query93	6325	573	573	573
query94	724	206	205	205
query95	1930	1966	2037	1966
query96	632	330	327	327
query97	6512	6326	6339	6326
query98	241	205	210	205
query99	2781	953	856	856
Total cold run time: 318891 ms
Total hot run time: 211239 ms

@doris-robot

ClickBench: Total hot run time: 30.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2ceda26dc6a02cb375ae25d26abc56e2097acfc7, data reload: false

query1	0.02	0.02	0.02
query2	0.07	0.03	0.02
query3	0.26	0.04	0.05
query4	1.80	0.07	0.07
query5	0.54	0.53	0.53
query6	1.29	0.61	0.61
query7	0.02	0.01	0.02
query8	0.04	0.03	0.02
query9	0.53	0.48	0.48
query10	0.55	0.53	0.53
query11	0.12	0.08	0.09
query12	0.12	0.09	0.10
query13	0.63	0.62	0.61
query14	0.80	0.77	0.78
query15	0.80	0.76	0.75
query16	0.39	0.38	0.36
query17	1.00	1.01	1.02
query18	0.24	0.26	0.26
query19	1.92	1.87	1.87
query20	0.01	0.00	0.01
query21	15.46	0.55	0.56
query22	2.06	2.33	1.59
query23	17.13	0.89	0.84
query24	6.70	1.32	0.98
query25	0.39	0.11	0.07
query26	0.73	0.14	0.15
query27	0.04	0.04	0.04
query28	6.19	0.73	0.73
query29	12.87	2.34	2.29
query30	0.57	0.54	0.54
query31	2.81	0.41	0.37
query32	3.35	0.49	0.49
query33	3.10	3.09	3.04
query34	15.26	4.78	4.81
query35	4.87	4.81	4.87
query36	1.05	1.01	1.02
query37	0.06	0.04	0.05
query38	0.03	0.02	0.02
query39	0.02	0.01	0.02
query40	0.17	0.15	0.15
query41	0.07	0.01	0.02
query42	0.02	0.02	0.01
query43	0.02	0.02	0.02
Total cold run time: 104.12 s
Total hot run time: 30.66 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 2ceda26dc6a02cb375ae25d26abc56e2097acfc7 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       20.9 seconds inserted 10000000 Rows, about 478K ops/s
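
The reported rates can be reproduced from the byte counts and durations. A sketch of the arithmetic, assuming "MB" here means 1024 * 1024 bytes and that results are truncated to whole numbers (both are inferences from the figures, not stated in the report):

```python
MIB = 1024 * 1024  # assumed meaning of "MB" in the report


def mib_per_second(total_bytes, seconds):
    """Throughput in binary megabytes per second."""
    return total_bytes / seconds / MIB


# (bytes loaded, seconds) taken from the stream load lines above.
stream_loads = {
    "json": (2358488459, 20),
    "orc": (1101869774, 58),
    "parquet": (861443392, 31),
}

# Truncate to whole numbers, as the report appears to do.
rates = {fmt: int(mib_per_second(b, s)) for fmt, (b, s) in stream_loads.items()}

# Insert into select: 10,000,000 rows in 20.9 seconds.
insert_rows_per_second = 10_000_000 / 20.9

if __name__ == "__main__":
    print(rates)                                # {'json': 112, 'orc': 18, 'parquet': 26}
    print(int(insert_rows_per_second // 1000))  # 478
```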

@dataroaring dataroaring merged commit d45e219 into apache:branch-2.0 Sep 26, 2024
21 of 23 checks passed
Labels
area/load Issues or PRs related to all kinds of load
3 participants