
[Fix](Schema Change) fix schema change failure caused by internal sorting having no chance to run #39979

Open
wants to merge 1 commit into
base: branch-2.0
from


GoGoWen
Contributor

@GoGoWen GoGoWen commented Aug 27, 2024

Proposed changes

When doing a sorting schema change, the memory limit for the changer and for internal sorting is currently the memory limit of the whole schema change task, which leaves internal sorting no chance to run, so the schema change fails.

This PR limits the changer and internal sorting to std::min(0.5 * memory_limit_of_schema_change_per_thread, memory_limitation_per_thread_for_schema_change_internal_sorting_bytes) so that both internal sorting and the changer have enough memory to run.
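The limit described above can be sketched as a small helper. This is a minimal illustration, not the code from the PR: the function name `changer_and_sorting_limit_bytes` and its parameters are hypothetical stand-ins for the per-thread schema change limit and the new `memory_limitation_per_thread_for_schema_change_internal_sorting_bytes` config.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Doris code): the memory budget shared by the
// changer and internal sorting, i.e.
//   std::min(0.5 * per-thread schema change limit, internal sorting config).
// Integer halving replaces 0.5 * x so the result stays an int64_t without a
// round-trip through double.
int64_t changer_and_sorting_limit_bytes(int64_t schema_change_limit_per_thread,
                                        int64_t internal_sorting_limit_bytes) {
    assert(schema_change_limit_per_thread >= 0);
    assert(internal_sorting_limit_bytes >= 0);
    return std::min(schema_change_limit_per_thread / 2, internal_sorting_limit_bytes);
}
```

For example, with a 4 GiB per-thread schema change limit and a 1 GiB internal sorting config, the budget is 1 GiB; raising the config to 3 GiB yields 2 GiB, because the half-limit then wins. Either way the budget never exceeds half of the per-thread limit.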

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@GoGoWen
Contributor Author

GoGoWen commented Aug 27, 2024

run buildall

Contributor

@github-actions github-actions bot left a comment


clang-tidy made some suggestions

@@ -235,6 +235,8 @@ class StorageEngine {

int64_t memory_limitation_bytes_per_thread_for_schema_change() const;

int64_t memory_limitation_bytes_per_thread_for_schema_change_internal_sorting() const;
Contributor


warning: function 'memory_limitation_bytes_per_thread_for_schema_change_internal_sorting' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
int64_t memory_limitation_bytes_per_thread_for_schema_change_internal_sorting() const;
[[nodiscard]] int64_t memory_limitation_bytes_per_thread_for_schema_change_internal_sorting() const;
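Applied to both getters, the suggestion might look like the sketch below. It is a hypothetical stand-in: `SchemaChangeMemoryLimits` and its fields are invented for illustration and replace the real config lookups in `StorageEngine`, and the internal-sorting getter is assumed to apply the `std::min` rule from the PR description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two StorageEngine getters in the diff,
// marked [[nodiscard]] as clang-tidy suggests. The fields replace the real
// config lookups, which this PR page does not show.
class SchemaChangeMemoryLimits {
public:
    SchemaChangeMemoryLimits(int64_t per_thread_bytes, int64_t internal_sorting_bytes)
            : _per_thread_bytes(per_thread_bytes), _internal_sorting_bytes(internal_sorting_bytes) {
        assert(per_thread_bytes >= 0 && internal_sorting_bytes >= 0);
    }

    [[nodiscard]] int64_t memory_limitation_bytes_per_thread_for_schema_change() const {
        return _per_thread_bytes;
    }

    // Assumption: applies the std::min rule from the PR description.
    [[nodiscard]] int64_t memory_limitation_bytes_per_thread_for_schema_change_internal_sorting()
            const {
        return std::min(_per_thread_bytes / 2, _internal_sorting_bytes);
    }

private:
    int64_t _per_thread_bytes;
    int64_t _internal_sorting_bytes;
};
```

Because both getters are `[[nodiscard]]`, a call whose result is ignored, such as a bare `limits.memory_limitation_bytes_per_thread_for_schema_change_internal_sorting();`, makes the compiler emit a warning, which is what the clang-tidy check is after.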

@doris-robot

TPC-H: Total hot run time: 49891 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df, data reload: false

------ Round 1 ----------------------------------
query	run 1 (cold, ms)	run 2 (hot, ms)	run 3 (hot, ms)	best hot (ms)
q1	18076	4499	4362	4362
q2	2062	158	151	151
q3	10437	2033	1892	1892
q4	10263	1270	1373	1270
q5	8584	3910	3937	3910
q6	232	121	123	121
q7	2033	1603	1615	1603
q8	9336	2762	2725	2725
q9	10698	10341	10285	10285
q10	8758	3537	3521	3521
q11	425	235	254	235
q12	463	298	307	298
q13	18371	3948	3982	3948
q14	349	324	323	323
q15	515	469	459	459
q16	687	579	565	565
q17	1163	972	994	972
q18	7320	6891	7010	6891
q19	1711	1598	1553	1553
q20	559	326	285	285
q21	4508	4135	4181	4135
q22	506	389	387	387
Total cold run time: 117056 ms
Total hot run time: 49891 ms

----- Round 2, with runtime_filter_mode=off -----
query	run 1 (cold, ms)	run 2 (hot, ms)	run 3 (hot, ms)	best hot (ms)
q1	4378	4320	4316	4316
q2	319	230	228	228
q3	4164	4125	4120	4120
q4	2764	2774	2754	2754
q5	7228	7172	7079	7079
q6	239	118	121	118
q7	3267	2793	2857	2793
q8	4403	4515	4511	4511
q9	16796	16745	16870	16745
q10	4241	4249	4268	4249
q11	745	718	673	673
q12	1028	846	845	845
q13	7277	3742	3781	3742
q14	442	433	413	413
q15	505	463	465	463
q16	736	690	684	684
q17	3919	3841	3855	3841
q18	8902	8810	8871	8810
q19	1732	1707	1667	1667
q20	2361	2118	2093	2093
q21	8428	8513	8468	8468
q22	1019	938	932	932
Total cold run time: 84893 ms
Total hot run time: 79544 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.86% (8140/21500)
Line Coverage: 29.60% (66989/226333)
Region Coverage: 29.09% (34562/118811)
Branch Coverage: 25.00% (17808/71244)
Coverage Report: http://coverage.selectdb-in.cc/coverage/09f2604941de989e80f7d139a08a2d8e0c8a50df_09f2604941de989e80f7d139a08a2d8e0c8a50df/report/index.html

@doris-robot

TPC-DS: Total hot run time: 202783 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df, data reload: false

query	run 1 (cold, ms)	run 2 (hot, ms)	run 3 (hot, ms)	best hot (ms)
query1	945	395	415	395
query2	6540	2233	2193	2193
query3	6920	212	203	203
query4	21325	18132	18085	18085
query5	19743	6541	6551	6541
query6	292	216	238	216
query7	4150	308	321	308
query8	268	262	230	230
query9	3117	2745	2621	2621
query10	435	290	305	290
query11	11415	10730	10708	10708
query12	131	80	74	74
query13	5598	651	665	651
query14	18009	13766	13013	13013
query15	366	214	233	214
query16	6458	285	261	261
query17	1706	1453	872	872
query18	2316	410	417	410
query19	211	155	151	151
query20	82	78	80	78
query21	189	96	91	91
query22	5215	5063	4860	4860
query23	32708	31910	31980	31910
query24	7015	6517	6465	6465
query25	515	426	413	413
query26	527	165	167	165
query27	1915	297	305	297
query28	6065	2294	2251	2251
query29	2933	2713	2885	2713
query30	241	168	166	166
query31	917	770	773	770
query32	71	68	64	64
query33	387	264	253	253
query34	860	481	481	481
query35	1132	893	929	893
query36	1325	1254	1138	1138
query37	89	64	60	60
query38	3071	2926	2972	2926
query39	1375	1330	1322	1322
query40	212	98	95	95
query41	42	37	39	37
query42	86	88	86	86
query43	649	615	580	580
query44	1140	719	723	719
query45	247	233	230	230
query46	1237	967	985	967
query47	1971	1772	1663	1663
query48	980	676	653	653
query49	638	376	376	376
query50	876	613	621	613
query51	4715	4718	4584	4584
query52	97	87	86	86
query53	444	323	326	323
query54	2668	2464	2469	2464
query55	95	82	88	82
query56	233	235	206	206
query57	1136	1114	1092	1092
query58	223	213	216	213
query59	3553	3363	3460	3363
query60	218	212	202	202
query61	96	92	93	92
query62	822	430	487	430
query63	484	364	347	347
query64	2573	1526	1379	1379
query65	3591	3555	3565	3555
query66	815	379	387	379
query67	16221	16911	16096	16096
query68	8373	652	654	652
query69	579	357	395	357
query70	1651	1322	1433	1322
query71	404	312	329	312
query72	6511	3512	3506	3506
query73	740	333	330	330
query74	6329	5897	5900	5897
query75	4811	3729	3720	3720
query76	4867	1157	1216	1157
query77	693	261	268	261
query78	12677	11730	18466	11730
query79	6427	664	651	651
query80	1712	405	414	405
query81	505	241	232	232
query82	269	98	101	98
query83	177	136	142	136
query84	262	74	70	70
query85	1167	319	324	319
query86	348	294	334	294
query87	3255	3041	3009	3009
query88	4322	2378	2388	2378
query89	345	285	304	285
query90	1827	216	214	214
query91	163	122	128	122
query92	61	53	56	53
query93	924	564	609	564
query94	811	215	214	214
query95	1146	1099	1059	1059
query96	640	336	339	336
query97	6530	6380	6395	6380
query98	194	173	179	173
query99	2940	932	883	883
Total cold run time: 304299 ms
Total hot run time: 202783 ms

@doris-robot

ClickBench: Total hot run time: 31.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df, data reload: false

query	run 1 (cold, s)	run 2 (hot, s)	run 3 (hot, s)
query1	0.02	0.02	0.02
query2	0.07	0.02	0.03
query3	0.24	0.04	0.05
query4	1.80	0.06	0.07
query5	0.53	0.52	0.52
query6	1.24	0.60	0.61
query7	0.02	0.01	0.01
query8	0.04	0.03	0.02
query9	0.52	0.50	0.46
query10	0.53	0.55	0.54
query11	0.11	0.09	0.08
query12	0.12	0.09	0.08
query13	0.63	0.62	0.63
query14	0.80	0.78	0.78
query15	0.79	0.75	0.76
query16	0.36	0.35	0.39
query17	1.01	1.02	1.03
query18	0.21	0.27	0.23
query19	1.92	1.87	1.79
query20	0.02	0.01	0.01
query21	15.47	0.62	0.56
query22	2.51	2.06	1.74
query23	17.43	0.98	0.82
query24	4.68	3.50	1.57
query25	0.42	0.13	0.05
query26	0.76	0.16	0.16
query27	0.04	0.04	0.03
query28	5.44	0.76	0.77
query29	12.81	2.47	2.38
query30	0.56	0.54	0.54
query31	2.80	0.40	0.38
query32	3.34	0.49	0.50
query33	3.07	3.07	3.05
query34	15.24	4.79	4.82
query35	4.86	4.87	4.84
query36	1.06	1.02	1.02
query37	0.06	0.05	0.04
query38	0.04	0.02	0.02
query39	0.02	0.02	0.01
query40	0.16	0.14	0.14
query41	0.07	0.01	0.02
query42	0.02	0.01	0.02
query43	0.02	0.01	0.02
Total cold run time: 101.86 s
Total hot run time: 31.42 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.2 seconds inserted 10000000 Rows, about 471K ops/s

@GoGoWen
Contributor Author

GoGoWen commented Aug 27, 2024

Closing this, as the issue is fixed by #39995.

@GoGoWen
Contributor Author

GoGoWen commented Aug 27, 2024

fixed in #39995

@GoGoWen GoGoWen closed this Aug 27, 2024
@GoGoWen GoGoWen reopened this Aug 28, 2024
@GoGoWen GoGoWen closed this Aug 28, 2024
@GoGoWen GoGoWen reopened this Aug 28, 2024
@GoGoWen
Contributor Author

GoGoWen commented Aug 28, 2024

As PR #39995 does not fully fix the schema change failure (in our environment it still occurs for some big tables even after changing the config memory_limitation_per_thread_for_schema_change_bytes), I am reopening this PR. It introduces another config, memory_limitation_per_thread_for_schema_change_internal_sorting_bytes, to limit memory usage in sorting schema changes.
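To illustrate what a dedicated internal-sorting budget controls, here is a generic, memory-bounded run sorter. It is a hypothetical sketch under stated assumptions, not Doris's implementation: `BoundedRunSorter` is an invented name, row size is approximated by string length, and sorted runs are kept in memory instead of being written out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Generic illustration (not Doris code): rows are buffered until their
// estimated size would exceed a byte budget, then the buffer is sorted and
// stored as one sorted run. The budget plays the role of
// memory_limitation_per_thread_for_schema_change_internal_sorting_bytes.
class BoundedRunSorter {
public:
    explicit BoundedRunSorter(int64_t budget_bytes) : _budget_bytes(budget_bytes) {
        assert(budget_bytes > 0);
    }

    // Adds one row, first flushing a sorted run if the row would push the
    // buffer over budget. A single oversized row is still accepted into an
    // empty buffer so progress is always made.
    void add(const std::string& row) {
        const auto row_bytes = static_cast<int64_t>(row.size());
        if (!_buffer.empty() && _buffered_bytes + row_bytes > _budget_bytes) {
            flush();
        }
        _buffer.push_back(row);
        _buffered_bytes += row_bytes;
    }

    // Sorts whatever is buffered and stores it as a completed run.
    void flush() {
        if (_buffer.empty()) {
            return;
        }
        std::sort(_buffer.begin(), _buffer.end());
        _runs.push_back(std::move(_buffer));
        _buffer.clear();  // moved-from vector: clear() restores a known empty state
        _buffered_bytes = 0;
    }

    const std::vector<std::vector<std::string>>& runs() const { return _runs; }

private:
    int64_t _budget_bytes;
    int64_t _buffered_bytes = 0;
    std::vector<std::string> _buffer;
    std::vector<std::vector<std::string>> _runs;
};
```

In this toy model, a smaller dedicated budget makes the sorter flush runs earlier and cap its buffer size, independently of how large the overall per-thread schema change limit is configured.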
