[fix](memory) Fix erase invalid MemTrackerLimiter from tracker pool #33074

Merged: 3 commits, Mar 31, 2024

xinyiZzz (Contributor)

Proposed changes

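Fixes a bug introduced by #32039. The BE aborts on a failed check in MemTrackerLimiter::free_top_overcommit_query, reached from MemInfo::process_minor_gc on the memory GC thread: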

F20240330 22:37:15.508097 61876 mem_tracker_limiter.cpp:596] Check failed: tracker != nullptr

 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# 0x00007FE2CE248090 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise at ../sysdeps/unix/sysv/linux/raise.c:51
 3# abort at /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81
 4# 0x0000559E053C857D in /home/work/unlimit_teamcity/TeamCity/Agents/20240330193530agent_172.16.0.76_1/work/60183217f6ee2a9c/output/be/lib/doris_be
 5# google::LogMessage::SendToLog() in /home/work/unlimit_teamcity/TeamCity/Agents/20240330193530agent_172.16.0.76_1/work/60183217f6ee2a9c/output/be/lib/doris_be
 6# google::LogMessage::Flush() in /home/work/unlimit_teamcity/TeamCity/Agents/20240330193530agent_172.16.0.76_1/work/60183217f6ee2a9c/output/be/lib/doris_be
 7# google::LogMessageFatal::~LogMessageFatal() in /home/work/unlimit_teamcity/TeamCity/Agents/20240330193530agent_172.16.0.76_1/work/60183217f6ee2a9c/output/be/lib/doris_be
 8# doris::MemTrackerLimiter::free_top_overcommit_query(long, doris::MemTrackerLimiter::Type, std::vector<doris::TrackerLimiterGroup, std::allocator<doris::TrackerLimiterGroup> >&, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&, doris::RuntimeProfile*, doris::MemTrackerLimiter::GCType) in /home/work/unlimit_teamcity/TeamCity/Agents/20240330193530agent_172.16.0.76_1/work/60183217f6ee2a9c/output/be/lib/doris_be
 9# doris::MemTrackerLimiter::free_top_overcommit_query(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::RuntimeProfile*, doris::MemTrackerLimiter::Type) at /root/doris/be/src/runtime/memory/mem_tracker_limiter.cpp:547
10# doris::MemInfo::process_minor_gc() at /root/doris/be/src/util/mem_info.cpp:162
11# doris::Daemon::memory_gc_thread() in /home/work/unlimit_teamcity/TeamCity/Agents/20240330193530agent_172.16.0.76_1/work/60183217f6ee2a9c/output/be/lib/doris_be
12# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:499
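
The diff itself is not shown on this page, so the following is only a sketch of the general pattern the title suggests, not the actual change in this PR. It assumes a pool that stores non-owning std::weak_ptr entries; Tracker, TrackerPool, and snapshot_live are hypothetical names, not Doris APIs. Entries whose owner has already been destroyed are erased while iterating, rather than being dereferenced or asserted on:

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory tracker; not the Doris class.
struct Tracker {
    std::string label;
    int64_t consumption = 0;
};

// Hypothetical pool holding non-owning references to trackers.
struct TrackerPool {
    std::mutex mu;
    std::list<std::weak_ptr<Tracker>> trackers;

    void add(const std::shared_ptr<Tracker>& t) {
        std::lock_guard<std::mutex> guard(mu);
        trackers.push_back(t);
    }

    // Return the trackers that are still alive. Entries whose owner is
    // already gone are erased instead of being treated as a fatal error.
    std::vector<std::shared_ptr<Tracker>> snapshot_live() {
        std::lock_guard<std::mutex> guard(mu);
        std::vector<std::shared_ptr<Tracker>> live;
        for (auto it = trackers.begin(); it != trackers.end();) {
            if (auto t = it->lock()) {
                live.push_back(std::move(t));
                ++it;
            } else {
                // list::erase returns the iterator following the removed element.
                it = trackers.erase(it);
            }
        }
        return live;
    }
};
```

A caller that scans for candidates, for example during GC, would then work only on the returned shared_ptr values, which keep each tracker alive for the duration of the scan.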

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See the Doris documentation.

@xinyiZzz
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.54% (8837/24864)
Line Coverage: 27.28% (72465/265610)
Region Coverage: 26.48% (37497/141584)
Branch Coverage: 23.30% (19122/82074)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5ad2b58439946ebd4dd99937cf285af01a833bb1_5ad2b58439946ebd4dd99937cf285af01a833bb1/report/index.html

@doris-robot

TPC-H: Total hot run time: 38658 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5ad2b58439946ebd4dd99937cf285af01a833bb1, data reload: false

------ Round 1 ----------------------------------
q1	17735	4122	4041	4041
q2	2100	191	180	180
q3	10470	1212	1382	1212
q4	10198	956	977	956
q5	7463	2947	2927	2927
q6	212	131	132	131
q7	1087	633	601	601
q8	9410	2043	2017	2017
q9	6738	6190	6134	6134
q10	8446	3483	3506	3483
q11	415	240	225	225
q12	389	222	213	213
q13	17785	2887	2948	2887
q14	275	243	248	243
q15	534	474	473	473
q16	510	389	380	380
q17	951	912	906	906
q18	7626	6452	6455	6452
q19	1592	1513	1522	1513
q20	607	294	308	294
q21	3486	3089	3088	3088
q22	363	311	302	302
Total cold run time: 108392 ms
Total hot run time: 38658 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4040	4002	3996	3996
q2	331	223	229	223
q3	2945	2935	2917	2917
q4	1885	1822	1819	1819
q5	5226	5189	5234	5189
q6	209	125	124	124
q7	2207	1803	1791	1791
q8	3189	3252	3253	3252
q9	8445	8462	8443	8443
q10	3697	3942	3990	3942
q11	557	468	457	457
q12	741	596	617	596
q13	14394	3133	3120	3120
q14	309	276	276	276
q15	546	486	504	486
q16	513	472	465	465
q17	1804	1724	1726	1724
q18	8138	7739	7730	7730
q19	1693	1671	1664	1664
q20	2071	1807	1845	1807
q21	5136	4962	4972	4962
q22	494	449	453	449
Total cold run time: 68570 ms
Total hot run time: 55432 ms
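
The columns in the tables above are unlabeled. For the Round 1 figures on commit 5ad2b58, the reported totals are consistent with reading each row as a cold run followed by two hot runs and the faster of the two, with "Total cold run time" summing the first column and "Total hot run time" summing the last. The sketch below checks that reading; the struct and function names are illustrative and say nothing about how the benchmark scripts are actually written:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One table row in ms: first (cold) run, then two repeat (hot) runs.
// Field names are illustrative; the bot output does not label its columns.
struct Row {
    long cold;
    long hot1;
    long hot2;
};

struct Totals {
    long cold;
    long hot;
};

// Sum the cold column, and the faster of the two hot runs per query.
Totals summarize(const std::vector<Row>& rows) {
    Totals t{0, 0};
    for (const Row& r : rows) {
        t.cold += r.cold;
        t.hot += std::min(r.hot1, r.hot2);
    }
    return t;
}

// Round 1 timings for q1..q22, copied from the table above.
const std::vector<Row> kRound1 = {
    {17735, 4122, 4041}, {2100, 191, 180},    {10470, 1212, 1382},
    {10198, 956, 977},   {7463, 2947, 2927},  {212, 131, 132},
    {1087, 633, 601},    {9410, 2043, 2017},  {6738, 6190, 6134},
    {8446, 3483, 3506},  {415, 240, 225},     {389, 222, 213},
    {17785, 2887, 2948}, {275, 243, 248},     {534, 474, 473},
    {510, 389, 380},     {951, 912, 906},     {7626, 6452, 6455},
    {1592, 1513, 1522},  {607, 294, 308},     {3486, 3089, 3088},
    {363, 311, 302},
};
```

Calling summarize(kRound1) gives the two totals to compare against the "Total cold run time" and "Total hot run time" lines of Round 1.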

@doris-robot

TPC-DS: Total hot run time: 181201 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5ad2b58439946ebd4dd99937cf285af01a833bb1, data reload: false

query1	1221	353	1112	353
query2	6318	1771	1921	1771
query3	6671	210	215	210
query4	24903	21442	21440	21440
query5	4171	401	411	401
query6	266	180	178	178
query7	4604	299	314	299
query8	237	176	174	174
query9	8496	2222	2229	2222
query10	438	253	255	253
query11	15038	14548	14552	14548
query12	145	98	96	96
query13	1636	388	371	371
query14	8492	6766	6901	6766
query15	209	177	175	175
query16	6861	279	280	279
query17	976	589	572	572
query18	1864	293	282	282
query19	211	168	162	162
query20	100	97	98	97
query21	194	134	136	134
query22	4962	4880	4760	4760
query23	33173	32797	32596	32596
query24	12680	3232	3293	3232
query25	719	439	442	439
query26	1899	182	177	177
query27	3287	385	384	384
query28	7081	1909	1879	1879
query29	1288	673	650	650
query30	301	165	168	165
query31	1023	764	778	764
query32	96	64	67	64
query33	712	252	259	252
query34	1142	516	513	513
query35	896	765	736	736
query36	1016	874	864	864
query37	274	85	89	85
query38	3737	3532	3556	3532
query39	1659	1630	1608	1608
query40	241	148	147	147
query41	49	47	46	46
query42	115	111	114	111
query43	453	404	417	404
query44	1187	745	735	735
query45	285	260	264	260
query46	1136	855	822	822
query47	2005	1902	1918	1902
query48	391	324	318	318
query49	953	381	379	379
query50	838	418	413	413
query51	6844	6798	6900	6798
query52	110	101	99	99
query53	379	321	310	310
query54	321	269	245	245
query55	95	83	82	82
query56	258	245	225	225
query57	1269	1195	1222	1195
query58	256	237	241	237
query59	2628	2703	2290	2290
query60	270	236	230	230
query61	93	91	88	88
query62	628	454	461	454
query63	320	294	289	289
query64	5835	3461	3276	3276
query65	3037	3032	3031	3031
query66	1307	324	330	324
query67	15372	15206	14697	14697
query68	9377	563	598	563
query69	592	325	326	325
query70	1357	1102	1105	1102
query71	543	267	277	267
query72	6384	2599	2434	2434
query73	1525	328	325	325
query74	6879	6238	6263	6238
query75	3754	2306	2286	2286
query76	5723	1108	1214	1108
query77	566	250	249	249
query78	10821	10168	10051	10051
query79	9288	535	536	535
query80	1538	421	429	421
query81	506	220	227	220
query82	329	100	99	99
query83	211	162	164	162
query84	265	85	89	85
query85	941	303	292	292
query86	370	278	305	278
query87	3702	3450	3478	3450
query88	3081	2323	2295	2295
query89	545	378	378	378
query90	2059	176	185	176
query91	142	115	105	105
query92	60	50	53	50
query93	5546	516	513	513
query94	1247	198	194	194
query95	432	337	333	333
query96	619	271	268	268
query97	2625	2509	2486	2486
query98	224	228	218	218
query99	1191	840	805	805
Total cold run time: 298113 ms
Total hot run time: 181201 ms

@xinyiZzz
Contributor Author

run buildall

@doris-robot

ClickBench: Total hot run time: 30.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5ad2b58439946ebd4dd99937cf285af01a833bb1, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.24	0.04	0.04
query4	1.68	0.06	0.06
query5	0.50	0.50	0.49
query6	1.13	0.66	0.65
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.56	0.52	0.50
query10	0.55	0.56	0.56
query11	0.15	0.12	0.11
query12	0.14	0.12	0.11
query13	0.60	0.59	0.59
query14	0.78	0.78	0.80
query15	0.87	0.83	0.84
query16	0.36	0.35	0.36
query17	0.97	1.00	0.99
query18	0.26	0.25	0.25
query19	1.84	1.73	1.75
query20	0.02	0.01	0.01
query21	15.56	0.77	0.67
query22	3.54	4.96	1.96
query23	17.48	1.43	1.11
query24	1.75	0.28	0.24
query25	0.13	0.09	0.09
query26	0.28	0.17	0.18
query27	0.08	0.09	0.09
query28	13.40	0.94	0.92
query29	12.65	3.46	3.65
query30	0.28	0.08	0.09
query31	2.80	0.39	0.39
query32	3.29	0.48	0.47
query33	2.86	2.86	2.90
query34	15.48	4.32	4.29
query35	4.35	4.35	4.35
query36	0.67	0.48	0.48
query37	0.20	0.16	0.18
query38	0.18	0.17	0.15
query39	0.04	0.04	0.03
query40	0.18	0.14	0.15
query41	0.10	0.05	0.06
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 106.24 s
Total hot run time: 30.42 s

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39542 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cded4dbb0a455bb29ffd969ced419406b62dc112, data reload: false

------ Round 1 ----------------------------------
q1	17624	4119	4072	4072
q2	2094	197	192	192
q3	10566	1260	1396	1260
q4	10371	900	1053	900
q5	7486	2973	2945	2945
q6	216	133	129	129
q7	1120	654	632	632
q8	9420	2078	2061	2061
q9	7458	6806	6726	6726
q10	8512	3530	3508	3508
q11	430	246	246	246
q12	399	223	217	217
q13	17789	2900	2894	2894
q14	279	241	247	241
q15	529	475	475	475
q16	519	393	391	391
q17	982	930	904	904
q18	7770	6584	6501	6501
q19	1618	1536	1572	1536
q20	605	309	315	309
q21	3596	3104	3161	3104
q22	358	299	321	299
Total cold run time: 109741 ms
Total hot run time: 39542 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4049	4037	4039	4037
q2	329	216	218	216
q3	2977	3001	2934	2934
q4	1895	1839	1857	1839
q5	5261	5221	5227	5221
q6	213	121	123	121
q7	2234	1813	1802	1802
q8	3219	3281	3306	3281
q9	9099	9114	9072	9072
q10	3805	3837	3810	3810
q11	534	451	432	432
q12	710	521	551	521
q13	5152	2926	2895	2895
q14	294	257	256	256
q15	514	485	476	476
q16	468	407	406	406
q17	1700	1710	1679	1679
q18	7629	7446	7214	7214
q19	1648	1642	1646	1642
q20	1948	1736	1743	1736
q21	4988	4779	4740	4740
q22	500	437	430	430
Total cold run time: 59166 ms
Total hot run time: 54760 ms

@doris-robot

TPC-DS: Total hot run time: 181537 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cded4dbb0a455bb29ffd969ced419406b62dc112, data reload: false

query1	1260	1131	1124	1124
query2	6483	1844	1801	1801
query3	6668	216	211	211
query4	24922	21570	21511	21511
query5	4233	419	407	407
query6	271	187	179	179
query7	4607	303	311	303
query8	233	183	189	183
query9	8496	2244	2229	2229
query10	558	253	262	253
query11	15000	14692	14633	14633
query12	149	106	100	100
query13	1646	397	394	394
query14	8643	6724	6759	6724
query15	221	182	190	182
query16	7159	276	275	275
query17	1002	617	587	587
query18	1914	303	299	299
query19	218	177	172	172
query20	100	98	97	97
query21	201	128	126	126
query22	4982	4860	4851	4851
query23	33676	32782	32721	32721
query24	12615	3179	3152	3152
query25	694	403	412	403
query26	1899	165	170	165
query27	3049	341	333	333
query28	6698	1841	1840	1840
query29	1319	618	612	612
query30	304	158	153	153
query31	997	729	728	728
query32	98	62	63	62
query33	722	270	267	267
query34	1023	496	508	496
query35	858	703	704	703
query36	1035	861	866	861
query37	290	83	81	81
query38	3573	3371	3331	3331
query39	1629	1564	1656	1564
query40	299	143	145	143
query41	51	48	50	48
query42	121	105	109	105
query43	434	393	392	392
query44	1080	716	721	716
query45	282	271	266	266
query46	1094	820	787	787
query47	1892	1786	1809	1786
query48	380	322	317	317
query49	1175	375	380	375
query50	823	401	413	401
query51	6739	6516	6595	6516
query52	115	99	98	98
query53	370	304	294	294
query54	330	252	256	252
query55	96	93	86	86
query56	265	235	239	235
query57	1225	1103	1105	1103
query58	254	245	239	239
query59	2366	2279	2240	2240
query60	281	252	270	252
query61	116	112	110	110
query62	715	464	451	451
query63	320	292	294	292
query64	6633	3078	3041	3041
query65	3095	3038	3036	3036
query66	1451	347	337	337
query67	15636	15065	15271	15065
query68	9023	569	578	569
query69	578	340	360	340
query70	1394	1089	1121	1089
query71	505	284	279	279
query72	6447	2600	2448	2448
query73	1550	324	335	324
query74	6714	6298	6337	6298
query75	3670	2294	2293	2293
query76	5397	1054	1168	1054
query77	616	271	259	259
query78	10869	10044	10085	10044
query79	9811	536	533	533
query80	1398	441	435	435
query81	496	228	228	228
query82	396	100	110	100
query83	225	167	171	167
query84	275	91	90	90
query85	1023	292	287	287
query86	369	297	281	281
query87	3635	3483	3491	3483
query88	3973	2319	2311	2311
query89	557	382	381	381
query90	2079	180	187	180
query91	142	111	116	111
query92	63	53	53	53
query93	6470	532	519	519
query94	1363	195	198	195
query95	430	328	333	328
query96	627	271	269	269
query97	2632	2477	2512	2477
query98	240	224	211	211
query99	1175	835	805	805
Total cold run time: 301225 ms
Total hot run time: 181537 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.55% (8840/24864)
Line Coverage: 27.29% (72495/265610)
Region Coverage: 26.50% (37523/141584)
Branch Coverage: 23.31% (19132/82074)
Coverage Report: http://coverage.selectdb-in.cc/coverage/cded4dbb0a455bb29ffd969ced419406b62dc112_cded4dbb0a455bb29ffd969ced419406b62dc112/report/index.html

@doris-robot

ClickBench: Total hot run time: 32.1 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cded4dbb0a455bb29ffd969ced419406b62dc112, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.04	0.03
query3	0.24	0.04	0.04
query4	1.68	0.07	0.07
query5	0.50	0.47	0.50
query6	1.13	0.64	0.65
query7	0.02	0.01	0.01
query8	0.06	0.05	0.04
query9	0.56	0.51	0.50
query10	0.56	0.57	0.56
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.60	0.60	0.60
query14	0.77	0.78	0.79
query15	0.85	0.83	0.83
query16	0.34	0.36	0.35
query17	1.01	1.00	0.99
query18	0.26	0.26	0.26
query19	1.80	1.70	1.71
query20	0.01	0.02	0.01
query21	15.54	0.76	0.67
query22	3.11	4.42	3.67
query23	17.35	1.27	1.06
query24	2.00	0.23	0.22
query25	0.15	0.10	0.08
query26	0.28	0.18	0.18
query27	0.08	0.08	0.08
query28	13.37	0.99	0.95
query29	12.61	3.42	3.41
query30	0.28	0.09	0.08
query31	2.81	0.39	0.38
query32	3.28	0.47	0.47
query33	2.86	2.88	2.90
query34	15.51	4.33	4.36
query35	4.39	4.37	4.37
query36	0.68	0.47	0.48
query37	0.19	0.19	0.18
query38	0.17	0.16	0.15
query39	0.05	0.04	0.05
query40	0.18	0.15	0.15
query41	0.10	0.05	0.06
query42	0.07	0.05	0.06
query43	0.04	0.04	0.05
Total cold run time: 105.88 s
Total hot run time: 32.1 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit cded4dbb0a455bb29ffd969ced419406b62dc112 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       16.8 seconds inserted 10000000 Rows, about 595K ops/s

Contributor

@dataroaring dataroaring left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Mar 31, 2024
Contributor

PR approved by anyone and no changes requested.

dataroaring merged commit 9ceceb4 into apache:master on Mar 31, 2024
29 of 32 checks passed
Labels: approved, reviewed